Entity Framework is slow because of derived tables
Asked Answered
R

1

9

I am using MySQL Connector/Net 6.5.4 with LINQ to entities, and I frequently get terrible query performance because the entity framework generates queries that use derived tables.

Here is a simplified example of what I've encountered several times. In C#, I write a query like this:

var culverCustomers = from cs in db.CustomerSummaries where cs.Street == "Culver" select cs;
// later...
var sortedCustomers = culverCustomers.OrderBy(cs => cs.Name).ToList();

Instead of generating simple a query like this:

SELECT cust.id FROM customer_summary cust WHERE cust.street = "Culver" ORDER BY cust.name

The entity framework generates a query with a derived table like this:

SELECT Project1.id FROM (
    SELECT cust.id, cust.name, cust.street FROM customer_summary cust
    WHERE Project1.street = "Culver"
) AS Project1 -- here is where the EF generates a pointless derived table
ORDER BY Project1.name

If I explain both queries I get this for the first query:

id, select_type, table, type, possible_keys, rows
1,  PRIMARY,     addr,  ALL,  PRIMARY,       9
1,  PRIMARY,     cust,  ref,  PRIMARY,       4

... and something awful like this for the entity framework query

id, select_type, table,      type, possible_keys, rows
1,  PRIMARY,     <derived2>, ALL,                 9639
2,  DERIVED,     addr,       ALL,  PRIMARY,       9
2,  DERIVED,     cust,       ref,  PRIMARY,       4

Note the first row, where MySQL explains that it's scanning 9000+ records. Because of the derived table, MySQL is creating a temp table and loading every row. (Or so I'm deducing based on articles like this one: Derived Tables and Views Performance)

How can I prevent the Entity Framework from using a derived table, or how can I convince MySQL to do the obvious optimization for queries like this?

For completion, here is the view that is the source for this linq query:

create view  customer_summary as
select cust.id, cust.name, addr.street
customers cust 
join addresses addr
on addr.customer_id = cust.id
Rea answered 2/11, 2012 at 18:29 Comment(8)
FYI, this should see improvement with MySQL 5.6, but it will still not be nearly as good as in some other database servers.Pleven
Thanks! I will try it out and see if it improves. What DB do you recommend? I have some flexibility.Rea
I can only say this is something MS SQL Server gets right (EF-on-MSSQL uses the same sort of generated SQL, but it works well on that), and that'll for probably obvious reasons be the best supported database server if you're using Entity Framework. There can very well be other reasons why it would be a poor choice for you, though, so don't take this as a hard recommendation.Pleven
Update: I tested this on MySQL 5.6 rc, but the query is equally slow and has the same row count in its explain plan.Rea
It looks like not projecting would perform better in this case. Apparently you have a "select cust.Id" somewhere, causing EF to create a projection containing all data required for sorting, filtering and output. You could try, although not projecting is against all performance recommendations. As for which db: there are a lot of questions about really serious performance issues with MySql + EF. For Sql Server + EF mostly just "regular" optimization issues.Voracity
Is it possible to prevent a projection in the EF? In my experience it generates a projection for even the simplest queries.Rea
I migrated everything to use SQL Server, and it is remarkably 10-20x faster. I thought MySql+EF would be sufficient for my modest requirements (largest tables are 100s of thousands of rows), but connector/.net has fallen short repeatedly in terms of performance and missing features.Rea
SQL Server performs query optimization when it receives the query. This optimization is smart enough to know it doesn't need the derived table. Therefore, SQL Server creates a new query plan that works better. LINQ to SQL (and thus LINQ to Entities) assumes the database will do the bulk of the query optimization like SQL Server. SQL Server, Oracle, and other premium databases have robust query optimization: MySQL, Access, and other low cost databases do not.Pothole
S
1

I think your query statement is missing 'select'. You have not identified the record(s) you want. your query:

var culverCustomers = from cs in db.CustomerSummaries 
                        where cs.Street == "Culver";

//no select

what are you selecting from the table? try this

example:

var culverCustomers =  from cs in db.CustomerSummaries 
                        where cs.Street == "Culver"
                        select cs;
Starflower answered 8/7, 2014 at 17:52 Comment(2)
Does this eliminate the derived tables that the OP is speaking of?Kellerman
The missing select was a copy/paste error in the LINQ. It won't even compile without the select clause. I added the select clause to the original question to prevent confusion.Rea

© 2022 - 2024 — McMap. All rights reserved.