Lazy vs. eager loading performance in Entity Framework
So I have the following model classes in my DbContext:

LoanApplication and Solicitor (each LoanApplication has a Solicitors collection)

Every time I render a list of LoanApplication objects, I do something like this:

var context = new MyContext();
var applications = context.LoanApplications.Where(d => d.PropertyThatIWantToFilter == localVariable);

This returns an IQueryable that I then convert to view models in my controller action:

var vm = applications.Select(d => new LoanApplicationViewModel(d));

Inside the LoanApplicationViewModel constructor I accept the entity object and do the corresponding mapping. The thing is that, since the Solicitors collection is a navigation property, a call is made to the database each time a new view model is instantiated. The average number of solicitors per application is two, so rendering a table of the last 10 applications means the app makes roughly 18-20 trips to the database.
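
A simplified version of the view model looks like this (the scalar property here is made up; the Solicitors part is what matters):

using System.Collections.Generic;
using System.Linq;

public class LoanApplicationViewModel
{
    public LoanApplicationViewModel(LoanApplication application)
    {
        // Scalar properties map without touching the database again.
        Id = application.Id; // hypothetical property

        // Touching the lazy-loaded navigation property issues one extra
        // SELECT per application -- the classic N+1 query pattern.
        SolicitorNames = application.Solicitors.Select(s => s.Name).ToList(); // s.Name assumed
    }

    public int Id { get; private set; }
    public List<string> SolicitorNames { get; private set; }
}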

I thought there had to be a better way to get this collection, so I changed my original query to eagerly load the collection like so:

var applications = context.LoanApplications.Include("Solicitors").Where...

Although this reduced the number of calls to the database to only one, the query itself was much slower, about 50% slower.
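
(For reference, the strongly-typed Include overload from the System.Data.Entity namespace does the same thing without the magic string:)

using System.Data.Entity; // brings the lambda-based Include extension into scope

var applications = context.LoanApplications
                          .Include(a => a.Solicitors)
                          .Where(d => d.PropertyThatIWantToFilter == localVariable);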

The database is hosted on SQL Azure, and we've implemented Transient Fault Handling, but I want to reduce the number of calls made to the database without hurting response time.

What is the best practice here?

Wynne asked 3/4, 2013 at 3:29 Comment(4)
In pure efficiency, eager loading is superior. If the initial hit pushes past a threshold, you should reconsider. If your lazy load would not need more than 20% of the extra data, stagger it. But in sheer measure of data pulled vs. gross time hit, eager loading always wins.Mana
You cannot have it both ways - it's about making the best trade-off. I'd say either eager-load with Include (I'd agree that's best) if you need the data right from the start, or don't load at all - no eager load either - and later call Reload for the one entity you need up to date (i.e. you avoid the initial impact); you could also automate/integrate that into something smarter. It's not ideal, nothing is - but it's what you get.Ilsa
As my answer below says: profile. Maybe the solicitors have already been loaded into MyContext before (you surely are not creating a new context for every query, right?). Or maybe your view model is using .Select to omit DB columns that hold many MB of data?Goffer
Also, I recommend not passing IQueryable around the application. You have no way to know what query has already been applied to it when you pass it around, and it makes testing difficult. (Everything is now tied to the QueryProvider in the IQueryable, and the only way to verify the query is translated correctly is to run it against SQL. Then people just start stubbing it with List<T> and make a complete mess, and the tests then prove nothing, as a query against List<T> doesn't necessarily mean it translates correctly into SQL and runs correctly on SQL.)Goffer

"What is the best practice here?"

The best practice is to

  1. set an application-wide performance target
  2. profile, benchmark, and locate the bottleneck
  3. review and fine-tune the bottleneck that gives you the greatest performance win for the least work (and in my experience, 90% of the time it's not the T-SQL)

Now that may seem a bit irrelevant, but from that point of view, whichever loading pattern you PROFILED to be optimal within your application domain is the correct way to go.

There's no "best practice" of eager vs. lazy; that's why both options are available. And if the T-SQL is your bottleneck and switching between eager and lazy still isn't hitting your performance target, you will need to reach for a whole plethora of other tools, such as the query analyzer and query plan analyzer in SSMS.
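
For a quick first pass before reaching for a real profiler, a Stopwatch around each variant inside your own application is enough. A minimal sketch (using the question's MyContext and the lambda Include from System.Data.Entity):

using System;
using System.Data.Entity;
using System.Diagnostics;
using System.Linq;

// Throw away the first run: it pays EF's one-off model-building cost.
// Use a fresh context per measurement so one pattern isn't served from
// entities the other has already cached.
var sw = Stopwatch.StartNew();
using (var context = new MyContext())
{
    var eager = context.LoanApplications
                       .Include(a => a.Solicitors)
                       .ToList();
}
Console.WriteLine("eager: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
using (var context = new MyContext())
{
    foreach (var app in context.LoanApplications.ToList())
    {
        var count = app.Solicitors.Count; // each access fires one lazy-load query
    }
}
Console.WriteLine("lazy:  {0} ms", sw.ElapsedMilliseconds);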


For some background:

I was googling "eager loading slow" and came here. Here's my result:

var foo = _context.Foos
    //.Include("Answers")     // commented out for the lazy-loading run,
    //.Include("Attachments") // re-enabled for the eager-loading run
    .FirstOrDefault(q => q.Id == key);

Eager loading: 106ms

Lazy loading: 11ms + 5ms + 5ms

Lazy loading wins, end of story.

Goffer answered 11/11, 2013 at 0:22 Comment(6)
+1 for profiling your own app domain. There is never a generic "best" answer where performance trade-offs are concerned.Salesmanship
Are you sure that 106ms isn't simply Entity Framework's setup for the first-time use of the DbContext in an app?Weihs
@Weihs profiling is done in the actual application, in this case a web application. This isn't one of those "run it in a console application and compare" exercises. The complexity introduced by the lifetime-management policy of IoC containers and the usage pattern of eager/lazy loading across different layers of an application is the whole reason to "profile".Goffer
I think the profiling case is wrong. The results seem realistic, but the returned sets are different. To elaborate: the eager load here returns one Foo object (by its key) plus all of its Answers and all of its Attachments. The lazy load (with the commented-out lines) returns only the Foo object; everything else needs to be loaded lazily later (which seems to be missing from the code). I think you should add two foreach statements for lazy loading that access the answers and attachments, and include that in the lazy-load measurement. That would produce the same data sets retrieved and the correct times.Hohenlinden
+1 - measure it. I swapped a 3-generation Include select statement for lazy loading and the processing time went from 2:45 to 1:56, using EF 6.1.3 with SQLite, with 229 parents, 18k children & 50k grandchildren.Airport
@AndrejMohar and that is the whole point. Whether lazy loading is faster or not is determined by the way your application accesses those lazy objects. So the fact that two benchmarks return different results is irrelevant to the application timing.Goffer

Besides the SQL statements that either return a huge result set (eager) or generate lots of calls (lazy), there is a significant cost in materializing the results and mapping them into the ObjectContext/DbContext. That causes a big performance hit either way, so I can't really recommend either approach when retrieving large amounts of data.

The best solution is to use an explicit Select call. However, it's a bit difficult to give you an example without knowing how your view-model object is built up, so the example below uses anonymous objects as the query result.

This example gives you contacts with information about the customer the contact belongs to.

var contacts = context.Contacts.Where(row => row.CategoryId == 1)
                      .Select(row => new {
                                             ContactId = row.Id,
                                             Name = row.Name,
                                             CustomerName = row.Customer.Name
                                         }).ToList();

This query will generate a SQL SELECT that joins Contacts with Customer using an inner join, and then selects only the Contact.Id, Contact.Name and Customer.Name columns.

This is by far the most effective way to retrieve data from the server if you don't intend to modify the data and save the changes back through the same context. It uses neither eager nor lazy loading.
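
If you need typed objects rather than anonymous ones (as the asker does), the same projection works straight into a view model, as long as it has a parameterless constructor and settable properties - LINQ to Entities cannot call a constructor that takes the entity, as in the question's code. A sketch with assumed property names:

var vms = context.LoanApplications
                 .Where(d => d.PropertyThatIWantToFilter == localVariable)
                 .Select(d => new LoanApplicationViewModel
                 {
                     Id = d.Id,                                                  // assumed
                     SolicitorNames = d.Solicitors.Select(s => s.Name).ToList()  // assumed
                 })
                 .ToList();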

Antigen answered 4/4, 2013 at 18:59 Comment(1)
I really like this approach, but unfortunately we have typed objects and a repository-pattern implementation with reusable Select commands on a RepositoryBase class that I do not wish to stop using. I know there's a trade-off between performance and convenience when writing maintainable code. I guess I should rephrase my question then: is the lazy-loading approach OK in this case, even though it's calling the database server 42 times?Wynne

If you could somehow query your Solicitors table and filter the query using your already-fetched list of applications, then the fetched entities would be cached in your context, which I believe would then be used for the navigation property instead of hitting the database.

I'm not sure exactly how to write the solicitor-fetching query, but I was thinking of something like this:

int[] applicationIDs = applications.Select(x => x.ID).ToArray();
// ToArray forces immediate execution, so the solicitors are cached in the context right away.
var solicitors = context.Solicitors
    .Where(x => x.Applications.Any(y => applicationIDs.Contains(y.ID))).ToArray();
Ferocious answered 3/4, 2013 at 3:44 Comment(5)
I get the performance penalty even if I'm not filtering. context.LoanApplications is always faster than context.LoanApplications.Include("Solicitors"), even if that means going to the database 20 times to get each child solicitor when the navigation property is accessed.Wynne
Maybe I was not clear about what I meant: the DbContext caches the entities, so if you have the Solicitors in the cache already, calling the navigation property on the applications will get them from the cache instead of the database. The code I suggested was to optimize caching the solicitors, so you only fetch the ones you'll actually need.Tripoli
Gave it a try, gained some speed on the query, but the lazy load option is still faster. Do you have any idea what the best practice in this scenario is?Wynne
If this is a query you use all the time, and the actual data (columns) you use from both entities is small, you could consider using SqlQuery<CustomClassForReturnedColumns>("Your raw sql query here"). That would be the optimal solution performance wise.Tripoli
No, I actually have a repository pattern in place, and this same query is used by many other entities (I simplified the query for the question). Doing a raw SQL call would defeat the purpose of using Entity Framework and the reusable-code patterns I have in place.Wynne

Have you considered using a SQL view?

I'm not quite sure about SQL Azure, but in SQL Server you can incur a performance penalty when joining two tables without proper indexes. Maybe that is happening in your query.

Note that your original query accesses one table with a WHERE clause, across two calls; the eager query accesses two tables with a WHERE clause in a single call. The eager query involves a join, and a join is likely to need a different index.

You can create a SQL view to make sure that a proper index is used, and then have your application query the view. A stored procedure could serve the same purpose, but it is less suitable here.
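
On the application side, a code-first context can map a read-only entity onto such a view; the view name and columns below are made up for illustration (and note that migrations won't create the view for you - it has to be added in SQL):

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

[Table("vw_LoanApplicationSolicitors")] // hypothetical view name
public class LoanApplicationSolicitorRow
{
    [Key, Column(Order = 0)]
    public int LoanApplicationId { get; set; }

    [Key, Column(Order = 1)]
    public int SolicitorId { get; set; }

    public string SolicitorName { get; set; }
}

// In MyContext:
// public DbSet<LoanApplicationSolicitorRow> LoanApplicationSolicitors { get; set; }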

Reeves answered 3/4, 2013 at 4:39 Comment(2)
Indexes are in place. The tables are generated using Code-First Migrations.Wynne
Are you sure the index is actually used during execution? And have you tried running the same query (the join) in SQL directly?Reeves

Eager loading fetches redundant master data. It can consume a lot of memory: although the object graph in the context stores only a single copy of each master entity, the SQL result set repeats the master columns for every detail row. I took the following image from here

[Image: SQL result set in which the User table's columns are repeated for every UserDetails row]

As you can see, the User table's data is repeated once for every UserDetails row in the SQL result set. That seems to be the differentiating factor in performance (in your case the repeated master columns carry more data than the detail table's).

If performance is your major concern, I would recommend using a LINQ join with the same WHERE clause to fetch the data for the detail table separately (the join keys in the sketch below are assumed). So in your case:

Step 1:

var context = new MyContext();
var applications = context.LoanApplications.Where(d => d.PropertyThatIWantToFilter == localVariable);

and then Step 2:

var solicitors = (from s in context.Solicitors
                  join loanApp in context.LoanApplications
                      on s.LoanApplicationId equals loanApp.Id // join keys assumed
                  where loanApp.PropertyThatIWantToFilter == localVariable // same condition as in Step 1
                  select s).ToList();

Thanks - your question made me review my own code :-)

Cirillo answered 8/8, 2013 at 10:30 Comment(0)
