How to improve performance of an EF Core query which uses several Includes
Asked Answered
H

1

5

I've got this query, which I'll simplify for brevity:

public IQueryable<User> GetByIdAsync(Guid userId)
{
    return MyContext
        .Users
        //Bunch of Includes
            //Most of which have a ThenInclude
                //Followed by another ThenInclude
        .FirstOrDefaultAsync(u => u.Id == userId)
}

When run for around 100 users, it takes over 15 seconds (running locally on my machine). Not great.

I've tried using AsNoTracking(), as well as changing it to use a compiled query like so:

private static Func<MyContext, Guid, Task<User>> _getByIdAsync =
            EF.CompileAsyncQuery((MyContext context, Guid userId) =>
                context
                .Users
                //Same Includes as above
                .Where(u => u.Id == userId)
                .FirstOrDefault());

public IQueryable<User> GetByIdAsync(Guid userId)
{
    return await _getByIdAsync(userId);
}

Still no difference.

I've had a look at this answer for a relevant thread, which suggests using plain old SQL:

https://mcmap.net/q/2032682/-how-to-improve-ef-query-performance-with-20-includes

And I've had a look at this answer, which mentions clustered indexes:

https://mcmap.net/q/2032683/-improving-performance-of-big-ef-multi-level-include

I certainly can't exclude any of the Includes as the client depends on all this info. Redesigning is also not an option at this stage.

Questions

  1. Are there any other options that can improve performance?
  2. I can't see any CLUSTERED or NONCLUSTERED tags in any of my child table indexes. Is this worth looking into and if so, can I be pointed to any documentation that explains how I can go about updating using EF (or without)?
Handpick answered 25/8, 2019 at 5:28 Comment(6)
Is MyContext created more than once in a given request? A "cold" DbContext tends to take longer to return results than one that's already connected and has a cache.Impetus
@TiesonT. Thanks for the suggestion. The time that I've mentioned is for subsequent requests. The very first one takes even longer. I don't believe the context is created multiple times for a single request though.Handpick
Sounds about right. This isn't really an EF suggestion, but disabling Named Pipes (assuming it's enabled) in SQL Server (assuming that's what you're using) and explicitly connecting via tcp tends to help (the process of making a connection, not the query itself, is likely the slowest part of the EF query)Impetus
@TiesonT. Hmm, interesting. May look into that. Other queries for me that are not complex go through quite quickly, like less than 50ms, so I wonder if it's actually a connection delay.Handpick
If you haven't yet, run SQL Profiler while making the request - that will show you the SQL EF is generating and how the queries are being run. As @VolodymyrBilyachat notes, it's likely that you're actually running multiple distinct queries, rather than one large query, especially if any of the navigation properties on your User model are collections.Impetus
@TiesonT. Thanks, I'll try that out. Yes, you're right, quite a few of my navigation properties inside User are collections. Viewing the console output, it does appear as though there are multiple SELECT statements being executed.Handpick
M
8

You have many ways but it all depends.

  1. You have .FirstOrDefaultAsync(u => u.Id == userId) which means that for 100 users you will go to database 100 times so in total 15 000 / 100 == equals 150 ms per request. To improve it try to get all 100 user at once using in clause like .Where(u=> userIds.contains(u.Id))

Example.

private static Func<MyContext, Guid, Task<List<User>>> _getByIdAsync =
            EF.CompileAsyncQuery((MyContext context, List<Guid> userIds) =>
                context
                .Users
                //Same Includes as above
                .Where(u => userIds.Contains(u.Id))).ToListAsync();
  1. I know nothing about your data structure but if you can write linq using joins it could be faster, because for many to many within one request EF can go to database each time per dependency.

Example how you can query using joins

var query = (from users in context.Users
join otherTable in context.OtherTable  on users.Id equals otherTable.UserId).ToList();
  1. Ef try to fit general purpose but sometimes as you know your data only you can do it better, I used to have similar problem to yours when I have repository methods to get data 1 by one, but then I wrote new method to fetch data using array and that method took care about joined data, via EF it was basically impossible to do it fast. So what I am saying in one request load all one to one, then read from db and using another query go and grab many to many you need.
  2. Also you can get sql query

You can get sql using this sample

public IQueryable<User> GetByIdAsync(Guid userId)
{
    var = query = MyContext
        .Users
        //Bunch of Includes
            //Most of which have a ThenInclude
                //Followed by another ThenInclude
     var sql = query.ToSql(); // <--------------------------- sql query
     return query.FirstOrDefaultAsync(u => u.Id == userId)
}

and use sql query to profile and see if its using indexes.

And lastly I really hate methods like this public IQueryable GetByIdAsync(Guid userId) problem is that most of the time you dont need all that includes, but you start using them more and more and become depend on them... This is why I would recommend use EF without repository pattern, EF itself is repository get from database only data you need.

Meaganmeager answered 25/8, 2019 at 6:13 Comment(7)
Thank you so much for those suggestions. Regarding point 1, can I ask where the 15,000 comes from? If you could maybe elaborate a bit, showing how you'd replace the FirstOrDefault with a 'where in', I'd really appreciate it. For point 2, some of my relations are indeed many-to-many. For point 3, I get what you mean. Really does feel like raw SQL here would give me much better results, considering it's less than 10 tables I'm joining and it's all Foreign-Key to Primary-Key joins.Handpick
15 sec * 1000 milisecond per second :)Meaganmeager
Oh right, hah. Totally forgot I had mentioned 15 seconds in my question.Handpick
@Handpick Its not only raw sql, Usually biggest issue is many to many, this is where i see many people complain on EF, but its not ef :) you need to sit and think what it the best way to fetch data.Meaganmeager
By 'sit and think what is the best way to fetch data', do you mean thinking about appropriate/efficient join statements to retrieve the data or do you mean redesigning the table structure to minimise many-to-may?Handpick
@Handpick depends, but if I would be you, I would write method to get all users in one go, see if its ok with you even with all includes, if its too bad, try to remove all includes to many to many and see if its fine, if it is try to include one by one if you have one bad include or there is many. And after that you will see whats next.Meaganmeager
Thanks. I'll try that out and get back to you.Handpick

© 2022 - 2024 — McMap. All rights reserved.