What is the best practice for multiple "Include"-s in Entity Framework?
Asked Answered
G

4

18

Let's say we have four entities in data model: Categories, Books, Authors and BookPages. Also assume Categories-Books, Books-Authors and Books-BookPages relationships are one-to-many.

If a category entity instance is retrieved from database - including "Books", "Books.BookPages" and "Books.Authors" - this will become a serious performance issue. Moreover, not including them will result in "Object reference is not set to an instance of an object" exception.

What is the best practice for using multiple Include method calls?

  • Write a single method GetCategoryById and include all items inside (performance issue)
  • Write a single method GetCategoryById and send a list of relationships to include (maybe, but still seems not elegant enough)
  • Write methods like GetCategoryByIdWithBooks, GetCategoryByIdWithBooksAndBooksPages and GetCategoryByIdWithBooksAndAuthors (not practical)

EDIT: By second option I meant something like this:

public static Category GetCategoryById(ModelEntities db, int categoryId, params string[] includeFields)
{
    var categories = db.Categories;

    foreach (string includeField in includeFields)
    {
        categories = categories.Include(includeField);
    }

    return categories.SingleOrDefault(i => i.CategoryId == categoryId);
}

When calling we need a code like this:

Category theCategory1 = CategoryHelper.GetCategoryById(db, 5, "Books");
Category theCategory2 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Pages");
Category theCategory3 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Authors");
Category theCategory4 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Pages", "Books.Authors");

Are there any distinct disadvantages of this approach?

Galvanoscope answered 8/7, 2013 at 14:55 Comment(6)
could you elaborate more why you get this kind of exception? does lazy-load not work for you?Snakemouth
As you already know, one gets this exception when (s)he doesn't use the Include method. Lazy-loading works of course but then what is the purpose of using Include? Entity Framework might have included all relationships by default and thanks to lazy-loading, related entites would have been loaded when called. In this case there wouldn't be a meaning of using Include. I might be missing something, I am curious what that is.Galvanoscope
You should not be getting a null reference. Have you used the virtual keyword? You should also set navigation collections to an empty List in the constructor. The purpose of Include is to tell the entity framework that you want to eager load the entity instead of lazy load. When lazy loading is enabled EF does not include all references by defaultEarthshine
@Earthshine Where should I use the virtual keyword? You wrote "When lazy loading is enabled...". Can we enable/disable this feature and if yes how?Galvanoscope
You could start here: msdn.microsoft.com/en-us/data/jj574232.aspxEarthshine
@Colin, sorry for a late answer. I was reviewing my questions and felt need to write this. Thank you for noting that Include is for eager loading rather than lazy loading. I don't know how that happened but at the moment I am not having this issue. Maybe EF update solved it, I am not sure. Anyway, thanks for clearing things up. I will now go over other answers.Galvanoscope
V
11

Write a single method GetCategoryById and send a list of relationships to include (maybe, but still seems not elegant enough)

Write methods like GetCategoryByIdWithBooks, GetCategoryByIdWithBooksAndBooksPages and GetCategoryByIdWithBooksAndAuthors (not practical)

A combination of these two is currently my approach. I know what properties I want to include for each context, so I rather hand-code them (as you said yourself, lazy-loading isn't always an option, and if it is, you'll repeat the same repetitive Include()-like syntax when mapping from data models to DTO's).

This separation causes you to think harder about what datasets you want to expose, given data-access-code like this is usually hidden beneath a service.

By utilizing a base class containing a virtual method you can override to run the required Include()s:

using System.Data.Entity;

public class DataAccessBase<T>
{
    // For example redirect this to a DbContext.Set<T>().
    public IQueryable<T> DataSet { get; private set; }

    public IQueryable<T> Include(Func<IQueryable<T>, IQueryable<T>> include = null)
    {
        if (include == null)
        {
            // If omitted, apply the default Include() method 
            // (will call overridden Include() when it exists) 
            include = Include;
        }
        
        return include(DataSet);
    }
    
    public virtual IQueryable<T> Include(IQueryable<T> entities)
    {
        // provide optional entities.Include(f => f.Foo) that must be included for all entities
        return entities;
    }
}

You can then instantiate and use this class as-is, or extend it:

using System.Data.Entity;

public class BookAccess : DataAccessBase<Book>
{
    // Overridden to specify Include()s to be run for each book
    public override IQueryable<Book> Include(IQueryable<Book> entities)
    {
        return base.Include(entities)
                   .Include(e => e.Author);
    }
    
    // A separate Include()-method
    private IQueryable<Book> IncludePages(IQueryable<Book> entities)
    {
        return entities.Include(e => e.Pages);
    }
    
    // Access this method from the outside to retrieve all pages from each book
    public IEnumerable<Book> GetBooksWithPages()
    {
        var books = Include(IncludePages);
    }
}

Now you can instantiate a BookAccess and call methods on it:

var bookAccess = new BookAccess();

var allBooksWithoutNavigationProperties = bookAccess.DataSet;
var allBooksWithAuthors = bookAccess.Include();
var allBooksWithAuthorsAndPages = bookAccess.GetBooksWithPages();

In your case, you might want to create separate IncludePages and GetBooksWithPages-alike method pairs for each view of your collection. Or just write it as one method, the IncludePages method exists for reusability.

You can chain these methods all the way you like, since each of them (as well as Entity Framework's Include() extension method) returns yet another IQueryable<T>.

Vitrics answered 8/7, 2013 at 16:15 Comment(7)
Clever. I'm still ambivalent about performance of EF in these cases.Gullett
Entity Framework (EF) classes are derived from EntityObject and as far as I know, you cannot add a second derived class because it will be overwritten in the next "Update Model from Database..." action. You can extend EF classes since they are partial classes, but in partial classes you can specify base classes only in one place; and in this case that place is Designer.cs which will be overwritten. Please correct me if I am wrong.Galvanoscope
@ChristopherStevenson EF's Include() does "nothing" until the enumerator is called, then only writing the proper joins. You can always use SQL Profiler to inspect the queries being executed. - Anar: I'm talking about writing a separate data access class which you put in between your context and your code requiring access to the database.Vitrics
@Vitrics Using Entity Framework 5 with Oracle's MySql ADO.Net connector performs badly (for me) on these types of queries. My work around was to create database views for each type of usage of the database that needed to join multiple tables.Gullett
@ChristopherStevenson I've heard about more issues with MySQL and EF. The "IQueryable to SQL Query" functionality in the MySQL connector seems to be a bit subpar, but I haven't used it nor seen its code, so I can't comment on that.Vitrics
@Vitrics Don't get me wrong, your answer is valid. However, I am wondering if there is a simpler and more intuitive solution.Galvanoscope
@Anar I'm curious too, let's see what the bounty brings. :)Vitrics
F
4

As @Colin mentioned in the comments, you need to use the virtual keyword when defining navigation properties in order for them to work with lazy loading. Assuming you're using Code-First, your Book class should look something like this:

public class Book
{
  public int BookID { get; set; }
  //Whatever other information about the Book...
  public virtual Category Category { get; set; }
  public virtual List<Author> Authors { get; set; }
  public virtual List<BookPage> BookPages { get; set; }
}

If the virtual keyword is not used, then the proxy class created by EF will not be able to lazy load the related entity/entities.

Of course, if you are creating a new Book, it isn't going to be able to do lazy loading and will just throw the NullReferenceException if you try to iterate over the BookPages. That's why you should do one of two things:

  1. define a Book() constructor that includes BookPages = new List<BookPage>(); (same for Authors) or
  2. make sure that the ONLY time you ever have "new Book()" in your code is when you are creating a new entry that you are immediately saving to the database and then discarding without trying to get anything from it.

I personally prefer the 2nd option, but I know that many others prefer the 1st.

<EDIT> I found a third option, which is to use the Create method of the DbSet<> class. This means you would call myContext.Books.Create() instead of new Book(). See this Q+A for more info: Ramifications of DbSet.Create versus new Entity() </EDIT>

Now, the other way that lazy loading can break is when it is turned off. (I'm assuming that ModelEntities is the name of your DbContext class.) To turn it off, you would set ModelEntities.Configuration.LazyLoadingEnabled = false; Pretty self explanatory, no?

Bottom line is that you shouldn't need to use Include() everywhere. It's really meant to be more of a means of optimization rather than a requirement for your code to function. Using Include() excessively results in very poor performance because you end up getting far more than you really need from the database, because Include() will always bring in all related records. Let's say that you are loading a Category and there are 1000 Books belonging to that Category. You can't filter it to only include fetch the Books written by John Smith when using the Include() function. You can however (with lazy loading enabled) just do the following:

Category cat = ModelEntities.Categorys.Find(1);
var books = cat.Books.Where(b => b.Authors.Any(a => a.Name == "John Smith"));

This will actually result in fewer records being returned from the database and is much simpler to understand.

Hope that helps! ;)

Flighty answered 12/7, 2013 at 22:30 Comment(6)
Thank you for your comprehensive answer but I use Database-First approach which I tried to specify in my tags.Galvanoscope
Performance issue is depend on how you code the query. Lazy loading are not excluded from performance issue, e.g. if you load all List<Category> then loop foreach to access the Category.Books then you are seriously trapped into N+1 performance issue. If a user have 5 children does it make sense to explicitly (eager) load all of them to show children details compared to lazily 5+1 db request? So it depends, but most of the time I use Include because I load what I wanted with careBainmarie
@Bainmarie - I completely agree. The foreach looping of Categories and Books you mentioned is a textbook example of what I meant when I said that Include() is "really meant to be more of a means of optimization rather than a requirement for your code to function." Include() would certainly be appropriate in such a case. However, if you aren't going to loop through each Book record, then Include() wouldn't be desirable. The point, as you said, is to load the things that you need with care. Thanks for your comment. ;)Flighty
@Anar - I'm not as familiar with DB-First as I am with Code-First, but if it is creating the code using a DbContext, then the part about enabling/disabling lazy loading should still apply. There might be some additional properties on elements in the designer that would also control whether particular navigation properties are lazy loaded or not. I'll see if I can find something more to help in your situation when I get a chance. ;)Flighty
@Anar - When you open the edmx model in the designer, right-click on an empty area and hit Properties. Then there should be an option for "Lazy Loading Enabled". If that's true, then lazy loading should work. Also, if you're using the "EF5 DbContext Generator", which creates T4 templates (*.tt) which then generate your code from the model, you can just open up the .cs files that are generated and ensure that the navigation properties are virtual. (You might have to do a rebuild for them to get regenerated if you had to change the property in the edmx model.) Hope that helps. :)Flighty
Your answer is useful, +1.Galvanoscope
G
1

Some of the performance considerations are ADO.Net connector specific. I would keep in mind a database view or stored procedure as a backup if you're not getting the performance needed.

First, note that DbContext (and ObjectContext) objects are not thread-safe.

If you are concerned about clairity over performance, then the first option is the simplest.

On the other hand, if you're worried about performance--and are willing to dispose of the context object after getting the data--then you can query the data with multiple simultaneous tasks (threads) each using their own context object.

If you need a context to track changes to the data, you have the straight forward way of a single query to add all the items to the context, or you can use the Attach method to 'rebuild' the original state, and then change and save.

The latter goes something like:

using(var dbContext = new DbContext())
{
    var categoryToChange = new Categories()
    {
        // set properties to original data
    };
    dbContext.Categories.Attach(categoryToChange);
    // set changed properties
    dbContext.SaveChanges();
}

Unfortunately there's no one best practice to meet all situations.

Gullett answered 8/7, 2013 at 15:55 Comment(0)
B
0

In db first approach, say you create BookStore.edmx and add the Category and Book entity and it generate context like public partial class BookStoreContext : DbContext then it is a simple good practice if you can add partial class like this:

public partial class BookStoreContext
{
    public IQueryable<Category> GetCategoriesWithBooks()
    {
        return Categories.Include(c => c.Books);
    }

    public IQueryable<Category> GetCategoriesWith(params string[] includeFields)
    {
        var categories = Categories.AsQueryable();
        foreach (string includeField in includeFields)
        {
            categories = categories.Include(includeField);
        }
        return categories;
    }

    // Just another example
    public IQueryable<Category> GetBooksWithAllDetails()
    {
        return Books
            .Include(c => c.Books.Authors)
            .Include(c => c.Books.Pages);
    }

    // yet another complex example
    public IQueryable<Category> GetNewBooks(/*...*/)
    {
        // probably you can pass sort by, tags filter etc in the parameter.
    }
}

Then you can use it like this:

var category1 = db.CategoriesWithBooks()
                      .Where(c => c.Id = 5).SingleOrDefault();
var category2 = db.CategoriesWith("Books.Pages", "Books.Authors")
                      .Where(c => c.Id = 5).SingleOrDefault(); // custom include

Note:

  • You can read some of simple (so many complicated one out there) Repository pattern just to extend IDbSet<Category> Categories to group common Include and Where instead of using static CategoryHelper. So you can have IQueryable<Category> db.Categories.WithBooks()
  • You should not include all child entities in GetCategoryById because it does not self explain in the method name and that will cause performance issue if user of this method is not brother about Books entites.
  • Even though you not include all, if you use lazy loading, you still can have potential N+1 performance issue
  • If you have 1000 of Books better you page your load something like this db.Books.Where(b => b.CategoryId = categoryId).Skip(skip).Take(take).ToList() or even better you add the method above to be like this db.GetBooksByCategoryId(categoryId, skip, take)

I myself prefer explicitly loading entities since I will 'aware' what is currently loaded but lazy loading is only useful if you have conditional loading children entities and should be used within a small scope of db context otherwise I can't control the db hit and how big the result.

Bainmarie answered 18/7, 2013 at 7:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.