Implementing the Repository Pattern Correctly with EF Core

Asked 22/11, 2020 at 17:6 Answered 20/11, 2022 at 14:44

c#entity-framework-core domain-driven-design repository-pattern ddd-repositories

NOTE

I'm not asking if I should use the Repository pattern, I care about the How. Injecting persistence-related objects into domain classes is not an option for me: it makes Unit Testing impossible (and no, tests using in-memory databases are NOT Unit Tests, as they cover many different classes without isolation), it couples the domain logic with the ORM and it brakes many important principles I practice, like Persistence Ignorance, Separation of Concerns, and others, whose benefits you're welcome to search online. Using EF Core "correctly" is not nearly as important to me as keeping the business logic isolated from external concerns, which is why I'll settle for a "hacky" usage of EF Core if it means the Repository won't be a leaky abstraction anymore.

Original Question

Let's assume the repository's interface is the following:

public interface IRepository<TEntity>
    where TEntity : Entity
{
    void Add(TEntity entity);
    void Remove(TEntity entity);
    Task<TEntity?> FindByIdAsync(Guid id);
}

public abstract class Entity
{
    public Entity(Guid id)
    {
        Id = id;
    }
    public Guid Id { get; }
}

Most of the EF Core implementations I saw online did something like:

public class EFCoreRepository<TEntity> : IRepository<TEntity>
    where TEntity : Entity
{
    private readonly DbSet<TEntity> entities;

    public EFCoreRepository(DbContext dbContext)
    {
        entities = dbContext.Set<TEntity>();
    }

    public void Add(TEntity entity)
    {
        entities.Add(entity);
    }

    public void Remove(TEntity entity)
    {
        entities.Remove(entity);
    }

    public async Task<TEntity?> FindByIdAsync(Guid id)
    {
        return await entities.FirstOrDefaultAsync(e => e.Id == id);
    }
}

The changes are committed in another class, in an implementation of the Unit of Work pattern. The problem I have with this implementation is that it violates the definition of a repository as a "collection-like" object. Users of this class would have to know that the data is persisted in an external store and call the Save() method themselves. The following snippet won't work:

var entity = new ConcreteEntity(id: Guid.NewGuid());
repository.Add(entity);
var result = await repository.FindByIdAsync(entity.Id); // Will return null

The changes should obviously not be committed after every call to Add(), because it defeats the purpose of the Unit of Work, so we end up with a weird, not very collection-like interface for the repository. In my mind, we should be able to treat a repository exactly like we would treat a regular in-memory collection:

var list = new List<ConcreteEntity>();
var entity = new ConcreteEntity(id: Guid.NewGuid());
list.Add(entity);
// No need to save here
var result = list.FirstOrDefault(e => e.Id == entity.Id);

When the transaction scope ends, the changes can be committed to the DB, but apart from the low-level code that deals with the transaction, I don't want the domain logic to care about when the transaction is committed. What we can do to implement the interface in this fashion is to use the DbSet's Local collection in addition to the regular DB query. That would be:

...
public async Task<TEntity?> FindByIdAsync(Guid id)
{
    var entity = entities.Local.FirstOrDefault(e => e.Id == id);
    return entity ?? await entities.FirstOrDefaultAsync(e => e.Id == id);
}

This works, but this generic implementation would then be derived in concrete repositories with many other methods that query data. All of these queries will have to be implemented with the Local collection in mind, and I haven't found a clean way to enforce concrete repositories not to ignore local changes. So my question really boils down to:

Is my interpretation of the Repository pattern correct? Why is there no mention of this problem in other implementations online? Even Microsoft's implementation (which is a bit outdated, but the idea is the same) in the official documentation website ignores local changes when querying.
Is there a better solution to include local changes in EF Core than manually querying both the DB and the Local collection every time?

UPDATE - My Solution

I ended up implementing the second solution suggested by @Ronald's answer. I made the repository save the changes to the database automatically, and wrapped every request in a database transaction. One thing I changed from the proposed solution is that I called SaveChangesAsync on every read, not write. This is similar to what Hibernate already does (in Java). Here is a simplified implementation:

public abstract class EFCoreRepository<TEntity> : IRepository<TEntity>
    where TEntity : Entity
{
    private readonly DbSet<TEntity> dbSet;
    public EFCoreRepository(DbContext dbContext)
    {
        dbSet = dbContext.Set<TEntity>();
        Entities = new EntitySet<TEntity>(dbContext);
    }

    protected IQueryable<TEntity> Entities { get; }

    public void Add(TEntity entity)
    {
        dbSet.Add(entity);
    }

    public async Task<TEntity?> FindByIdAsync(Guid id)
    {
        return await Entities.SingleOrDefaultAsync(e => e.Id == id);
    }

    public void Remove(TEntity entity)
    {
        dbSet.Remove(entity);
    }
}

internal class EntitySet<TEntity> : IQueryable<TEntity>
    where TEntity : Entity
{
    private readonly DbSet<TEntity> dbSet;
    public EntitySet(DbContext dbContext)
    {
        dbSet = dbContext.Set<TEntity>();
        Provider = new AutoFlushingQueryProvider<TEntity>(dbContext);
    }

    public Type ElementType => dbSet.AsQueryable().ElementType;

    public Expression Expression => dbSet.AsQueryable().Expression;

    public IQueryProvider Provider { get; }

    // GetEnumerator() omitted...
}

internal class AutoFlushingQueryProvider<TEntity> : IAsyncQueryProvider
    where TEntity : Entity
{
    private readonly DbContext dbContext;
    private readonly IAsyncQueryProvider internalProvider;

    public AutoFlushingQueryProvider(DbContext dbContext)
    {
        this.dbContext = dbContext;
        var dbSet = dbContext.Set<TEntity>().AsQueryable();
        internalProvider = (IAsyncQueryProvider)dbSet.Provider;
    }
    public TResult ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken = default)
    {
        var internalResultType = typeof(TResult).GenericTypeArguments.First();

        // Calls this.ExecuteAsyncCore<internalResultType>(expression, cancellationToken)
        object? result = GetType()
            .GetMethod(nameof(ExecuteAsyncCore), BindingFlags.NonPublic | BindingFlags.Instance)
            ?.MakeGenericMethod(internalResultType)
            ?.Invoke(this, new object[] { expression, cancellationToken });

        if (result is not TResult)
            throw new Exception(); // This should never happen

        return (TResult)result;
    }

    private async Task<TResult> ExecuteAsyncCore<TResult>(Expression expression, CancellationToken cancellationToken)
    {
        await dbContext.SaveChangesAsync(cancellationToken);
        return await internalProvider.ExecuteAsync<Task<TResult>>(expression, cancellationToken);
    }

    // Other interface methods omitted...
}

Notice the use of IAsyncQueryProvider, which forced me to use a small Reflection hack. This was required to support the asynchronous LINQ methods that comes with EF Core.

Geminian answered 22/11, 2020 at 17:6 Comment(9)

EF itself is an implementation of the Repository and Unit of Work patterns. Layering additional abstractions on top of that usually increases complexity, reduces maintainability, reduces reusability, and reduces runtime efficiency. – Cicada 22/11, 2020 at 19:5

What is the source of truth for your application - is it the Database or the application memory? If its the database, your Repository behaves as expected and only returns saved objects. I think those considerations are more important than trying to implement a "collection-like" repository. – Preclinical 23/11, 2020 at 19:22

The database is an implementation detail, the domain model should not know that it exists. If anything is the "source of truth", it's the in-memory aggregate, but even if we do care about the details, I'm talking about local changes made within the same transaction, so the database will always stay consistent. – Geminian 23/11, 2020 at 19:37

I see your point @GurGaller. I'd argue that the database is implementation detail, but persistence is not, but I may be wrong here. I think the question is interesting one and will watch for a decent answer :) – Preclinical 23/11, 2020 at 19:46

@AlexBuyny Well, if we go by the book, then the Persistence Ignorance Principal, does define the concept of persistence to be an external concern unknown to the domain model. While we obviously don't want to follow every principal blindly, I do like the fact that the domain model can operate in a pure in-memory OO environment, that focus on behivior rather on data. Even the concept of a transaction is unknown to the domain model, it is handled by an Application Layer that wraps the domain model. – Geminian 23/11, 2020 at 20:2

Yeah let's see if someone has better ideas. I agree, it seems like "repository as collection" is leaking the fact that it persists stuff. – Preclinical 23/11, 2020 at 20:13

As a side note - querying Local is not reliable anyway. In this simple case it works, but in case of more complex queries it can fail, because in-memory and database provider translation of the same query expression might be different. Plus you would need to merge results (and order by can be present)... As first comment mentions - such repository over another repository adds nothing useful. – When 25/11, 2020 at 19:40

This is a discussion-type question that doesn't belong on Stack Overflow. Not saying it's not interesting or unimportant, just that it doesn't fit here. A bounty is not a way to make a question on-topic. – Toaster 30/11, 2020 at 15:17

For me, your interception is correct if you're talking about EF repos only. Your db can be text/csv file, witch has got different definitions for repo (generally, repo is a repo, no matter for text, flat or relational db). EF has change tracking, while Foo has not. Some have transaction capability some don't. Even with specification pattern, the behavior/definition of repos should differ from EF to Mongo or etc. I aim to say it's the abstraction that defines the level of expectations from a repo. Therefore, we can't say Correct Repository Pattern is exactly this or that – Formation 11/12, 2020 at 0:4

You can look into this repository implementation approach from the Microsoft powered EShopOnWeb project:

According to the rules of Domain-driven design a repository is dedicated to handle a collection of aggregates. The interface in this sample solution looks like the following:

public interface IAsyncRepository<T> where T : BaseEntity, IAggregateRoot
{
    Task<T> GetByIdAsync(int id, CancellationToken cancellationToken = default);
    Task<IReadOnlyList<T>> ListAllAsync(CancellationToken cancellationToken = default);
    Task<IReadOnlyList<T>> ListAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
    Task<T> AddAsync(T entity, CancellationToken cancellationToken = default);
    Task UpdateAsync(T entity, CancellationToken cancellationToken = default);
    Task DeleteAsync(T entity, CancellationToken cancellationToken = default);
    Task<int> CountAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
    Task<T> FirstAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
    Task<T> FirstOrDefaultAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
}

The interface itself resides in the domain layer (here in this project called application core).

The concrete implementation repository implementations (here for EFCore) reside in the infrastructure layer.

There is a generic EFCore repository implementation for covering common repository methods:

public class EfRepository<T> : IAsyncRepository<T> where T : BaseEntity, IAggregateRoot
{
    protected readonly CatalogContext _dbContext;

    public EfRepository(CatalogContext dbContext)
    {
        _dbContext = dbContext;
    }

    public virtual async Task<T> GetByIdAsync(int id, CancellationToken cancellationToken = default)
    {
        var keyValues = new object[] { id };
        return await _dbContext.Set<T>().FindAsync(keyValues, cancellationToken);
    }

    public async Task<T> AddAsync(T entity, CancellationToken cancellationToken = default)
    {
        await _dbContext.Set<T>().AddAsync(entity);
        await _dbContext.SaveChangesAsync(cancellationToken);

        return entity;
    }

    public async Task UpdateAsync(T entity, CancellationToken cancellationToken = default)
    {
        _dbContext.Entry(entity).State = EntityState.Modified;
        await _dbContext.SaveChangesAsync(cancellationToken);
    }

    public async Task DeleteAsync(T entity, CancellationToken cancellationToken = default)
    {
        _dbContext.Set<T>().Remove(entity);
        await _dbContext.SaveChangesAsync(cancellationToken);
    }
}

I just referenced some of the methods here.

And for more specific repository methods that fit the requirements you can implement more specific repository interfaces in the domain layer which are again implemented in the infrastructure layer derived by the generic IAsyncRepository and that specific interface. See here for an example (although the method provided is not the best example I think you can get the idea).

With this approach actual saving to the database is completely handled by the repository implementation and not part of the repository interface.

Transactions on the other should not be in neither the domain layer or the repository implementation. So if you need several aggregate updates to be consistent within the same use case this transaction handling should be handled in the application layer.

This also fits with the rule of Eric Evans from his Book Domain-Driven Design.

Leave transaction control to the client. Although the REPOSITORY will insert into and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.

See Chapter Six, Repositories.

Airfoil answered 2/12, 2020 at 17:17 Comment(2)

Thank you for the answer. One thing I noticed here is that the implementation calls SaveChangesAsync after every write operation. Isn't it a bit inefficient? Is there a way to save all the changes once, and still implement the interface correctly (include unsaved aggregates in the result)? – Geminian 2/12, 2020 at 17:45

I know what you mean, but if you stick to the aggregate pattern the SaveChangesAsync() will cover a whole aggregate including child entities and value objects and with that minimize database round-trips already. DDD is also more suited for focused changes within aggregate boundaries which should not imply performance concerns with writes using this repository approach. Also, I'm afraid you cannot have both, not calling save internally when performing and Add() method in EFCore and retrieving the added aggregate immediately with a subsequent find() call. – Majunga 2/12, 2020 at 19:19

Merging the result sets of the same query run against different datasets doesn't work in general.

It's pretty straight forward if you only have local inserts and only use where and select in your queries because then the merge operation is just append.
It gets increasingly more difficult as you try to support more operators like order by, skip & take, group by and also local updates and deletions.

In particular there's no other way to support group by with local updates and deletions but to merge both data sources first and then applying the group by.

Doing this in your app is going to be unfeasible because it would mean retrieving the whole table, applying local changes and then doing the group by.

Something that might work is to transfer your local changes to the database instead and running the query there.

There are two ways that i can think of to achieve this.

Transforming queries

Transform your queries to include local changes by replacing their from clause

so a query like

select sum(salary) from employees group by division_id

would become

select
    sum(salary) 
from 
(
    select 
        id, name, salary, division_id 
    from employees
    -- remove deleted and updated records
    where id not in (1, 2)
    -- add inserted records and new versions of updated records
    union all values (1, 'John', 200000, 1), (99, 'Jane', 300000, 1)
) _
group by division_id

This should also work for joins if you apply the same transformation to the joined tables.
It would require some pretty involved customization to do this with ef though.

This is an idea on how to implement it at least partially with ef, it won't support joins and unfortunately involves some manual sql generation.

static IQueryable<T> WithLocal<T>(this DbContext db)
    where T : Entity
{
    var set = db.Set<T>();
    var changes = db.ChangeTracker.Entries<T>();
    var model = db.Model.FindEntityType(typeof(T));

    var deletions = changes
        .Where(change => change.State == EntityState.Deleted)
        .Select(change => change.Entity.Id);
        
    return set
        // Hard part left as an exercise for the reader :)
        // Generate this from 'changes' and 'model', you can use parameters for the values
        .FromSqlRaw("select 1 as id, 'John' as name, 200000 as salary, 1 as division_id union all select 99 as id, 'Jane' as name, 300000 as salary, 1 as division_id")
        .Union(set.Where(entity => !deletions.Contains(entity.Id)));
}

you can then use this like so

var query = db.WithLocal<Employee>()
    .GroupBy(employee => employee.DivisionId)
    .Select(group => group.Sum(employee => employee.Salary));

Keeping a transaction open

A simpler way is to just do the writes to the database but without committing the transaction, this way all the queries that you run on the same transaction will see the changes but no one else will, at the end of the request you can then commit or rollback from outside of your repositories.

With this approach your queries will also see database generated values like computed columns, auto increment ids and trigger generated values.

I have never tried this and can't speak for the performance implications of these approaches but if you need this feature I think there aren't many other ways..

Mckenziemckeon answered 30/11, 2020 at 12:13 Comment(3)

Thank you for the answer. Regarding the first solution, I tried to use Union to send the in-memory data with the query (like you demonstrated), but apparently EF doesn't support in-memory collections in Union statements. So if anyone has an idea of how to do it, please share it. As for the second solution, it'll work, but it requires another trip to the DB for every read method, which is not ideal (we have to call SaveChanges before querying). – Geminian 30/11, 2020 at 16:10

Also, SaveChanges() might throw an exception, if there is a violation of some UNIQUE constraint in one of the added entities, for example, and it's not the job of the caller to a method like FindById() to handle such exceptions. – Geminian 30/11, 2020 at 16:36

@GurGaller I updated the answer with an idea on how to do it with ef. To avoid exceptions in SaveChanges you could save after every write (maybe provide AddRange to be more efficient) or use deferrable constraints if your db has them. – Mckenziemckeon 30/11, 2020 at 19:56

It seems there is a misconception about Repositories and Entities here. First of all, DDD's Entity and EntityFramework's Entity are sligthly different concepts. In DDD, an Entity is basically a way of keeping track of the evolution of an business concept instance overtime, whereas in EntityFramwork, an Entity is merely a persitence concern.

The repository pattern, in a DDD point of view, won't manipulate Entities directly, but rather Aggregates. Yeah, cool story bro, but what does it change? Long story short, an aggregate can be seen as a transactionnal boundary that protects strict Domain Invariants, invariants that must complies with trancationnal consistency, opposed to eventual consistency. A repository, in a DDD perspective, will fecth an instance of an Aggregate, that is an object rooted by DDD's Entity called Aggregate Root, with optionnal Entities and Value Objects within it.
With EF, a Repository will do the heavy lifting, fetching datas from one or more SQL Tables, relying on a Factory to provide a fully instanciated and ready-to-use Aggregate. It will also do the transactionnal work in order to save the Aggregate (and its internals components) in a structured, relationnal Fashion in the DB. But Aggregates shouldn't know about repository. The core model doesn't mind about any persistence details. Aggregate usage belongs to the "Application Layer" or the "Use Case" layer, not the Domain layer.

Let's wrap it up. Let's say you want to implement DDD repository in an asp.net thin app :

class OrderController
{
    private IOrderRepository _orderRepository;

    public OrderController(IOrderRepository orderRepository)
    {
        _orderRepository = orderRepository;
    }

    public async Task PlaceOrder(Guid orderId)
    {
        var aggregate = await _orderRepository.FindByIdAsync(orderId);
        aggregate.PlaceOrder();
        await _orderRepository.Save();
    }
}

internal interface IOrderRepository
{
    void Add(Order order);
    void Remove(Order order);
    Task<Order> FindByIdAsync(Guid id);
    Task Save();
}

internal class Order
{
    public Guid Id { get; }

    private IList<Item> items;
    public static Order CreateOrder(IList<Item> items)
    {
        return new Order(items);
    }

    private Order(IList<Item> items)
    {
        this.Id = Guid.NewGuid();
        this.items = items;
    }

    public void PlaceOrder()
    {
        // do stuff with aggregate sttus and items list
    }
}

What happens here? The controller is the "Use Case" layer : it's responsible for fecthing the aggregate (the Aggregate Root from the repo, make the Aggregate do its job then command the repo to save its changes. It could be more transparent with an unit of work in the controller, that would save the injected DbContext (because the concrete repo will have to access different DbSet: Order and Items)
But you get the idea. You may also want to keep 1 Data Access per table, but it will be used by the Aggregate-dedicated Repository.

Hope it was clear enough

Xanthophyll answered 1/12, 2020 at 10:44 Comment(5)

Thank you for the answer. You're right that repositories should work only on aggregate roots, the example I gave in the question is simplified because it doesn't really matter what we are saving. My problem with your solution is the Save method in the Repository. The Repository's interface is part of the domain model, and if we include a Save method in it, we are breaking the Persistence Ignorance Principal – Geminian 1/12, 2020 at 21:36

In this simplified example, yes, the Save is on the Repository. For the sake of simplicity. But you can easily build a Unit of work using the same DBContext used by your repository, and call its SaveChanges() method in the "use case", where you want to commit your transaction. You even can avoid the Unit Of Work pattern by injecting both the DbContext and the Repository in the UseCase (here, the controller) so the use case make an explicit call to the SaveChanges() method. – Xanthophyll 1/12, 2020 at 23:35

The key part, here : SaveChanges() is called once, on the DbContext, instead of once for every DbSets. And Repository does more work than just a mapping between your entities and your tables. Keep in mind that your Domain model serves a different purpose than your Persitence model. – Xanthophyll 1/12, 2020 at 23:36

I understand all that, but it still doesn't solve my problem. If I take the Save method outside the Repository interface, but using the Repository without saving is impossible, the Repository is a leaky abstraction, and it's not collection-like at all. Domain services that use a Repository should not call Save, because they shouldn't care about persistence or transaction scope, so if they work only with a Repository, unsaved data will not be included in queries. See the example in the question – Geminian 2/12, 2020 at 6:40

Domain Services should not know about any persistance details, I agree. However, here, it's more an Application Service, that take care of the execution of a use-case within an application, more than an Domain Service which responsability is to host some Domain Logic that doesn't belong to a specific Aggregate. Implementation details of persistence layer should not leak in the Domain Model, I agree. But, AFAIK, nothing forbids the Application Layer, or Use Case layer (the place where you fecth an aggregate and call a method on it) to know a bit about persistence details. – Xanthophyll 2/12, 2020 at 8:13

I see DbContext as the repository it has all the methods you need. Although some application architectures might exists without a need for entity framework and having there own repository patterns, unit of work (EF uses a changetracker) and query specification language (EF uses expressions). Those frameworks architectures seem to use EF to get a straight forward implementation anyway so why invest time in such an architecture?

The only thing that might be useful is query reuse (which I think is quite overrated) but EF has precompiled queries that might be helpful in that area.

Antiperiodic answered 20/11, 2022 at 14:44 Comment(2)

Reuse is not a very strong argument as you can use EF and still reuse queries in various ways (extension methods, for starters). The main problem I had with EF when I asked this question is that I didn't want to be coupled to it. I wanted to be able to switch to a MongoDB-based implementation tomorrow morning without changing the code that represents pure business logic – Geminian 25/11, 2022 at 21:57

Maybe at that time it wasn’t available but it would just require a different database provider for EF and you should have all the capabilities that I wouldn’t want to write in my own repository layer. Btw changing the database for any large enterprise database (being database independent) is usually just to expensive as most it will have e.g. reports or some 4gl stored procedures that is just to expensive to replace. – Antiperiodic 26/11, 2022 at 22:17

-2

You need to use SaveChanges() in order to be able to get new id.

UnitOfWork.cs

private readonly DbContext dbContext;
public UnitOfWork(DbContext dbContext)
{
    this.dbContext = dbContext;
}

public void Commit()
{
    dbContext.SaveChanges();
}

var entity = new ConcreteEntity(id: Guid.NewGuid());
repository.Add(entity);
Commit();
var result = await repository.FindByIdAsync(entity.Id);

EDITED

Unit Of Work.cs

var users = userRepository.GetAll(); // select
var roles = roleRepository.GetAll(); // select 
var entity = new ConcreteEntity(id: Guid.NewGuid());
repository.Add(entity);

var order = new Order()
{
    InvoiceNo = "00002",
    CustomerID = 1,
    Amount = 500.00, 
    OrderDetails = new OrderDetail()
                   {
                        ItemID = 1,
                        Quantity = 5,
                        Amount = 500.00
                   }
};

orderRepository.Add(order);

// can add more insert or update or delete here before commit

Commit();

var result = await repository.FindByIdAsync(entity.Id);
var orderresult = await orderRepository.FindByIdAsync(order.Id);

Dovecote answered 23/11, 2020 at 6:31 Comment(3)

I'm aware of the behivior of EF, but as I stated in the question, saving after every change defeats the purpose of the Unit of Work. The Repository must include unsaved entities in queries in order to implement the pattern correctly (and be "collection-like"), which is why I used the Local collection. I am looking for a better implementation that won't require every query to be written twice. – Geminian 23/11, 2020 at 11:14

Don't need to put save in repository. You just simply put all Select + Insert + Update + Delete and everything (all insert/update/delete you need to use) in Unit Of Work before commit, then after commit, you can use Select only. For example in my EDITED above. This is what you want. Commit once only. – Dovecote 24/11, 2020 at 0:56

It still doesn't implement the pattern correctly. The user of a Repository is not responsible for managing transaction scope, someone else has to commit the unit of work once per request. Again, I'm not looking to "make it work", I know how EF Core works, but I'm trying to implement the Repository pattern correctly, without the need to save changes for them to apply (within the same transaction). – Geminian 24/11, 2020 at 8:26

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++