How to cache DataContext instances in a consumer type application?

We have an application that uses an SDK supplied by our provider to integrate with them. The SDK connects to an AMQP endpoint and distributes, caches and transforms messages for our consumers. Previously the integration ran over HTTP with XML as the data source, and the old integration cached the DataContext in two ways: per web request and per managed thread ID. (1)

Now we integrate over AMQP instead of HTTP. The transport is transparent to us because the SDK handles all the connection logic and we only define our consumers. That means there is no "per web request" scope to cache the DataContext in, so only caching per managed thread ID is left. I implemented the chain of responsibility pattern: when an update arrives, it is put through a pipeline of handlers that use the DataContext to update the database. This is what the pipeline's invocation method looks like:

public Task Invoke(TInput entity)
{
    object currentInputArgument = entity;

    for (var i = 0; i < _pipeline.Count; ++i)
    {
        var action = _pipeline[i];
        if (action.Method.ReturnType.IsSubclassOf(typeof(Task)))
        {
            if (action.Method.ReturnType.IsConstructedGenericType)
            {
                dynamic tmp = action.DynamicInvoke(currentInputArgument);
                currentInputArgument = tmp.GetAwaiter().GetResult();
            }
            else
            {
                (action.DynamicInvoke(currentInputArgument) as Task).GetAwaiter().GetResult();
            }
        }
        else
        {
            currentInputArgument = action.DynamicInvoke(currentInputArgument);
        }
    }

    return Task.CompletedTask;
}

The problem (at least as I understand it) is that this chain of responsibility is a chain of methods that return or start new tasks. So when an update for entity A arrives it is handled by, say, managed thread ID 1, and when the same entity A arrives again some time later it may be handled by managed thread ID 2. This leads to:

System.InvalidOperationException: 'An entity object cannot be referenced by multiple instances of IEntityChangeTracker.'

because the DataContext for managed thread ID 1 is already tracking entity A (again, that is my understanding).

My question is: how can I cache the DataContext in my case? Has anyone had the same problem? I read this and this answer, and from what I understood, using one static DataContext is not an option either. (2)

  1. Disclaimer: I should have said that we inherited the application and I cannot answer why it was implemented like that.
  2. Disclaimer 2: I have little to no experience with EF.

Community-asked questions:

  1. What version of EF are we using? 5.0
  2. Why do entities live longer than the context? - They don't, but maybe you are asking why entities need to live longer than the context. I use repositories that use the cached DataContext to get entities from the database and store them in an in-memory collection, which I use as a cache.

This is how entities are "extracted"; DatabaseDataContext is the cached DataContext I am talking about (a blob with all the database sets inside):

protected IQueryable<T> Get<TProperty>(params Expression<Func<T, TProperty>>[] includes)
{
    var query = DatabaseDataContext.Set<T>().AsQueryable();

    if (includes != null && includes.Length > 0)
    {
        foreach (var item in includes)
        {
            query = query.Include(item);
        }
    }

    return query;
}

Then, whenever my consumer application receives an AMQP message, my chain of responsibility begins by checking whether this message and its data have already been processed. So I have a method that looks like this:

public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
            where TEntity : ISportEvent
{
    ... some unimportant business logic

    //save the sport
    if (sport.SportID > 0) // <-- this here basically checks if so called 
                           // sport is found in cache or not
                           // if its found then we update the entity in the db
                           // and update the cache after that
    {
        _sportRepository.Update(sport); /* 
                                         * because message update for the same sport can come
                                         * and since DataContext is cached by threadId like I said
                                         * and Update can be executed from different threads
                                         * this is where aforementioned exception is thrown
                                        */

    }
    else                   // if not simply insert the entity in the db and the caches
    {
        _sportRepository.Insert(sport);
    }

    _sportRepository.SaveDbChanges();

    ... updating caches logic
}

I thought that getting entities from the database with AsNoTracking(), or detaching entities every time I update or insert one, would solve this, but it did not.

Lathrope answered 23/12, 2019 at 11:2 Comment(14)
Not that I have an answer just yet, but can you tell me which version of EF you are using, please? - Council
Also, take a look at this and see if it helps you at all: #41347135 - Council
@SimonPrice, 5.0 - Lathrope
You can untrack entity A after updating it. But this will not solve your concurrency problem; it just minimizes how often it occurs. - Fixture
@ilkerkaran, but if I untrack after update/insert, doesn't that mean I will not be able to save it to the db later? I am basically calling update or insert based on a criterion, immediately followed by SaveChanges. - Lathrope
SaveChanges saves the entity to the db when you call it, so you can untrack the entity right after the SaveChanges call. - Fixture
Why do entities live longer than contexts? The short piece of code doesn't give any clue about entity and context lifecycles. It's not even clear where contexts come into play. - Poppo
@GertArnold, because I use them as a cache that I later check against and update once new update messages come? - Lathrope
Yeah, but don't you agree that it's very hard to tell anything useful about this seeing only the code presented here? For example, I don't know if you can disable proxy creation, and even less whether that would help. - Poppo
@GertArnold I will add code later today; hopefully it will help you and others see the full picture. - Lathrope
@GertArnold I updated my question, please tell me if I missed something that would be good to have. - Lathrope
github.com/mehdime/DbContextScope - check this repository and blog post. - Chamberlain
How is _sportRepository assigned, and what is the code for your insert, update and savedbchanges? Some code for reading and writing to your cache would help as well. - Tourcoing
Scoped lifetime (1 per request), but pay attention to what your real problem is: the db context is not thread safe, and used wrongly your caching can crash your app. learn.microsoft.com/en-us/dotnet/api/… - Gosse

Whilst there is a certain overhead to newing up a DbContext, and using DI to share a single instance of a DbContext within a web request can save some of this overhead, simple CRUD operations can just new up a new DbContext for each action.

Looking at the code you have posted so far, I would probably have a private instance of the DbContext newed up in the Repository constructor, and then new up a Repository for each method.

Then your method would look something like this:

public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
        where TEntity : ISportEvent
{
        // A fresh repository (and therefore a fresh DbContext) for each call
        var sportsRepository = new SportsRepository();

        ... some unimportant business logic

        //save the sport
        if (sport.SportID > 0) 
        {
            sportsRepository.Update(sport);
        }
        else
        {
            sportsRepository.Insert(sport);
        }

        sportsRepository.SaveDbChanges();

}

public class SportsRepository
{
    // DbContext stands in here for your own derived context class
    private readonly DbContext _dbContext;

    public SportsRepository()
    {
        _dbContext = new DbContext();
    }

}

You might also want to consider the use of Stub Entities as a way around sharing a DbContext with other repository classes.

Expellant answered 9/1, 2020 at 10:29 Comment(1)
Yes, unfortunately the data layer project is used by both our new service and the old website application, and it is not subject to change. :/ I ended up using one singleton DbContext for my whole flow, and I suppose I will revisit this later when we make our pipeline multithreaded. - Lathrope

Since this is about some existing business application I will focus on ideas that can help solve the issue rather than lecture about best practices or propose architectural changes.

I know this is kind of obvious but sometimes rewording error messages helps us better understand what's going on so bear with me.

The error message indicates entities are being used by multiple data contexts which indicates that there are multiple dbcontext instances and that entities are referenced by more than one of such instances.

Then the question states there is a data context per thread that used to be per http request and that entities are cached.

So it seems safe to assume entities are read from a db context upon a cache miss and returned from the cache on a hit. Attempting to update entities loaded from one db context instance using a second db context instance causes the failure. We can conclude that in this case the exact same entity instance was used in both operations and no serialization/deserialization is in place for accessing the cache.
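To make that reasoning concrete, here is a minimal sketch of a per-thread context cache that does not use EF at all. FakeContext, PerThreadContextCache and Demo are hypothetical names invented for this illustration, and the real caching code in the application may well look different. It only shows that a cache keyed by managed thread ID hands a different context instance to each thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for the real DataContext; only its identity matters here.
public sealed class FakeContext { }

public static class PerThreadContextCache
{
    private static readonly ConcurrentDictionary<int, FakeContext> Contexts =
        new ConcurrentDictionary<int, FakeContext>();

    // Returns the context cached for the calling thread, creating it on first use.
    public static FakeContext Current
    {
        get
        {
            return Contexts.GetOrAdd(
                Thread.CurrentThread.ManagedThreadId,
                _ => new FakeContext());
        }
    }
}

public static class Demo
{
    // Resolves the cached context from two different threads and compares them.
    public static bool SameContextOnBothThreads()
    {
        FakeContext first = null;
        FakeContext second = null;

        var t1 = new Thread(() => first = PerThreadContextCache.Current);
        var t2 = new Thread(() => second = PerThreadContextCache.Current);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        return ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(SameContextOnBothThreads());
    }
}
```

Because each thread resolves its own entry, a cached entity that was first tracked on one thread would meet a different context when a later message for it is handled on another thread, which matches the scenario described in the question.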

DbContext instances are themselves entity caches through their internal change tracker, and this error is a safeguard protecting its integrity. Since the idea is to have a long-running process handling simultaneous requests through multiple db contexts (one per thread) plus a shared entity cache, it would be very beneficial, both performance-wise and memory-wise (change tracking is likely to increase memory consumption over time), to either change the db context lifecycle to be per message or empty the change tracker after each message is processed.

Of course, in order to process entity updates, the entities need to be attached to the current db context right after retrieving them from the cache and before any changes are applied to them.
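A rough sketch of the per-message variant could look like the following. Sport, SportsContext and SportWriter are hypothetical names, not the types from the question, and the code is written against the EF 6 namespaces; on EF 5 the EntityState enum lives in System.Data instead. It needs the EntityFramework package and a configured connection to run, so treat it as an outline rather than a drop-in implementation.

```csharp
using System.Data.Entity;

// Hypothetical entity; the real one comes from the existing data layer.
public class Sport
{
    public int SportID { get; set; }
    public string Name { get; set; }
}

// Hypothetical derived context exposing the one set this sketch needs.
public class SportsContext : DbContext
{
    public DbSet<Sport> Sports { get; set; }
}

public static class SportWriter
{
    // Saves one entity taken from the shared cache using a short-lived context.
    public static void Save(Sport cachedSport)
    {
        // One context per message: created here, disposed when the block ends.
        using (var context = new SportsContext())
        {
            if (cachedSport.SportID > 0)
            {
                // Existing row: attach the cached instance and mark it as modified.
                context.Sports.Attach(cachedSport);
                context.Entry(cachedSport).State = EntityState.Modified;
            }
            else
            {
                // New row: let the context insert it.
                context.Sports.Add(cachedSport);
            }

            context.SaveChanges();

            // Detach so the cached instance is free to be attached to the
            // next context that handles a message for the same entity.
            context.Entry(cachedSport).State = EntityState.Detached;
        }
    }
}
```

Marking the entity as Modified also covers the case where values were changed before attaching, at the cost of sending every column in the UPDATE. Detaching after SaveChanges follows the suggestion made in the comments on the question.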

Tourcoing answered 8/1, 2020 at 16:39 Comment(1)
Thanks for the insights. I do agree with you; however, the problem here remains. My current workaround will be to skip the "repository" and work directly with the data context... - Lathrope
