Entity Framework and Parallelism

Background

I have an application that receives periodic data dumps (XML files) and imports them into an existing database using Entity Framework 5 (Code First). The import goes through EF5 rather than, say, BULK INSERT or BCP because business rules that already exist in the entities must be applied.

Processing seems to be CPU bound in the application itself (the extremely fast, write-cache enabled disk IO subsystem shows almost zero disk wait time throughout the process, and SQL Server shows no more than 8%-10% CPU time).

To improve efficiency, I built a pipeline using TPL Dataflow with components to:

Read & Parse XML file
        |
        V
Create entities from XML Node
        |
        V
Batch entities (BatchBlock, currently n=200)
        |
        V
Create new DbContext / insert batched entities / ctx.SaveChanges()

I see a substantial increase in performance by doing this, but cannot get CPU utilization above about 60%.
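
For readers who want to picture the pipeline described above, here is a minimal sketch of its shape. It is not the original code. The `Record` type, the `Run` method, and its parameters are hypothetical names, the XML parsing stage is replaced by a plain sequence of integers, and the DbContext work is replaced by a thread-safe counter so that the example runs without a database. It assumes the System.Threading.Tasks.Dataflow NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

public static class ImportPipelineSketch
{
    // Hypothetical stand-in for an entity created from one XML node.
    public sealed class Record { public int Id; }

    // Runs the pipeline over the given node ids and returns how many
    // records reached the final "save" stage.
    public static int Run(IEnumerable<int> nodeIds, int batchSize, int maxParallelSaves)
    {
        int saved = 0;

        // Create an entity from each parsed node.
        var createEntity = new TransformBlock<int, Record>(id => new Record { Id = id });

        // Group entities into arrays of batchSize; the last batch may be smaller.
        var batch = new BatchBlock<Record>(batchSize);

        // One unit of work per batch. In a real import (assumption) this is where
        // a new DbContext would be created, each entity added, and SaveChanges()
        // called. Here the batch is only counted.
        var save = new ActionBlock<Record[]>(
            records => { Interlocked.Add(ref saved, records.Length); },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallelSaves });

        // Propagate completion so that finishing the first block drains the rest.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        createEntity.LinkTo(batch, link);
        batch.LinkTo(save, link);

        foreach (var id in nodeIds)
        {
            createEntity.Post(id);
        }

        createEntity.Complete();
        save.Completion.Wait();
        return saved;
    }
}
```

Calling `ImportPipelineSketch.Run(Enumerable.Range(0, 450), 200, 12)` pushes 450 items through batches of 200, with up to 12 batches processed concurrently in the last stage. Only that last stage is parallelized in this sketch, which matches where the contention on SaveChanges() and DbSet.Add() would arise.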

Analysis

Suspecting some sort of resource contention, I ran the process using the VS2012 Profiler's Resource contention data (concurrency) mode.

The profiler shows me 52% contention for a resource labeled Handle 2. Drilling in, I see that the method creating the most contention for Handle 2 is

System.Data.Entity.Internal.InternalContext.SaveChanges()

Second place, at about 40% as many contentions as SaveChanges(), is

System.Data.Entity.DbSet`1.Add(!0)

Questions

  • How can I figure out what Handle 2 really is (e.g. part of TPL, part of EF)?
  • Does EF throttle calls to separate DbContext instances from separate threads? It seems there is a shared resource they are contending for.
  • Is there anything that I can do to improve parallelism in this case?

UPDATE

For the run in question, the maximum degree of parallelism for the task that calls SaveChanges is set to 12 (I tried various values including Unbounded in previous runs).

UPDATE 2

Microsoft's EF team has provided feedback. See my answer for a summary.

Binion answered 1/11, 2012 at 17:41 Comment(9)
Are you sure you aren't waiting on connections for the pool? Have you tried making your connection pool size larger? - Pentarchy
@Maess: For the run in question, I set the maximum degree of parallelism to 12. If I understand correctly, the default maximum size of the connection pool is 100. Still, I'll try explicitly setting it higher. - Binion
@Maess: Perfmon shows only 11 logical connections and 11 user connections to the SQL instance, far under the connection pool limit. - Binion
It would be really great if we could get a repro for this. I have some tentative ideas as to which lock might be causing the problem but it's hard to know for sure without a repro. EF is certainly not doing any intentional throttling, but it does use locks in some places for accessing shared metadata and one of these could be causing a problem. My email is avickers at you know where. - Hopson
@ArthurVickers: I'll see if I can pull together a simple repro. Separately, are EF 5 sources available? I see they are being published for EF 6. With the EF 5 sources, I could find the lines of code causing the issue in my complete solution. - Binion
The EF5 sources are not currently available. I can look into what it would take to get them published, but it isn't likely to happen any time soon. - Hopson
Chances are it has to do with change tracking; when you add and when you save the changes, it has to update change tracking. Are you using POCOs or objects that implement INotifyPropertyChanged? - Secession
This issue is now being tracked by Microsoft: entityframework.codeplex.com/workitem/636 - Binion
@Maess: Turns out the issue is related to contention for the network read buffer in System.Data.dll. I provided an answer with more details. - Binion

The following summarizes my interaction with the Entity Framework team on this issue. I'll update the answer if more information becomes available.

  • The issue can be reproduced at Microsoft.
  • The handle contention is related to network I/O (even with SQL Server on localhost). Specifically, there is contention for the network I/O read buffer in System.Data.dll.
  • The EF team is now working with the SQL Connectivity team to better understand the issue.
  • There is as yet no guidance from Microsoft on how to minimize the impact of this contention.

UPDATE

This issue is now being tracked on CodePlex:

http://entityframework.codeplex.com/workitem/636?PendingVoteId=636

Binion answered 12/11, 2012 at 19:24 Comment(2)
Thanks a lot, Eric. I am quite interested in this because I have a similar scenario. Is there an issue for this on connect.microsoft.com so that we can track its progress? - Rovelli
@Dodd: It is being tracked on CodePlex since EF is open source now (but still worked on by the team at Microsoft). Added the link. - Binion
