Fastest Way of Inserting in Entity Framework
Asked Answered
G

32

808

I'm looking for the fastest way of inserting in Entity Framework.

I'm asking this because of the scenario where you have an active TransactionScope and the insertion is huge (4000+). It can potentially last more than 10 minutes (default timeout of transactions), and this will lead to an incomplete transaction.

Graduate answered 9/5, 2011 at 17:14 Comment(8)
How are you currently doing it?Abridge
Creating the TransactionScope, instantiating the DBContext, Opening the connection, and in a for-each statement doing the insertions and SavingChanges (for each record) , NOTE: TransactionScope and DBContext are in using statements, and i'm closing the connection in a finally blockGraduate
Another answer for reference: #5799146Castorena
Setting AutoDetectChangesEnabled to false could negatively affect applications that depend heavily on third party controls. Something to have in mind before rushing and breaking other parts of the application.Faustino
Awesome solutions can be seen here for on of the basic web task - persisting an array of rows with data.... Amazing framework...Brnaba
The fastest way of inserting into a SQL database does not involve EF. AFAIK Its BCP then TVP+Merge/insert.Hypozeuxis
For those who will read comments: Most applicable, modern answer is here.Lobe
Take a look here ben-morris.com/optimising-bulk-inserts-with-entity-framework-6 as wellTakao
O
1126

To your remark in the comments to your question:

"...SavingChanges (for each record)..."

That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts extremely down. I would do a few simple tests which will very likely improve the performance:

  • Call SaveChanges() once after ALL records.
  • Call SaveChanges() after for example 100 records.
  • Call SaveChanges() after for example 100 records and dispose the context and create a new one.
  • Disable change detection

For bulk inserts I am working and experimenting with a pattern like this:

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;            
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}

I have a test program which inserts 560.000 entities (9 scalar properties, no navigation properties) into the DB. With this code it works in less than 3 minutes.

For the performance it is important to call SaveChanges() after "many" records ("many" around 100 or 1000). It also improves the performance to dispose the context after SaveChanges and create a new one. This clears the context from all entites, SaveChanges doesn't do that, the entities are still attached to the context in state Unchanged. It is the growing size of attached entities in the context what slows down the insertion step by step. So, it is helpful to clear it after some time.

Here are a few measurements for my 560000 entities:

  • commitCount = 1, recreateContext = false: many hours (That's your current procedure)
  • commitCount = 100, recreateContext = false: more than 20 minutes
  • commitCount = 1000, recreateContext = false: 242 sec
  • commitCount = 10000, recreateContext = false: 202 sec
  • commitCount = 100000, recreateContext = false: 199 sec
  • commitCount = 1000000, recreateContext = false: out of memory exception
  • commitCount = 1, recreateContext = true: more than 10 minutes
  • commitCount = 10, recreateContext = true: 241 sec
  • commitCount = 100, recreateContext = true: 164 sec
  • commitCount = 1000, recreateContext = true: 191 sec

The behaviour in the first test above is that the performance is very non-linear and decreases extremely over time. ("Many hours" is an estimation, I never finished this test, I stopped at 50.000 entities after 20 minutes.) This non-linear behaviour is not so significant in all other tests.

Otherworld answered 9/5, 2011 at 20:33 Comment(48)
I was not aware of the penalty in performance of calling the SaveChanges after each insertion, i'll try with the options you provided, Thanks for your comment, i'll let you know my results.Graduate
I left a loadtest running with this approach, i'll let you know the results of it in the morning, but so far i did noticed a HUGE improvement by doing the SaveChanges every 100 recordsGraduate
@Bongo Sharp: Don't forget to set AutoDetectChangesEnabled = false; on the DbContext. It also has a big additional performance effect: #5943894Otherworld
Yeah, the problem is that i'm using Entity Framework 4, and AutoDetectChangesEnabled is part of the 4.1, nevertheless, i did the performance test and i had AMAZING RESULTS, it went from 00:12:00 to 00:00:22 SavinChanges on each entity was doing the olverload... THANKS so much for your answare! this is what i was looking forGraduate
Thank you for the context.Configuration.AutoDetectChangesEnabled = false; tip, it makes a huge difference.Withindoors
Hi I m trying to use this code. But it gives me error saying AutoDetectChangesEnabled cannot be found.Mckissick
@dahacker89: Are you using the correct version EF >= 4.1 and DbContext, NOT ObjectContext?Otherworld
Thanks for correcting me. Yes I am using ObjectContext. Just converted my model to DBContext. I have another problem now. This line: using (TransactionScope scope = new TransactionScope()) is giving me error saying that TransactionScope used in a using statement must be implicitly convertible to System.IDisposable. Any idea how to solve this???Mckissick
@dahacker89: Check the correct namespace. You must use the TransactionScope class in System.Transactions which is disposable: msdn.microsoft.com/en-us/library/…Otherworld
@Otherworld I did include that namespace. It does recognize but keeps giving me that error. And also it says scope.Complete() cannot be found or something. So I took TransactionScope and ran my code. But I still want to test it out though :(Mckissick
@dahacker89: I suggest that you create a separate question for your problem with perhaps more details. I'm not able to figure out here what is wrong.Otherworld
I set the IsolationLevel for my transaction like so: using (TransactionScope trans = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted })) - I'm not sure if it's necessary, but I'm really hoping to avoid any deadlocks or long wait times while the data inserts.Binns
Sorry... that should be TransactionScopeOption.Suppress - see also: #2922420Binns
Just quickly, fantastic information here. Thanks Sluama! What's the use of the TransactionScope though? I've never used one before when dealing with EF and insert/update/deletes.Obliquely
@FizzBuzz: It wraps multiple SaveChanges calls into a single database transaction. In most cases it's not needed.Otherworld
About context recreating: if you create objects in new context and set their references to objects that where created and saved in previous context - reference objects would be added to new context with EntityState.Added and on SaveChanges context would try to save objects second time regardless of primarykeyid>0Donee
Strewth- simply turning off the AutoDetectChangesEnabled made a MASSIVE difference to my app, which adds 200,000 records to a collection before hitting one call to save changes! I previously stopped the app after 30 mins, now it whips through them all in no time at all!Doublespace
Wow, what a nice job. It improved my adding with lightningspeed.Coulomb
Can you, please, look through my question: #18646057. Thanks!Truong
the type Entity in AddToContext() is it some sort of base entity type ? What I am looking for is generic AddToContext() where I can pass different types of entities and use gettype() or something in DBSet. I am using EF6 Db first model.Either ways its a great answerWoe
@Neel: It should be possible to use Entity directly as a generic parameter of a generic version AddToContext<Entity>(...) where Entity : class.Otherworld
@Otherworld yes, that's how I got it done, generics and some help from Reflection.Woe
@Otherworld would this code and results be similar if nested/relational entities were involved? Does the 'AutoDetectChangesEnabled' affect the relationships?Fontaine
@Cristi: AutoDetectChangesEnabled affects related entities as well, yes, and probably also with better performance. I don't have concrete numbers though. However, you must be careful with disabling change detection and test the code more thoroughly if relationships are involved: #23637420Otherworld
Are the insertion of all the records put in one huge transaction in the example? Since the using TransactionScope is at the outermost?Subdual
@dc7a9163d9: Yes, it's one transaction.Otherworld
@slauma so it may cause problem if the code is used for inserting millions of rowsSubdual
I wish I had learned about this stuff like two years ago when we were starting on the project I have and not now when it's extremely painful to account for it.Heptode
This code works pretty well for AddOrUpdate as well. It's not great, but it's better than nothing. Long term I think I would want to change it to a MERGE.Kirchner
Unfortunately, the answer does not specify the negative implications to setting AutoDetectChangesEnabled to false. Setting it to false can break apps, and that is why it is a bad practice; otherwise, it would have been set to false by default. Possible break down could happen if the application uses third party controls.Faustino
Since Bulk insert is an extension method of DbContext and not DbSet Can it update multiple sets at the same time ?Melano
Has anyone noticed a very recent improvement in performance in the scenario of inserting data via E.F. into Azure SQL? I noticed better performance recently without me having changed anything on the Azure SQL specs.Irresoluble
Will this be edited for the latest versions of .net and ef? I'm curious to find out if the newest version of ef can handle 1000 rows fasterGarcia
I think the underlying connection is opened and closed every time you re-create the context, right? Would it not be better to create a connection manually and pass it to the context? This way, the connection would not be opened and closed per context instantiation.Teilo
Does anyone know if it is possible to apply this approach in the Entity Framework Core?Forget
Man this made my code so much faster I have 600,000 items to put into the database and this seems to do it in about 20 minutes where as before it was taking hours.Pennoncel
Can you explain why recreating the context helps so much? Also, how did you came up with this idea??Concede
In my real world scenario, only saving after every 1000 records, rebuilding the context, and disabling AutoDetectChanges leads to a solid 10x speed improvement. FYI to others, AutoDetectChanges becomes more and more costly the more items that are cached in the context, and thus it degrades to become worse and worse performing the more items you insert.Dishearten
What if error occurs in one record of 100 , does it discards all 100??Bish
I've used this method for an array of ~40K Entities with 90 Fields, saving and renewing after 1000 iterations. Takes about a minute for the batch to complete. Nice work. Still not fast enough, but it may be meDibs
great summary, I was batching, killing the context and setting the configuration, but after using Set<Entity> increased my speed by even 3 times more!Lottielotto
I am getting Network access for Distributed Transaction Manager (MSDTC) has been disabled. Please enable DTC for network access in the security configuration for MSDTC using the Component Services Administrative tool. exception.Passable
it would be usefull adding timeout spec to the TransactionScope object extending where occurs: using (TransactionScope scope = new TransactionScope( TransactionScopeOption.Required, new System.TimeSpan(0, 30, 0))) I also having time long lasting issue with this solution... as soon as the attempts are finished I'll let you know how long it tookTrapezium
@Otherworld recreating context causes a big issue. It duplicates the first commit block at the end of the transaction.Trapezium
I was able to solve the problem mentioned in the previous comment and finally do a bulk insert test without vpn. I confirm that with 200K data records consisting of 43 fields and each record of about 500 characters (in Sql Server nvarchars corresponding at 1000B per record) the system took 20 minutes (10K rec/min).Trapezium
(it continues previuous...) That's quite far from the time reported in the OP, but comparing the entitis sizes as well I think it's a good speed for EF.Trapezium
you can use context.ChangeTracker.Clear() instead of generating a new context as mentioned here https://mcmap.net/q/28999/-how-do-i-clear-tracked-entities-in-entity-frameworkFactoring
In my case it takes 0.75-1 hour or even more to process EVEN IF number of records-to-be-inserted + updated is 2-4 thousands... I've tried to use test version of the commercial EfCore extension for BULK operation and the time reduced to 1 min or so. I am now looking for reliable FREE bulk cross-DB solution which supports Oracle..Pomeranian
H
191

This combination increase speed well enough.

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
Harbinger answered 12/8, 2012 at 15:52 Comment(9)
This also made a different of at least an order of magnitude for me, when inserting around 40,000 records in a table.Frankforter
I had to add around 150,000 entities and it took forever. I've tried many approaches, but this one is the simplest and the best! Thanks.Bummalo
Saving 23000 records, and this is the easiest way to make the performance acceptable.Annora
Don't blindly disable ValidateOnSaveEnabled you may be depending on that behavior, and not realize it until it's too late. Then again you may be performing validation elsewhere in code and having EF validate yet again is completely unnecessary.Blenny
In my test saving 20.000 rows went down from 101 seconds to 88 seconds. Not a lot and what are the implications.Twocolor
@JeremyCook I think what you're trying to get at is this answer would be much better if it explained the possible implications of changing these properties from their default values (aside from performance improvement). I agree.Periphrastic
This worked for me, although if you're updating records in the context you will need to call DetectChanges() explicitlyLargess
These can be disabled and then re-enabled with a try-finally block: msdn.microsoft.com/en-us/data/jj556205.aspxClaimant
Saving about 350,000 records to a new, empty database, this made zero difference. So, you just need to test your own scenario.Farnese
J
99

as it was never mentioned here I want to recomment EFCore.BulkExtensions here

context.BulkInsert(entitiesList);                 context.BulkInsertAsync(entitiesList);
context.BulkUpdate(entitiesList);                 context.BulkUpdateAsync(entitiesList);
context.BulkDelete(entitiesList);                 context.BulkDeleteAsync(entitiesList);
context.BulkInsertOrUpdate(entitiesList);         context.BulkInsertOrUpdateAsync(entitiesList);         // Upsert
context.BulkInsertOrUpdateOrDelete(entitiesList); context.BulkInsertOrUpdateOrDeleteAsync(entitiesList); // Sync
context.BulkRead(entitiesList);                   context.BulkReadAsync(entitiesList);
Jacintajacinth answered 22/2, 2019 at 8:32 Comment(10)
I second this suggestion. After trying many homebrew solutions this cut my insert down to 1 second from over 50 seconds. And, it's MIT license so easy to incorporate.Colombes
is this avail for ef 6.xTrixy
this is only more performant than using AddRange if it's over 10 entitiesAlmsman
10 000 inserts went from 9 minutes to 12 seconds. This deserves more attention!Supersonic
If there's any way to change accepted answers, this should be the modern accepted answer now. And I wish EF team provided this out of box.Lobe
Incredibly fast >30second insert on 13k rows down to less than a second, well chuffed.Uncanny
there is a 4.5 framework of this version?Priestess
430k inserts from 5 min to 17 secs. I also made use of TruncateAsync(). I wish I had known about this 2 years ago.Copolymer
Heads-up: it might have been MIT-licensed back in 2019 but right now it uses a dual license, depending on your use: see github.com/borisdj/EFCore.BulkExtensions/blob/master/…Oxtail
@LeandroBardelli Yes, there is. Look up System.Data.SqlClient.SqlBulkCopy. Not simple, but fast.Crossfertilize
W
94

You should look at using the System.Data.SqlClient.SqlBulkCopy for this. Here's the documentation, and of course there are plenty of tutorials online.

Sorry, I know you were looking for a simple answer to get EF to do what you want, but bulk operations are not really what ORMs are meant for.

Wilsey answered 9/5, 2011 at 17:17 Comment(5)
I have run into the SqlBulkCopy a couple of times while researching this, but it seems to be more oriented to table-to-table inserts, saddly i was not expecting easy solutions, but rather performance tips, like for example managing the State of the connection manually, insted of letting EF do it for youGraduate
I've used SqlBulkCopy to insert large amounts of data right from my application. You basically have to create a DataTable, fill it up, then pass that to BulkCopy. There are a few gotchas as you're setting up your DataTable (most of which I've forgotten, sadly), but it should work just fineWilsey
I did the proof of concept, and as promissed, it works really fast, but one of the reasons why i'm using EF is becuase the insertion of relational data is easier, Eg if i'm insert an entity that already contains relational data, it will also insert it, have you ever got into this scenario? Thanks!Graduate
Unfortunately inserting a web of objects into a DBMS is not really something BulkCopy will do. That's the benefit of an ORM like EF, the cost being that it won't scale to do hundreds of similar object graphs efficiently.Wilsey
SqlBulkCopy is definitely the way to go if you need raw speed or if you will be re-running this insert. I've inserted several million records with it before and it is extremely fast. That said, unless you will need to re-run this insert, it might be easier to just use EF.Soporific
L
87

The fastest way would be using bulk insert extension, which I developed

note: this is a commercial product, not free of charge

It uses SqlBulkCopy and custom datareader to get max performance. As a result it is over 20 times faster than using regular insert or AddRange EntityFramework.BulkInsert vs EF AddRange

usage is extremely simple

context.BulkInsert(hugeAmountOfEntities);
Lettielettish answered 9/5, 2011 at 17:14 Comment(23)
My Context does not have a .BulkInsert why would this be/ Using EF 6.1Otilia
A cool wrapper/tool, but the standing drawback of BulkCopy is that it doesn't support upserts directly - and I really don't want to add a staging table to my staging tables.Diminished
Fast but only does the top layer of a hierarchy.Farnese
I tried to use this solution, but the order of columns mapped in bulk update is different to actual, so if I have columns in db as ABCD and corresponding object has data for objects ABCD, the values get mis mapped to something like BCAD... do you know what that might be ?Massa
@Massa I also found a couple of bugs with this library and the MappingAPI library it uses. I had better success with the EFUtilities library from this other answer. It also has other utilities including for batch updates.Passage
This is amazing, I went from minutes to a few seconds! I was using AddRange() in EF 6, even with change detection disabled, it was still minutes to do just 1200 records with 110 columns. Great work, saved me writing my own.Titus
Do you need to call SaveChanges after running BulkInsert, or does BulkInsert operate directly on the database? For example, I add a "Message" record and call SaveChanges, because the message ID will be used as a foreign key when BulkInsert is called to add user inbox entries that all reference the message. So I do: Add Message, call SaveChanges to commit the new message, call BulkInsert to add inbox entires. And I'm wondering if I need another call to SaveChanges after BulkInsert or if BulkInsert commits directly to the database such that SaveChanges doesn't need called afterwards.Lammond
@Triynko. No need to call SaveChanges. Usage instructions are here efbulkinsert.codeplex.comLettielettish
I have used this but it raised an error "The given key was not present in the dictionary" my case is list of entities with relation but I am trying to insert without the relation objectHectorhecuba
@muds there is a problem in bulk update, The order of the properties in your entity class should be the same as the order of the columns in your database table.Effortful
No way, this improved my codes to insert 400k+ rows (with 5 columns and no index restrictions) from 60+ mins (using the technique from the accepted answer) to 10.2 seconds. You just saved my life mate!Irriguous
It is not free.Indivisible
Ads are getting smarter... this is paid product and very expensive for a freelance. Be warned!Stinker
sadly this is no longer FOSS, at some point the owner moved it to another domain and started charging for the bulk insert version. It's no longer a suitable option :-(Quinquepartite
USD600 for 1-year support and upgrades? Are you out of your mind?Spandrel
im not the owner of the product any longerLettielettish
There are several forks of BulkInsert that can be found on GitHub from before the codeplex source was bought by ZZZ Projects. This one can even be found on NuGet.Incomplete
Hugely over priced for this functionality.Feathers
It is free, and I released an updated version from the original source code under the original license before ZZZ took over and made it pay-for-play. I got the source code from the original project on the original site (no longer exists, but I saved screenshots and all the licensing info). WayBackMaching FTW! I fixed a serious bug in the MappingAPI with how it handles TPH types, so it can now handle shared properties in a complex TPH scheme. I also added support for async operations (i.e. WriteToServerAsync) and added the BulkInsertAsync methods. I also added support for TPC types. :)Lammond
I also updated the way it manages connections. It was creating it's own SqlConnection (from the DbContext's Database.Connection.ConnectionString), but I changed it so it can reusing the existing connection. The problem with allowing it to open a second connection of its own, is if you're running it in a transaction, it will escalate to a distributed transaction, and the bulk insert takes a whole second to insert a few rows, defeating the whole purpose. So there's that.Lammond
@Triynko, does it has a Git?Geosyncline
If you're using EF Core, this exists. github.com/borisdj/EFCore.BulkExtensionsCapybara
This one is much better, free and open source: entityframework-plus.netOvate
C
56

I agree with Adam Rackis. SqlBulkCopy is the fastest way of transferring bulk records from one data source to another. I used this to copy 20K records and it took less than 3 seconds. Have a look at the example below.

public static void InsertIntoMembers(DataTable dataTable)
{           
    using (var connection = new SqlConnection(@"data source=;persist security info=True;user id=;password=;initial catalog=;MultipleActiveResultSets=True;App=EntityFramework"))
    {
        SqlTransaction transaction = null;
        connection.Open();
        try
        {
            transaction = connection.BeginTransaction();
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = "Members";
                sqlBulkCopy.ColumnMappings.Add("Firstname", "Firstname");
                sqlBulkCopy.ColumnMappings.Add("Lastname", "Lastname");
                sqlBulkCopy.ColumnMappings.Add("DOB", "DOB");
                sqlBulkCopy.ColumnMappings.Add("Gender", "Gender");
                sqlBulkCopy.ColumnMappings.Add("Email", "Email");

                sqlBulkCopy.ColumnMappings.Add("Address1", "Address1");
                sqlBulkCopy.ColumnMappings.Add("Address2", "Address2");
                sqlBulkCopy.ColumnMappings.Add("Address3", "Address3");
                sqlBulkCopy.ColumnMappings.Add("Address4", "Address4");
                sqlBulkCopy.ColumnMappings.Add("Postcode", "Postcode");

                sqlBulkCopy.ColumnMappings.Add("MobileNumber", "MobileNumber");
                sqlBulkCopy.ColumnMappings.Add("TelephoneNumber", "TelephoneNumber");

                sqlBulkCopy.ColumnMappings.Add("Deleted", "Deleted");

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
        catch (Exception)
        {
            transaction.Rollback();
        }

    }
}
Carolinecarolingian answered 15/1, 2013 at 10:27 Comment(4)
I tried many of the solutions provided in this post and SqlBulkCopy was by far the fastest. Pure EF took 15min, but with a mix of the solution and SqlBulkCopy I was able to get down to 1.5 min! This was with 2 million records! Without any DB index optimization.Meit
List is easier than DataTable. There's an AsDataReader() extension method, explained in this answer: https://mcmap.net/q/29000/-sqlbulkcopy-from-a-list-lt-gtPhenology
But its only for top Entity not relational oneTierell
@ZahidMustafa: yeah. It's doing BulkInsert, not Bulk-Analysis-And-Relation-Tracing-On-Object-Graphs.. if you want to cover relations, you have to analyze and determine insertion order and then bulk-insert individual levels and maybe update some keys as needed, and you will get speedy custom tailored solution. Or, you can rely on EF to do that, no work on your side, but slower at runtime.Underpart
F
29

[2019 Update] EF Core 3.1

Following what have been said above, disabling AutoDetectChangesEnabled in EF Core worked perfectly: the insertion time was divided by 100 (from many minutes to a few seconds, 10k records with cross tables relationships)

The updated code is :

context.ChangeTracker.AutoDetectChangesEnabled = false;
foreach (IRecord record in records) {
    //Add records to your database        
}
context.ChangeTracker.DetectChanges();
context.SaveChanges();
context.ChangeTracker.AutoDetectChangesEnabled = true; //do not forget to re-enable
Foster answered 10/2, 2020 at 11:27 Comment(1)
Why re-enable context.ChangeTracker.AutoDetectChangesEnabled = true if context now should be disposed??Haukom
M
27

I would recommend this article on how to do bulk inserts using EF.

Entity Framework and slow bulk INSERTs

He explores these areas and compares perfomance:

  1. Default EF (57 minutes to complete adding 30,000 records)
  2. Replacing with ADO.NET Code (25 seconds for those same 30,000)
  3. Context Bloat- Keep the active Context Graph small by using a new context for each Unit of Work (same 30,000 inserts take 33 seconds)
  4. Large Lists - Turn off AutoDetectChangesEnabled (brings the time down to about 20 seconds)
  5. Batching (down to 16 seconds)
  6. DbTable.AddRange() - (performance is in the 12 range)
Mezcaline answered 20/6, 2014 at 3:10 Comment(0)
B
22

I've investigated Slauma's answer (which is awesome, thanks for the idea man), and I've reduced batch size until I've hit optimal speed. Looking at the Slauma's results:

  • commitCount = 1, recreateContext = true: more than 10 minutes
  • commitCount = 10, recreateContext = true: 241 sec
  • commitCount = 100, recreateContext = true: 164 sec
  • commitCount = 1000, recreateContext = true: 191 sec

It is visible that there is speed increase when moving from 1 to 10, and from 10 to 100, but from 100 to 1000 inserting speed is falling down again.

So I've focused on what's happening when you reduce batch size to value somewhere in between 10 and 100, and here are my results (I'm using different row contents, so my times are of different value):

Quantity    | Batch size    | Interval
1000    1   3
10000   1   34
100000  1   368

1000    5   1
10000   5   12
100000  5   133

1000    10  1
10000   10  11
100000  10  101

1000    20  1
10000   20  9
100000  20  92

1000    27  0
10000   27  9
100000  27  92

1000    30  0
10000   30  9
100000  30  92

1000    35  1
10000   35  9
100000  35  94

1000    50  1
10000   50  10
100000  50  106

1000    100 1
10000   100 14
100000  100 141

Based on my results, actual optimum is around value of 30 for batch size. It's less than both 10 and 100. Problem is, I have no idea why is 30 optimal, nor could have I found any logical explanation for it.

Beadledom answered 26/3, 2013 at 8:46 Comment(2)
I found the same with Postrges and pure SQL (it's depends on SQL not on EF) that 30 is optimal.Beaune
My experience is that optimum differs for different connection speed and size of row. For fast connection and small rows optimum can be even >200 rows.Deflect
D
22

As other people have said SqlBulkCopy is the way to do it if you want really good insert performance.

It's a bit cumbersome to implement but there are libraries that can help you with it. There are a few out there but I will shamelesslyplug my own library this time: https://github.com/MikaelEliasson/EntityFramework.Utilities#batch-insert-entities

The only code you would need is:

 using (var db = new YourDbContext())
 {
     EFBatchOperation.For(db, db.BlogPosts).InsertAll(list);
 }

So how much faster is it? Very hard to say because it depends on so many factors, computer performance, network, object size etc etc. The performance tests I've made suggests 25k entities can be inserted at around 10s the standard way on localhost IF you optimize your EF configuration like mentioned in the other answers. With EFUtilities that takes about 300ms. Even more interesting is that I have saved around 3 millions entities in under 15 seconds using this method, averaging around 200k entities per second.

The one problem is ofcourse if you need to insert releated data. This can be done efficently into sql server using the method above but it requires you to have an Id generation strategy that let you generate id's in the app-code for the parent so you can set the foreign keys. This can be done using GUIDs or something like HiLo id generation.

Dido answered 19/12, 2014 at 10:44 Comment(5)
Works well. The syntax is a bit verbose though. Think it would be better if EFBatchOperation had a constructor which you pass in the DbContext to rather than passing to every static method. Generic versions of InsertAll and UpdateAll which automatically find the collection, similar to DbContext.Set<T>, would be good too.Passage
Just a quick comment to say thanks! This code allowed me to save 170k records in 1.5 seconds! Completely blows any other method I've tried out of the water.Iona
@Mikael One issue is dealing with identity fields. Do you have a way to enable identity insert yet?Handcraft
In contrast to EntityFramework.BulkInsert, this library remained free. +1Incomplete
Is it applicable for EF Core?Simms
Z
16

Dispose() context create problems if the entities you Add() rely on other preloaded entities (e.g. navigation properties) in the context

I use similar concept to keep my context small to achieve the same performance

But instead of Dispose() the context and recreate, I simply detach the entities that already SaveChanges()

public void AddAndSave<TEntity>(List<TEntity> entities) where TEntity : class {

const int CommitCount = 1000; //set your own best performance number here
int currentCount = 0;

while (currentCount < entities.Count())
{
    //make sure it don't commit more than the entities you have
    int commitCount = CommitCount;
    if ((entities.Count - currentCount) < commitCount)
        commitCount = entities.Count - currentCount;

    //e.g. Add entities [ i = 0 to 999, 1000 to 1999, ... , n to n+999... ] to conext
    for (int i = currentCount; i < (currentCount + commitCount); i++)        
        _context.Entry(entities[i]).State = System.Data.EntityState.Added;
        //same as calling _context.Set<TEntity>().Add(entities[i]);       

    //commit entities[n to n+999] to database
    _context.SaveChanges();

    //detach all entities in the context that committed to database
    //so it won't overload the context
    for (int i = currentCount; i < (currentCount + commitCount); i++)
        _context.Entry(entities[i]).State = System.Data.EntityState.Detached;

    currentCount += commitCount;
} }

wrap it with try catch and TrasactionScope() if you need, not showing them here for keeping the code clean

Zollie answered 27/2, 2013 at 20:38 Comment(4)
That slowed down the insert (AddRange) using Entity Framework 6.0. Inserting 20.000 rows went up from about 101 seconds to 118 seconds.Twocolor
@Stephen Ho: I am also trying to avoid disposing my context. I can understand this is slower than recreating the context, but I want to know if you found this faster enough than not recreating the context but with a commitCount set.Fontaine
@Learner: I think it was faster than recreate the context. But I don't really remember now cos I switched to use SqlBulkCopy at last.Zollie
I ended up having to use this technique because, for some weird reason, there was some left over tracking occurring on the second pass through the while loop, even though I had everything wrapped in a using statement and even called Dispose() on the DbContext. When I would add to the context (on the 2nd pass) the context set count would jump to 6 instead of just one. The other items that got arbitrarily added had already been inserted in the first pass through the while loop so the call to SaveChanges would fail on the second pass (for obvious reasons).Chou
A
15

I know this is a very old question, but one guy here said that developed an extension method to use bulk insert with EF, and when I checked, I discovered that the library costs $599 today (for one developer). Maybe it makes sense for the entire library, however for just the bulk insert this is too much.

Here is a very simple extension method I made. I use that on pair with database first (do not tested with code first, but I think that works the same). Change YourEntities with the name of your context:

public partial class YourEntities : DbContext
{
    public async Task BulkInsertAllAsync<T>(IEnumerable<T> entities)
    {
        using (var conn = new SqlConnection(Database.Connection.ConnectionString))
        {
            await conn.OpenAsync();

            Type t = typeof(T);

            var bulkCopy = new SqlBulkCopy(conn)
            {
                DestinationTableName = GetTableName(t)
            };

            var table = new DataTable();

            var properties = t.GetProperties().Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string));

            foreach (var property in properties)
            {
                Type propertyType = property.PropertyType;
                if (propertyType.IsGenericType &&
                    propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
                {
                    propertyType = Nullable.GetUnderlyingType(propertyType);
                }

                table.Columns.Add(new DataColumn(property.Name, propertyType));
            }

            foreach (var entity in entities)
            {
                table.Rows.Add(
                    properties.Select(property => property.GetValue(entity, null) ?? DBNull.Value).ToArray());
            }

            bulkCopy.BulkCopyTimeout = 0;
            await bulkCopy.WriteToServerAsync(table);
        }
    }

    public void BulkInsertAll<T>(IEnumerable<T> entities)
    {
        using (var conn = new SqlConnection(Database.Connection.ConnectionString))
        {
            conn.Open();

            Type t = typeof(T);

            var bulkCopy = new SqlBulkCopy(conn)
            {
                DestinationTableName = GetTableName(t)
            };

            var table = new DataTable();

            var properties = t.GetProperties().Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string));

            foreach (var property in properties)
            {
                Type propertyType = property.PropertyType;
                if (propertyType.IsGenericType &&
                    propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
                {
                    propertyType = Nullable.GetUnderlyingType(propertyType);
                }

                table.Columns.Add(new DataColumn(property.Name, propertyType));
            }

            foreach (var entity in entities)
            {
                table.Rows.Add(
                    properties.Select(property => property.GetValue(entity, null) ?? DBNull.Value).ToArray());
            }

            bulkCopy.BulkCopyTimeout = 0;
            bulkCopy.WriteToServer(table);
        }
    }

    public string GetTableName(Type type)
    {
        var metadata = ((IObjectContextAdapter)this).ObjectContext.MetadataWorkspace;
        var objectItemCollection = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));

        var entityType = metadata
                .GetItems<EntityType>(DataSpace.OSpace)
                .Single(e => objectItemCollection.GetClrType(e) == type);

        var entitySet = metadata
            .GetItems<EntityContainer>(DataSpace.CSpace)
            .Single()
            .EntitySets
            .Single(s => s.ElementType.Name == entityType.Name);

        var mapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace)
                .Single()
                .EntitySetMappings
                .Single(s => s.EntitySet == entitySet);

        var table = mapping
            .EntityTypeMappings.Single()
            .Fragments.Single()
            .StoreEntitySet;

        return (string)table.MetadataProperties["Table"].Value ?? table.Name;
    }
}

You can use that against any collection that inherit from IEnumerable, like that:

await context.BulkInsertAllAsync(items);
Avesta answered 20/1, 2018 at 2:36 Comment(6)
please complete your example code. where is bulkCopyBuoyage
It is already here: await bulkCopy.WriteToServerAsync(table);Avesta
Maybe i wasn't clear, in your write up, you suggest you made an extension...which i took to mean that no 3rd part lib was needed, when in fact in both methods use SqlBulkCopy lib. This entirely relies on SqlBulkCopy, when why i asked where does bulkCopy come from, its an extension lib which you wrote an extension lib on top of. Would of just made more sense to say here is how i used SqlBulkCopy lib.Buoyage
should use conn.OpenAsync in async versionShoat
@guiherme Am I correct that the SqlBulkCopy in your code is really the SqlClient.SqlBulkCopy class built in to .net?Welladvised
@SOHODeveloper Yes, you are correct. This code uses Entity to gather information about the table and column names, and then generate a "default" bulk insert that can be used with SqlClient.SqlBulkCopy.Avesta
B
10

Yes, SqlBulkUpdate is indeed the fastest tool for this type of task. I wanted to find "least effort" generic way for me in .NET Core so I ended up using great library from Marc Gravell called FastMember and writing one tiny extension method for entity framework DB context. Works lightning fast:

using System.Collections.Generic;
using System.Linq;
using FastMember;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;

namespace Services.Extensions
{
    public static class DbContextExtensions
    {
        public static void BulkCopyToServer<T>(this DbContext db, IEnumerable<T> collection)
        {
            var messageEntityType = db.Model.FindEntityType(typeof(T));

            var tableName = messageEntityType.GetSchema() + "." + messageEntityType.GetTableName();
            var tableColumnMappings = messageEntityType.GetProperties()
                .ToDictionary(p => p.PropertyInfo.Name, p => p.GetColumnName());

            using (var connection = new SqlConnection(db.Database.GetDbConnection().ConnectionString))
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                foreach (var (field, column) in tableColumnMappings)
                {
                    bulkCopy.ColumnMappings.Add(field, column);
                }

                using (var reader = ObjectReader.Create(collection, tableColumnMappings.Keys.ToArray()))
                {
                    bulkCopy.DestinationTableName = tableName;
                    connection.Open();
                    bulkCopy.WriteToServer(reader);
                    connection.Close();
                }
            }
        }
    }
}
Berglund answered 2/11, 2020 at 12:8 Comment(1)
The more effort less generic way would be to follow something like this (which again uses SqlBulkCopy): codingsight.com/…Bartel
T
6

I'm looking for the fastest way of inserting into Entity Framework

There are some third-party libraries supporting Bulk Insert available:

  • Z.EntityFramework.Extensions (Recommended)
  • EFUtilities
  • EntityFramework.BulkInsert

See: Entity Framework Bulk Insert library

Be careful, when choosing a bulk insert library. Only Entity Framework Extensions supports all kind of associations and inheritances and it's the only one still supported.


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});
Toreutics answered 6/1, 2017 at 14:49 Comment(5)
this is a great extension but not free.Baty
This answer is pretty good and EntityFramework.BulkInsert performs a bulk insertion of 15K rows in 1.5 seconds, works pretty nice for an internal process like a Windows Service.Isolationist
Yeah, 600$ for bulk insert. Totaly worth it.Apsis
@Apsis Yeat it's worth it if you use it comercially. I don't see any problem with $600 for something that i don't have to spend hours on building it myself which will cost me a lot more than $600. Yes it costs money but looking at my hourly rate it is money well spend!Outgrow
@JordyvanEijk but this is a price PER YEAR, not an one-time payment...Pomeranian
P
6

One of the fastest ways to save a list you must apply the following code

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

AutoDetectChangesEnabled = false

Add, AddRange & SaveChanges: Doesn't detect changes.

ValidateOnSaveEnabled = false;

Doesn't detect change tracker

You must add nuget

Install-Package Z.EntityFramework.Extensions

Now you can use the following code

var context = new MyContext();

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

context.BulkInsert(list);
context.BulkSaveChanges();
Penance answered 2/10, 2018 at 10:5 Comment(3)
can I use Your sample Code For Bulk Update?Eozoic
Z library is not freeDaigle
Thanks @reza-jenabi. It saved meRowdyism
S
5

Try to use a Stored Procedure that will get an XML of the data that you want to insert.

Silvie answered 9/5, 2011 at 17:18 Comment(2)
Passing data as XML is not needed if you don't want to store them as XML. In SQL 2008 you can use table valued parameter.Castorena
i didn't clarify this but i need to also support SQL 2005Graduate
T
5

I have made an generic extension of @Slauma s example above;

public static class DataExtensions
{
    public static DbContext AddToContext<T>(this DbContext context, object entity, int count, int commitCount, bool recreateContext, Func<DbContext> contextCreator)
    {
        context.Set(typeof(T)).Add((T)entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                context.Dispose();
                context = contextCreator.Invoke();
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }
        return context;
    }
}

Usage:

public void AddEntities(List<YourEntity> entities)
{
    using (var transactionScope = new TransactionScope())
    {
        DbContext context = new YourContext();
        int count = 0;
        foreach (var entity in entities)
        {
            ++count;
            context = context.AddToContext<TenancyNote>(entity, count, 100, true,
                () => new YourContext());
        }
        context.SaveChanges();
        transactionScope.Complete();
    }
}
Trigon answered 25/11, 2015 at 13:35 Comment(0)
H
4

Here is a performance comparison between using Entity Framework and using SqlBulkCopy class on a realistic example: How to Bulk Insert Complex Objects into SQL Server Database

As others already emphasized, ORMs are not meant to be used in bulk operations. They offer flexibility, separation of concerns and other benefits, but bulk operations (except bulk reading) are not one of them.

Hydrozoan answered 17/7, 2013 at 13:17 Comment(0)
P
4

SqlBulkCopy is super quick

This is my implementation:

// at some point in my calling code, I will call:
var myDataTable = CreateMyDataTable();
myDataTable.Rows.Add(Guid.NewGuid,tableHeaderId,theName,theValue); // e.g. - need this call for each row to insert

var efConnectionString = ConfigurationManager.ConnectionStrings["MyWebConfigEfConnection"].ConnectionString;
var efConnectionStringBuilder = new EntityConnectionStringBuilder(efConnectionString);
var connectionString = efConnectionStringBuilder.ProviderConnectionString;
BulkInsert(connectionString, myDataTable);

private DataTable CreateMyDataTable()
{
    var myDataTable = new DataTable { TableName = "MyTable"};
// this table has an identity column - don't need to specify that
    myDataTable.Columns.Add("MyTableRecordGuid", typeof(Guid));
    myDataTable.Columns.Add("MyTableHeaderId", typeof(int));
    myDataTable.Columns.Add("ColumnName", typeof(string));
    myDataTable.Columns.Add("ColumnValue", typeof(string));
    return myDataTable;
}

private void BulkInsert(string connectionString, DataTable dataTable)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        SqlTransaction transaction = null;
        try
        {
            transaction = connection.BeginTransaction();

            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = dataTable.TableName;
                foreach (DataColumn column in dataTable.Columns) {
                    sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
        catch (Exception)
        {
            transaction?.Rollback();
            throw;
        }
    }
}
Priggery answered 24/1, 2019 at 13:2 Comment(0)
I
3

Use SqlBulkCopy:

void BulkInsert(GpsReceiverTrack[] gpsReceiverTracks)
{
    if (gpsReceiverTracks == null)
    {
        throw new ArgumentNullException(nameof(gpsReceiverTracks));
    }

    DataTable dataTable = new DataTable("GpsReceiverTracks");
    dataTable.Columns.Add("ID", typeof(int));
    dataTable.Columns.Add("DownloadedTrackID", typeof(int));
    dataTable.Columns.Add("Time", typeof(TimeSpan));
    dataTable.Columns.Add("Latitude", typeof(double));
    dataTable.Columns.Add("Longitude", typeof(double));
    dataTable.Columns.Add("Altitude", typeof(double));

    for (int i = 0; i < gpsReceiverTracks.Length; i++)
    {
        dataTable.Rows.Add
        (
            new object[]
            {
                    gpsReceiverTracks[i].ID,
                    gpsReceiverTracks[i].DownloadedTrackID,
                    gpsReceiverTracks[i].Time,
                    gpsReceiverTracks[i].Latitude,
                    gpsReceiverTracks[i].Longitude,
                    gpsReceiverTracks[i].Altitude
            }
        );
    }

    string connectionString = (new TeamTrackerEntities()).Database.Connection.ConnectionString;
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = dataTable.TableName;
                foreach (DataColumn column in dataTable.Columns)
                {
                    sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
    }

    return;
}
Indivisible answered 4/6, 2017 at 22:6 Comment(0)
J
2

As per my knowledge there is no BulkInsert in EntityFramework to increase the performance of the huge inserts.

In this scenario you can go with SqlBulkCopy in ADO.net to solve your problem

Jamal answered 9/5, 2011 at 17:20 Comment(5)
I was taking a look at that class, but it seems to be more oriented to table-to-table insertions, isn't?Graduate
Not sure what you mean, it has an overloaded WriteToServer that takes a DataTable.Ceramic
no you can insert from .Net objects to SQL also.What you are looking for?Jamal
A way to insert potentially thousands of records in the database within a TransactionScope blockGraduate
you can use .Net TransactionScope technet.microsoft.com/en-us/library/bb896149.aspxJamal
R
2

All the solutions written here don't help because when you do SaveChanges(), insert statements are sent to database one by one, that's how Entity works.

And if your trip to database and back is 50 ms for instance then time needed for insert is number of records x 50 ms.

You have to use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/

I got insert time reduced from 5-6 minutes to 10-12 seconds by using it.

Render answered 1/10, 2015 at 12:52 Comment(2)
is that link broken please?Lard
@Lard EF6: nuget.org/packages/EntityFramework.BulkInsert-ef6, EF Core: github.com/borisdj/EFCore.BulkExtensionsRender
T
2

Another option is to use SqlBulkTools available from Nuget. It's very easy to use and has some powerful features.

Example:

var bulk = new BulkOperations();
var books = GetBooks();

using (TransactionScope trans = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection(ConfigurationManager
    .ConnectionStrings["SqlBulkToolsTest"].ConnectionString))
    {
        bulk.Setup<Book>()
            .ForCollection(books)
            .WithTable("Books") 
            .AddAllColumns()
            .BulkInsert()
            .Commit(conn);
    }

    trans.Complete();
}

See the documentation for more examples and advanced usage. Disclaimer: I am the author of this library and any views are of my own opinion.

Treenatreenail answered 13/9, 2016 at 13:15 Comment(1)
This project has been deleted from both NuGet and GitHub.Barnabe
F
2

[NEW SOLUTION FOR POSTGRESQL] Hey, I know it's quite an old post, but I have recently run into similar problem, but we were using Postgresql. I wanted to use effective bulkinsert, what turned out to be pretty difficult. I haven't found any proper free library to do so on this DB. I have only found this helper: https://bytefish.de/blog/postgresql_bulk_insert/ which is also on Nuget. I have written a small mapper, which auto mapped properties the way Entity Framework:

public static PostgreSQLCopyHelper<T> CreateHelper<T>(string schemaName, string tableName)
        {
            var helper = new PostgreSQLCopyHelper<T>("dbo", "\"" + tableName + "\"");
            var properties = typeof(T).GetProperties();
            foreach(var prop in properties)
            {
                var type = prop.PropertyType;
                if (Attribute.IsDefined(prop, typeof(KeyAttribute)) || Attribute.IsDefined(prop, typeof(ForeignKeyAttribute)))
                    continue;
                switch (type)
                {
                    case Type intType when intType == typeof(int) || intType == typeof(int?):
                        {
                            helper = helper.MapInteger("\"" + prop.Name + "\"",  x => (int?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type stringType when stringType == typeof(string):
                        {
                            helper = helper.MapText("\"" + prop.Name + "\"", x => (string)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type dateType when dateType == typeof(DateTime) || dateType == typeof(DateTime?):
                        {
                            helper = helper.MapTimeStamp("\"" + prop.Name + "\"", x => (DateTime?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type decimalType when decimalType == typeof(decimal) || decimalType == typeof(decimal?):
                        {
                            helper = helper.MapMoney("\"" + prop.Name + "\"", x => (decimal?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type doubleType when doubleType == typeof(double) || doubleType == typeof(double?):
                        {
                            helper = helper.MapDouble("\"" + prop.Name + "\"", x => (double?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type floatType when floatType == typeof(float) || floatType == typeof(float?):
                        {
                            helper = helper.MapReal("\"" + prop.Name + "\"", x => (float?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type guidType when guidType == typeof(Guid):
                        {
                            helper = helper.MapUUID("\"" + prop.Name + "\"", x => (Guid)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                }
            }
            return helper;
        }

I use it the following way (I had entity named Undertaking):

var undertakingHelper = BulkMapper.CreateHelper<Model.Undertaking>("dbo", nameof(Model.Undertaking));
undertakingHelper.SaveAll(transaction.UnderlyingTransaction.Connection as Npgsql.NpgsqlConnection, undertakingsToAdd));

I showed an example with transaction, but it can also be done with normal connection retrieved from context. undertakingsToAdd is enumerable of normal entity records, which I want to bulkInsert into DB.

This solution, to which I've got after few hours of research and trying, is as you could expect much faster and finally easy to use and free! I really advice you to use this solution, not only for the reasons mentioned above, but also because it's the only one with which I had no problems with Postgresql itself, many other solutions work flawlessly for example with SqlServer.

Ferricyanide answered 5/9, 2017 at 21:3 Comment(0)
P
2

Taking several notes, this is my implementation with improvements mine and from other answers and comments.

Improvements:

  • Getting the SQL connection string from my Entity

  • Using SQLBulk just in some parts, the rest only Entity Framework

  • Using the same Datetable column names that uses the SQL Database without need of mapping each column

  • Using the same Datatable name that uses SQL Datatable

    public  void InsertBulkDatatable(DataTable dataTable)
         {
             EntityConnectionStringBuilder entityBuilder =  new EntityConnectionStringBuilder(ConfigurationManager.ConnectionStrings["MyDbContextConnectionName"].ConnectionString);
             string cs = entityBuilder.ProviderConnectionString;
             using (var connection = new SqlConnection(cs))
             {
                 SqlTransaction transaction = null;
                 connection.Open();
                 try
                 {
                     transaction = connection.BeginTransaction();
                     using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
                     {
                         sqlBulkCopy.DestinationTableName = dataTable.TableName; //Uses the SQL datatable to name the datatable in c#
                         //Maping Columns
                         foreach (DataColumn column in dataTable.Columns) {
                             sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    
                         }
    
    
                         sqlBulkCopy.WriteToServer(dataTable);
                     }
                     transaction.Commit();
                 }
                 catch (Exception)
                 {
                     transaction.Rollback();
                 }
    
             }
         }
    
Priestess answered 14/10, 2021 at 16:37 Comment(0)
F
1

The secret is to insert into an identical blank staging table. Inserts are lightening quick. Then run a single insert from that into your main large table. Then truncate the staging table ready for the next batch.

ie.

insert into some_staging_table using Entity Framework.

-- Single insert into main table (this could be a tiny stored proc call)
insert into some_main_already_large_table (columns...)
   select (columns...) from some_staging_table
truncate table some_staging_table
Franzoni answered 9/5, 2011 at 17:32 Comment(2)
Using EF, add all your records to an empty staging table. Then use SQL to insert into the main (large and slow) table in a single SQL instruction. Then empty your staging table. It's a very fast way of inserting a lot of data into an already large table.Franzoni
When you say using EF, add the records to the staging table, did you actually try this with EF? Since EF issues a separate call to the database with each insert, I suspect you are going to see the same perf hit that the OP is trying to avoid. How does the staging table avoid this issue?Mote
G
1

Have you ever tried to insert through a background worker or task?

In my case, im inserting 7760 registers, distributed in 182 different tables with foreign key relationships ( by NavigationProperties).

Without the task, it took 2 minutes and a half. Within a Task ( Task.Factory.StartNew(...) ), it took 15 seconds.

Im only doing the SaveChanges() after adding all the entities to the context. (to ensure data integrity)

Gonfalonier answered 27/5, 2013 at 20:19 Comment(5)
I am pretty sure that the context isn't thread safe. Do you have tests to ensure that all the entities were saved?Frasquito
I know the entire entity framework isnt thread safe at all, but im just adding the objects to the context and saving at the end... Its working perfectly here.Gonfalonier
So, You are calling DbContext.SaveChanges() in main thread, but adding entities to context is performed in background thread, right?Doro
Yes, add data inside the threads; wait for all to finish; and Save Changes in main threadGonfalonier
Although I think this way is dangerous and prone to mistakes, I find it very interesting.Fontaine
R
0

You may use Bulk package library. Bulk Insert 1.0.0 version is used in projects having Entity framework >=6.0.0 .

More description can be found here- Bulkoperation source code

Raneeraney answered 17/4, 2017 at 13:54 Comment(0)
T
0

TL;DR I know it is an old post, but I have implemented a solution starting from one of those proposed by extending it and solving some problems of this; moreover I have also read the other solutions presented and compared to these it seems to me to propose a solution that is much more suited to the requests formulated in the original question.

In this solution I extend Slauma's approach which I would say is perfect for the case proposed in the original question, and that is to use Entity Framework and Transaction Scope for an expensive write operation on the db.

In Slauma's solution - which incidentally was a draft and was only used to get an idea of ​​the speed of EF with a strategy to implement bulk-insert - there were problems due to:

  1. the timeout of the transaction (by default 1 minute extendable via code to max 10 minutes);
  2. the duplication of the first block of data with a width equal to the size of the commit used at the end of the transaction (this problem is quite weird and circumvented by means of a workaround).

I also extended the case study presented by Slauma by reporting an example that includes the contextual insertion of several dependent entities.

The performances that I have been able to verify have been of 10K rec/min inserting in the db a block of 200K wide records approximately 1KB each. The speed was constant, there was no degradation in performance and the test took about 20 minutes to run successfully.

The solution in detail

the method that presides over the bulk-insert operation inserted in an example repository class:

abstract class SomeRepository { 

    protected MyDbContext myDbContextRef;

    public void ImportData<TChild, TFather>(List<TChild> entities, TFather entityFather)
            where TChild : class, IEntityChild
            where TFather : class, IEntityFather
    {

        using (var scope = MyDbContext.CreateTransactionScope())
        {

            MyDbContext context = null;
            try
            {
                context = new MyDbContext(myDbContextRef.ConnectionString);

                context.Configuration.AutoDetectChangesEnabled = false;

                entityFather.BulkInsertResult = false;
                var fileEntity = context.Set<TFather>().Add(entityFather);
                context.SaveChanges();

                int count = 0;

                //avoids an issue with recreating context: EF duplicates the first commit block of data at the end of transaction!!
                context = MyDbContext.AddToContext<TChild>(context, null, 0, 1, true);

                foreach (var entityToInsert in entities)
                {
                    ++count;
                    entityToInsert.EntityFatherRefId = fileEntity.Id;
                    context = MyDbContext.AddToContext<TChild>(context, entityToInsert, count, 100, true);
                }

                entityFather.BulkInsertResult = true;
                context.Set<TFather>().Add(fileEntity);
                context.Entry<TFather>(fileEntity).State = EntityState.Modified;

                context.SaveChanges();
            }
            finally
            {
                if (context != null)
                    context.Dispose();
            }

            scope.Complete();
        }

    }

}

interfaces used for example purposes only:

public interface IEntityChild {

    //some properties ...

    int EntityFatherRefId { get; set; }

}

public interface IEntityFather {

    int Id { get; set; }
    bool BulkInsertResult { get; set; }
}

db context where I implemented the various elements of the solution as static methods:

public class MyDbContext : DbContext
{

    public string ConnectionString { get; set; }


    public MyDbContext(string nameOrConnectionString)
    : base(nameOrConnectionString)
    {
        Database.SetInitializer<MyDbContext>(null);
        ConnectionString = Database.Connection.ConnectionString;
    }


    /// <summary>
    /// Creates a TransactionScope raising timeout transaction to 30 minutes
    /// </summary>
    /// <param name="_isolationLevel"></param>
    /// <param name="timeout"></param>
    /// <remarks>
    /// It is possible to set isolation-level and timeout to different values. Pay close attention managing these 2 transactions working parameters.
    /// <para>Default TransactionScope values for isolation-level and timeout are the following:</para>
    /// <para>Default isolation-level is "Serializable"</para>
    /// <para>Default timeout ranges between 1 minute (default value if not specified a timeout) to max 10 minute (if not changed by code or updating max-timeout machine.config value)</para>
    /// </remarks>
    public static TransactionScope CreateTransactionScope(IsolationLevel _isolationLevel = IsolationLevel.Serializable, TimeSpan? timeout = null)
    {
        SetTransactionManagerField("_cachedMaxTimeout", true);
        SetTransactionManagerField("_maximumTimeout", timeout ?? TimeSpan.FromMinutes(30));

        var transactionOptions = new TransactionOptions();
        transactionOptions.IsolationLevel = _isolationLevel;
        transactionOptions.Timeout = TransactionManager.MaximumTimeout;
        return new TransactionScope(TransactionScopeOption.Required, transactionOptions);
    }

    private static void SetTransactionManagerField(string fieldName, object value)
    {
        typeof(TransactionManager).GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Static).SetValue(null, value);
    }


    /// <summary>
    /// Adds a generic entity to a given context allowing commit on large block of data and improving performance to support db bulk-insert operations based on Entity Framework
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="context"></param>
    /// <param name="entity"></param>
    /// <param name="count"></param>
    /// <param name="commitCount">defines the block of data size</param>
    /// <param name="recreateContext"></param>
    /// <returns></returns>
    public static MyDbContext AddToContext<T>(MyDbContext context, T entity, int count, int commitCount, bool recreateContext) where T : class
    {
        if (entity != null)
            context.Set<T>().Add(entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                var contextConnectionString = context.ConnectionString;
                context.Dispose();
                context = new MyDbContext(contextConnectionString);
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }

        return context;
    }
}
Trapezium answered 28/8, 2020 at 12:8 Comment(0)
I
-1

Configuration.LazyLoadingEnabled = false; Configuration.ProxyCreationEnabled = false;

these are too effect to speed without AutoDetectChangesEnabled = false; and i advise to use different table header from dbo. generally i use like nop,sop,tbl etc..

Illampu answered 23/12, 2020 at 23:37 Comment(0)
M
-2

But, for more than (+4000) inserts i recommend to use stored procedure. attached the time elapsed. I did inserted it 11.788 rows in 20"enter image description here

thats it code

 public void InsertDataBase(MyEntity entity)
    {
        repository.Database.ExecuteSqlCommand("sp_mystored " +
                "@param1, @param2"
                 new SqlParameter("@param1", entity.property1),
                 new SqlParameter("@param2", entity.property2));
    }
Mohave answered 20/11, 2017 at 0:54 Comment(0)
C
-6

Use this technique to increase the speed of inserting records in Entity Framework. Here I use a simple stored procedure to insert the records. And to execute this stored procedure I use .FromSql() method of Entity Framework which executes Raw SQL.

The stored procedure code:

CREATE PROCEDURE TestProc
@FirstParam VARCHAR(50),
@SecondParam VARCHAR(50)

AS
  Insert into SomeTable(Name, Address) values(@FirstParam, @SecondParam) 
GO

Next, loop through all your 4000 records and add the Entity Framework code which executes the stored

procedure onces every 100th loop.

For this I create a string query to execute this procedure, keep on appending to it every sets of record.

Then check it the loop is running in the multiples of 100 and in that case execute it using .FromSql().

So for 4000 records I only have to execute the procedure for only 4000/100 = 40 times.

Check the below code:

string execQuery = "";
var context = new MyContext();
for (int i = 0; i < 4000; i++)
{
    execQuery += "EXEC TestProc @FirstParam = 'First'" + i + "'', @SecondParam = 'Second'" + i + "''";

    if (i % 100 == 0)
    {
        context.Student.FromSql(execQuery);
        execQuery = "";
    }
}
Claudianus answered 11/3, 2019 at 11:12 Comment(3)
This might be efficient but equivalent to NOT using entity framework. The OP question was how to maximize efficiency in the context of Entity FrameworkCandlelight
I have historically used this approach and it proved to be very efficient. Question to the community, are there any drawbacks to using this approach?Morose
Could be useful in a pinch. (e.g. One of our Production sites suddently ramped up hits recently, causing a scramble to increase throughput, both short term and long term).Stratocracy

© 2022 - 2024 — McMap. All rights reserved.