EF Core Slow Bulk Insert (~80k rows)

I have a Save object which has several collections associated. Total size of the objects is as follows:

[image showing the object counts per collection omitted]

The relations between the objects can be inferred from this mapping and appear to be correctly represented in the database. Querying also works just fine.

modelBuilder.Entity<Save>().HasKey(c => c.SaveId).HasAnnotation("DatabaseGenerated",DatabaseGeneratedOption.Identity);
modelBuilder.Entity<Save>().HasMany(c => c.Families).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Save>().HasMany(c => c.Countries).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Save>().HasMany(c => c.Provinces).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Save>().HasMany(c => c.Pops).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Country>().HasOne(c => c.Save);
modelBuilder.Entity<Country>().HasMany(c => c.Technologies).WithOne(x => x.Country).HasForeignKey(x => new {x.SaveId, x.CountryId});
modelBuilder.Entity<Country>().HasMany(c => c.Players).WithOne(x => x.Country).HasForeignKey(x => new {x.SaveId, x.CountryId});
modelBuilder.Entity<Country>().HasMany(c => c.Families).WithOne(x => x.Country).HasForeignKey(x => new {x.SaveId, x.OwnerId});
modelBuilder.Entity<Country>().HasMany(c => c.Provinces).WithOne(x => x.Owner);
modelBuilder.Entity<Country>().HasKey(c => new { c.SaveId, c.CountryId });
modelBuilder.Entity<Family>().HasKey(c => new { c.SaveId, c.FamilyId });
modelBuilder.Entity<Family>().HasOne(c => c.Save);
modelBuilder.Entity<CountryPlayer>().HasKey(c => new { c.SaveId, c.CountryId, c.PlayerName });
modelBuilder.Entity<CountryPlayer>().HasOne(c => c.Country);
modelBuilder.Entity<CountryPlayer>().Property(c => c.PlayerName).HasMaxLength(100);
modelBuilder.Entity<CountryTechnology>().HasKey(c => new { c.SaveId, c.CountryId, c.Type });
modelBuilder.Entity<CountryTechnology>().HasOne(c => c.Country);
modelBuilder.Entity<Province>().HasKey(c => new { c.SaveId, c.ProvinceId });
modelBuilder.Entity<Province>().HasMany(c => c.Pops).WithOne(x => x.Province);
modelBuilder.Entity<Province>().HasOne(c => c.Save);
modelBuilder.Entity<Population>().HasKey(c => new { c.SaveId, c.PopId });
modelBuilder.Entity<Population>().HasOne(c => c.Province);
modelBuilder.Entity<Population>().HasOne(c => c.Save);
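
The entity classes implied by this mapping would look roughly like the following (a sketch inferred from the configuration above, not taken from the original code; only keys and navigation properties are shown):

```csharp
// Illustrative shapes only, reconstructed from the Fluent API mapping.
public class Save
{
    public int SaveId { get; set; }            // identity primary key
    public string SaveKey { get; set; }
    public List<Family> Families { get; set; }
    public List<Country> Countries { get; set; }
    public List<Province> Provinces { get; set; }
    public List<Population> Pops { get; set; }
}

public class Country
{
    public int SaveId { get; set; }            // composite key, part 1
    public int CountryId { get; set; }         // composite key, part 2
    public Save Save { get; set; }
    public List<CountryTechnology> Technologies { get; set; }
    public List<CountryPlayer> Players { get; set; }
    public List<Family> Families { get; set; }
    public List<Province> Provinces { get; set; }
}
```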

I parse the entire save from a file, so I can't add the collections one by one. After parsing I have a Save with all its associated collections, adding up to 80k objects, none of which are present in the database.

Then, when I call dbContext.Add(save), it takes around 44 seconds to process, with RAM usage climbing from 100 MB to around 700 MB.

Then, when I call dbContext.SaveChanges() (I also tried the BulkSaveChanges() method from EF Extensions, with no significant difference), it takes an additional 60 seconds, with RAM usage rising to 1.3 GB.

What is going on here? Why does it take so long and use so much memory? The actual upload to the database only accounts for roughly the last 5 seconds.

PS: I also tried disabling change detection with no effect.
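
For reference, "disabling change detection" in EF Core usually means something like the following (a sketch; the try/finally wrapper is an addition for safety, not from the original question):

```csharp
// Sketch: turn off automatic change detection while adding a large graph.
_db.ChangeTracker.AutoDetectChangesEnabled = false;
try
{
    _db.Saves.Add(save);   // Add still walks and tracks the entire 80k-object graph
    _db.SaveChanges();     // SaveChanges runs DetectChanges once internally
}
finally
{
    _db.ChangeTracker.AutoDetectChangesEnabled = true;
}
```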

PS2: Actual usage and full code as requested in comments:

public class HomeController : Controller
{
    private readonly ImperatorContext _db;

    public HomeController(ImperatorContext db)
    {
        _db = db;
    }

    [HttpPost]
    [RequestSizeLimit(200000000)]
    public async Task<IActionResult> UploadSave(List<IFormFile> files)
    {
        [...]
        await using (var stream = new FileStream(filePath, FileMode.Open))
        {
            var save = ParadoxParser.Parse(stream, new SaveParser());
            if (_db.Saves.Any(s => s.SaveKey == save.SaveKey))
            {
                 response = "The save you uploaded already exists in the database.";
            }
            else
            {
                 _db.Saves.Add(save);
            }
            _db.BulkSaveChanges();
        }
        [...]
    }

}
Painful answered 28/1, 2020 at 17:27 Comment(14)
Please show the actual code, and show how you are using BulkSaveChanges()Cancellate
#5943894Fairspoken
You shouldn't try to save 80k records at once. Try just saving say 1000 at a time.Fairspoken
@Cancellate added, though I don't think it adds any further info.Painful
@Fairspoken How can I do that if all the dependent objects are inside the save object? I can't loop through it, as there is only one object at the top.Painful
Welcome to the amazing world of ORM. Data access code is too important and performance critical to be designed by a machine.Cassiodorus
Entity Framework isn't the best option for bulk operations. Maybe, in this case, try creating a stored procedure and maybe passing the data in as json? I don't know MariaDB. You can still call the sproc via EF.Diaeresis
@Cassiodorus That is true! However, sometimes we just don't have the luxury of writing all that code :( He's going to have to make some tweaks to make it acceptable.Witting
Put your "Saves" collection in a dictionary or hashset. Searching with 'Any' is quite expensive; it is a simple linear search. You have 1 file per item? Just opening the file must already be much slower than executing the insert. And if you call BulkSaveChanges for each single item, that is not a "BulkInsert".Trapezoid
@holger there is only one save. The Any search is extremely fast, as it is converted by EF to an EXISTS select in SQL. The problem is not there.Painful
@Pinx So you really parse only one file? Not a list of files? OK. The documentation says clearly: "When should I use BulkSaveChanges over SaveChanges? Whenever you have more than one entity to save." You save only one entity; related entities don't count. EF is not made for something like a complete database copy in one operation. You can be happy it works at all. A more specialized implementation, one that does not build up an object hierarchy (parsing) just to keep it in memory for a few seconds before saving it to SQL, might be more appropriate. Streaming conversion would be best.Trapezoid
@MattM there is a 3rd-party library for Entity Framework Core with much better bulk-insert performance, around 500x faster. Regular EF Core AddRange takes 4 seconds to save 10k records; the 3rd-party library's bulk insert takes only 0.003 sec for the same 10k records. You can check my answer.Airlie
I get why you want to save the complete object graph in one go, but as you say, it doesn't meet your performance expectations, so you will have to find another way. How about loading the file into a 'different' object graph, then extracting the different 'tables' individually and storing them in the database? That way you can do partial inserts of 1000 at a time.Fairspoken
@Fairspoken Yes, that is just what I thought and tried. Using BulkInsert individually on the 5 main collections, the time has gone down to 28 seconds in total. It still feels like too much considering the amount of data, but it's something.Painful
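
The per-collection approach from this last comment might look roughly like the following (a sketch only: it assumes a BulkInsert(IList) extension method on the DbContext, as in libraries like EFCore.BulkExtensions, and that the generated SaveId must be propagated to the children before they are inserted):

```csharp
// Sketch: detach the collections, save the root alone, then bulk-insert each
// collection separately so the change tracker never holds the full graph.
var countries = save.Countries; save.Countries = null;
var families  = save.Families;  save.Families  = null;
var provinces = save.Provinces; save.Provinces = null;
var pops      = save.Pops;      save.Pops      = null;

_db.Saves.Add(save);          // only the root is tracked now
_db.SaveChanges();            // assigns the identity value to save.SaveId

foreach (var c in countries) c.SaveId = save.SaveId;   // propagate the key
_db.BulkInsert(countries);
// ...repeat the key propagation and BulkInsert for families, provinces, pops,
// and the nested Technologies/Players collections under each country.
```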

Download EFCore.BulkExtensions from NuGet.

Remove "_db.BulkSaveChanges();" and replace "_db.Saves.Add(save);" with something like this (in EFCore.BulkExtensions, BulkInsert is an extension method on the DbContext that takes a list; IncludeGraph is needed so the related collections are inserted as well):

_db.BulkInsert(new List<Save> { save }, new BulkConfig { IncludeGraph = true });
Airlie answered 29/1, 2020 at 5:10 Comment(0)

I would suggest you take a look at N.EntityFrameworkCore.Extensions. It is a bulk extension framework for EF Core 6.0.8+.

Install-Package N.EntityFrameworkCore.Extensions

https://www.nuget.org/packages/N.EntityFrameworkCore.Extensions

Once you install the NuGet package, you can use the BulkInsert() method directly on the DbContext instance. It also supports BulkDelete, BulkMerge, and more.

BulkDelete()

var dbcontext = new MyDbContext();
var orders = dbcontext.Orders.Where(o => o.TotalPrice < 5.35M);
dbcontext.BulkDelete(orders);

BulkInsert()

var dbcontext = new MyDbContext();
var orders = new List<Order>();
for (int i = 0; i < 10000; i++)
{
    // the M suffix is required: TotalPrice is a decimal
    orders.Add(new Order { OrderDate = DateTime.UtcNow, TotalPrice = 2.99M });
}
dbcontext.BulkInsert(orders);
Unutterable answered 23/8, 2022 at 17:28 Comment(0)

EDIT:

1. Make sure it's not the DB that's the issue: execute the equivalent command yourself and see how fast it runs.

2. Keep the active context graph small by using a new context for each unit of work; also try turning off AutoDetectChangesEnabled.

3. Batch a number of commands together.

Here is a good article on Entity Framework and slow bulk INSERTs
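
Points 2 and 3 combined might be sketched like this (illustrative only: the batch size, the Population entity, the pops list, and the parameterless ImperatorContext constructor are all assumptions):

```csharp
// Sketch: insert in batches, replacing the context periodically so the
// change tracker never accumulates the whole data set.
const int batchSize = 1000;
var db = new ImperatorContext();
db.ChangeTracker.AutoDetectChangesEnabled = false;

for (int i = 0; i < pops.Count; i++)
{
    db.Set<Population>().Add(pops[i]);
    if ((i + 1) % batchSize == 0)
    {
        db.SaveChanges();
        db.Dispose();
        db = new ImperatorContext();        // fresh, empty change tracker
        db.ChangeTracker.AutoDetectChangesEnabled = false;
    }
}
db.SaveChanges();                           // flush the final partial batch
db.Dispose();
```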

Witting answered 28/1, 2020 at 17:41 Comment(6)
It is not SQL (it's MariaDB, by the way), because the slow part begins in the Add method, where the database hasn't yet been contacted.Painful
Sorry I missed that part and just automatically assumed that it was sql. The same principle still applies because it can still help with performance. Can you show us all the relevant code? That would be more helpful.Witting
I added the full code, hope it helps you understand the issue ;)Painful
Thanks, just another couple of questions. What does BulkSaveChanges do? I want to try to find a solution with the minimal amount of round trips to the db. Have you tried something like this? entityframeworkcore.com/saving-data-bulk-insert? it does just that for you and is easy to use.Witting
BulkSaveChanges is from the same library you linked me. However I will test bulk insert as well.Painful
Sounds good, please let me know how it goes. If it's still slow we should probably find another solution outside of the ORM. Probably like a stored procedure that was mentioned in the comments.Witting

© 2022 - 2024 — McMap. All rights reserved.