Entity Framework, Bulk Inserts, and Maintaining Relationships

I have what would seem to be a common problem, yet I cannot figure out how to achieve the desired outcome. I have a nested entity with navigation properties defined on it, as shown in the following diagram.

[Diagram: a MapLayer entity with a collection of MapLines, each holding a collection of MapPoints, linked by navigation properties]

The MapPoints collection can potentially be quite large for a given MapLine, and there can be quite a large number of MapLines for a MapLayer.

The question here is: what is the best approach for getting a MapLayer object inserted into the database using Entity Framework while still maintaining the relationships defined by the navigation properties?

A standard Entity Framework implementation

dbContext.MapLayers.Add(mapLayer);
dbContext.SaveChanges();

causes a large memory spike and pretty poor return times.

I have tried the EntityFramework.BulkInsert package, but it does not honor the relationships of the objects.

This seems like a problem someone would have run into before, but I can't seem to find any resources that explain how to accomplish this task.

Update

I have tried to implement the suggestion provided by Richard, but I do not understand how to apply it to a nested entity such as the one I have described. I am running under the assumption that I need to insert the MapLayer object, then the MapLines, then the MapPoints to honor the PK/FK relationships in the database. I am currently trying the following code, but it does not appear to be correct.

dbContext.MapLayers.Add(mapLayer);
dbContext.SaveChanges();

List<MapLine> mapLines = new List<MapLine>();
List<MapPoint> mapPoints = new List<MapPoint>();
foreach (MapLine mapLine in mapLayer.MapLines)
{
    // Update each MapPoint's MapLine property to point at the current line
    var updatedPoints = mapLine.MapPoints.Select(x => { x.MapLine = mapLine; return x; }).ToList();

    mapLines.Add(mapLine);
    mapPoints.AddRange(updatedPoints);
}

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        // Insert the lines in batches; the points would need a second pass like this one
        foreach (var entityToInsert in mapLines)
        {
            ++count;
            // AddToContext batches the work: it adds the entity, and every
            // 100 entities saves and recreates the context (the 'true' flag)
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

Update 2

After trying multiple different ways to achieve this, I finally gave up and just inserted the MapLayer as an entity, storing the MapLines => MapPoints relationship as a raw JSON string in a byte array on the MapLayer entity (since I am not querying against those structures, this works for me).

As the saying goes, "It ain't pretty, but it works".
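
For reference, a minimal sketch of that workaround. The LineData property, the use of Json.NET, and the Pack/Unpack helpers are all illustrative names, and it assumes MapLine/MapPoint are plain classes whose back-references are not serialized:

using System.Collections.Generic;
using System.Text;
using Newtonsoft.Json; // Json.NET, assumed here for serialization

public partial class MapLayer
{
    // Hypothetical mapped column holding the serialized lines and points
    public byte[] LineData { get; set; }

    public void PackLines(List<MapLine> lines)
    {
        // Serialize the whole MapLine => MapPoint graph to JSON bytes
        LineData = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(lines));
    }

    public List<MapLine> UnpackLines()
    {
        // Rehydrate the graph when the layer is read back
        return JsonConvert.DeserializeObject<List<MapLine>>(
            Encoding.UTF8.GetString(LineData));
    }
}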

I did have some success with the BulkInsert package and managing the relationships outside of EF, but I again ran into a memory problem when trying to use EF to pull the data back into the system. It seems that, at present, EF is not capable of handling large datasets and complex relationships efficiently.

Furlani answered 12/1, 2015 at 20:47
Can you explain what you are doing with the data afterwards? I am interested in whether it would not be better to just use a simple binary format for your data. - Girder

I had a bad experience with huge context saves. All those recommendations about saving in iterations of 100 rows or 1,000 rows, then disposing the context or clearing the list and detaching objects, assigning null to everything, etc. - it is all bullshit. We had a requirement to insert millions of rows daily into many tables. One should definitely not use Entity Framework in these conditions. You will be fighting memory leaks and a decrease in insertion speed as the iterations proceed.

Our first improvement was creating stored procedures and adding them to the model. That is 100 times faster than Context.SaveChanges(), and there are no leaks and no decrease in speed over time.
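
For illustration, a sketch of what calling such a procedure looks like - InsertMapPoint and its parameters are hypothetical names for a stored procedure imported into the model as a function import, which EF exposes as a method on the context:

// Hypothetical: InsertMapPoint is a function import wrapping the stored
// procedure, so it is generated as a method on the context
using (var context = new MyEntities())
{
    // 'points' is the collection of MapPoint objects to insert
    foreach (MapPoint p in points)
    {
        context.InsertMapPoint(p.MapLineId, p.Latitude, p.Longitude);
    }
}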

But it was not sufficient for us, and we decided to use SqlBulkCopy. It is super fast - 1000 times faster than using stored procedures.

So my suggestion is: if you have many rows to insert but the count is under something like 50,000 rows, use stored procedures imported into the model; if you have hundreds of thousands of rows, go and try SqlBulkCopy.

Here is some code:

// Unwrap the underlying SqlConnection from the context's EntityConnection
EntityConnection ec = (EntityConnection)Context.Connection;
SqlConnection sc = (SqlConnection)ec.StoreConnection;

// The store connection must be open before calling WriteToServer
var copy = new SqlBulkCopy(sc, SqlBulkCopyOptions.CheckConstraints, null);

copy.DestinationTableName = "TableName";
copy.ColumnMappings.Add("SourceColumn", "DBColumn");
copy.WriteToServer(dataTable); // dataTable holds the rows to insert
copy.Close();

If you use a DbTransaction with the context, you can manage to bulk insert using that transaction as well, but it needs some hacks; one such hack is sketched below.
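
A sketch of that approach, assuming an EF 6-style DbContext (which exposes Database.UseTransaction); the table name and dataTable are illustrative:

using (var context = new MyDbContext())
{
    context.Database.Connection.Open();
    using (var tran = (SqlTransaction)context.Database.Connection.BeginTransaction())
    {
        // Tell EF to run its own commands on this transaction
        context.Database.UseTransaction(tran);

        // ... context.SaveChanges() for the parent rows ...

        // Hand the same connection and transaction to the bulk copy
        var copy = new SqlBulkCopy((SqlConnection)context.Database.Connection,
                                   SqlBulkCopyOptions.Default, tran);
        copy.DestinationTableName = "TableName";
        copy.WriteToServer(dataTable); // dataTable built elsewhere
        copy.Close();

        tran.Commit();
    }
}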

Jethro answered 13/1, 2015 at 11:53
"I had bad experience with huge context save. All those recommendations about saving in iterations by 100 rows, by 1000 rows, then disposing context or clearing list and detaching objects, assigning null to everything etc etc - it is all bullshit." - read till this point and +1 alreadyLudwigshafen
But what about the relationship issue? How would he solve that? - Ludwigshafen
@eranotzap, if you mean relationships while bulk inserting: we just added 2 additional columns to the parent table and fill them in code. One is for the portion, say PortionID; the other is for the relation, say RelationID. After the bulk insert we select the data by portion, reading ID and RelationID. Now I have the relations and can assign the appropriate IDs to the child records by matching on RelationID and taking the ID from the db. Then I do another bulk insert for the children (see the sketch after these comments). - Jethro
I also did that: used a complex type. Though now I have trouble translating the entity back when it is queried as a set from the db via a stored procedure. - Ludwigshafen
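
A sketch of the RelationID approach Jethro describes above - the PortionID/RelationID columns, the MapLines table name, and the BulkInsertLines/BulkInsertPoints helpers (SqlBulkCopy wrappers like the one in the answer) are all illustrative:

static void InsertLayer(SqlConnection connection, Guid portionId, MapLayer mapLayer)
{
    // Tag each parent row with a client-generated RelationID before the bulk insert
    var byRelation = new Dictionary<Guid, MapLine>();
    foreach (MapLine line in mapLayer.MapLines)
    {
        line.PortionID = portionId;
        line.RelationID = Guid.NewGuid();
        byRelation[line.RelationID] = line;
    }

    BulkInsertLines(connection, mapLayer.MapLines); // SqlBulkCopy into MapLines

    // Read back the database-generated IDs for this portion and wire up the children
    using (var cmd = new SqlCommand(
        "SELECT ID, RelationID FROM MapLines WHERE PortionID = @p", connection))
    {
        cmd.Parameters.AddWithValue("@p", portionId);
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                MapLine line = byRelation[reader.GetGuid(1)];
                foreach (MapPoint point in line.MapPoints)
                    point.MapLineID = reader.GetInt32(0); // assign the FK
            }
        }
    }

    BulkInsertPoints(connection, mapLayer.MapLines.SelectMany(l => l.MapPoints));
}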

Bulk insert is not the only way of efficiently adding data using Entity Framework - a number of alternatives are detailed in this answer. You can use the optimisations suggested there (disabling change tracking), and then you can just add things as normal.

Note that as you are adding many items at once, you'll need to recreate your context fairly frequently to stop the memory leak and slowdown that you'll get.
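
A minimal sketch of that pattern, assuming a batch size of 100 (the exact number is something to tune):

MyDbContext context = new MyDbContext();
context.Configuration.AutoDetectChangesEnabled = false;
try
{
    int count = 0;
    foreach (MapLine line in mapLines)
    {
        context.MapLines.Add(line);
        if (++count % 100 == 0)
        {
            context.SaveChanges();
            context.Dispose(); // drop the tracked entities
            context = new MyDbContext(); // fresh, empty context
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }
    context.SaveChanges(); // save the final partial batch
}
finally
{
    context.Dispose();
}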

Parrie answered 12/1, 2015 at 22:19
