C# Entity Framework: Bulk Extensions Input Memory Issue
Asked Answered
C

2

1

I am currently using EF Extensions. One thing I don't understand, "its supposed to help with performance"

however placing a million+ records into List variable, is a Memory Issue itself. So If wanting to update million records, without holding everything in memory, how can this be done efficiently?

Should we use a for loop, and update in batches say 10,000? Does EFExtensions BulkUpdate have any native functionality to support this?

Example:

var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == 'Electronics');  // this creates IQueryable

await productUpdate.ForEachAsync(c => c.ProductBrand = 'ABC Company');

_dbContext.BulkUpdateAsync(productUpdate.ToList());

Resource:

https://entityframework-extensions.net/bulk-update

Cryptanalysis answered 17/8, 2020 at 22:16 Comment(3)
1 million records is a lot. indeed you can try to do batches. if you do a "ToList()" on your query, you materialize it, meaning load everything into memory. What are you trying to update more precisely? Is this helping: #44195377 ?Enforce
now isnt there a data type which supports lazy loading? Maybe ICollection does?, surprised EF Bulk Extensions does not have a native library function to support this batchingCryptanalysis
hi @Enforce we are trying to update a single field in the database, for millions of records, if they match a certain where criteria,Cryptanalysis
E
0

I found the "proper" EF Extensions way to do a bulk update with a query-like condition:

var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == 'Electronics')
    .UpdateFromQuery( x => new Product { ProductBrand = "ABC Company" });

This should result in a proper SQL UPDATE ... SET ... WHERE, without the need to load entities first, as per the documentation:

Why UpdateFromQuery is faster than SaveChanges, BulkSaveChanges, and BulkUpdate?

UpdateFromQuery executes a statement directly in SQL such as UPDATE [TableName] SET [SetColumnsAndValues] WHERE [Key].

Other operations normally require one or multiple database round-trips which makes the performance slower.

You can check the working syntax on this dotnet fiddle example, adapted from their example of BulkUpdate.

Other considerations

  • No mention of batch operations for this, unfortunately.

  • Before doing a big update like this, it might be worth considering deactivating indexes you may have on this column, and rebuild them afterward. This is especially useful if you have many of them.

  • Careful about the condition in the Where, if it can't be translated as SQL by EF, then it will be done client side, meaning the "usual" terrible roundtrip "Load - change in memory - update"

Enforce answered 18/8, 2020 at 0:27 Comment(8)
thanks this is helpful, does this also update the SQL table in batches without locking the whole table? #35903875Cryptanalysis
also, I only need to update 1 column, wondering if new Product { will cover it, thanksCryptanalysis
About batches, unfortunately there is no mention of them in the documentation...Enforce
About the column, I think (and hope) that this method actually analyzes the expression to understand that only one column is to be updated, hence saving a lot of memory. As per the dotnet fiddle example, I suppose this is the case, but I don't have a real case with me to test this and check SQL generated. Also, they state it in their doc.Enforce
@Artportraitdesign1 I missed the point of your first comment. Yes, if you're willing to use direct SQL (which is ultimately a problem solver in these cases), the linked Q&A seems interesting. If your condition is simple enough, and your db robust enough, and you take care of details like indexes, you might get away with UpdateFromQuery with a simpler and clearer code.Enforce
I'm wondering if you could simply do batches with UpdateWithQuery by using .Take(10_000) after the Where and looping enough times... to be testedEnforce
I am upvoting this answer, although I may use @Retrieval answer for simplicity, this is greatCryptanalysis
opening another question here, #63462490Cryptanalysis
R
2

This is actually something that EF is not made for. EF's database interactions start from the record object, and flow from there. EF cannot generate a partial UPDATE (i.e. not overwriting everything) if the entity wasn't change tracked (and therefore loaded), and similarly it cannot DELETE records based on a condition instead of a key.

There is no EF equivalent (without loading all of those records) for conditional update/delete logic such as

UPDATE People
SET FirstName = 'Bob'
WHERE FirstName = 'Robert'

or

DELETE FROM People
WHERE FirstName = 'Robert'

Doing this using the EF approach will require you to load all of these entities just to send them back (with an update or delete) to the database, and that's a waste of bandwidth and performance as you've already found.

The best solution I've found here is to bypass EF's LINQ-friendly methods and instead execute the raw SQL yourself. This can still be done using an EF context.

using (var ctx = new MyContext())
{
    string updateCommand = "UPDATE People SET FirstName = 'Bob' WHERE FirstName = 'Robert'";
    int noOfRowsUpdated = ctx.Database.ExecuteSqlCommand(updateCommand);

    string deleteCommand = "DELETE FROM People WHERE FirstName = 'Robert'";
    int noOfRowsDeleted = ctx.Database.ExecuteSqlCommand(deleteCommand);
}

More information here. Of course don't forget to protect against SQL injection where relevant.

The specific syntax to run raw SQL may vary per version of EF/EF Core but as far as I'm aware all versions allow you to execute raw SQL.


I can't comment on the performance of EF Extensions or BulkUpdate specifically, and I'm not going to buy it from them.

Based on their documentation, they don't seem to have the methods with the right signatures to allow for conditional update/delete logic.

  • BulkUpdate doesn't seem to allow you to input the logical condition (the WHERE in your UPDATE command) that would allow you to optimize this.
  • BulkDelete still has a BatchSize setting, which suggests that they are still handling the records one at a time (well, per batch I guess), and not using a single DELETE query with a condition (WHERE clause).

Based on your intended code in the question, EF Extensions isn't really giving you what you need. It's more performant and cheaper to simply execute raw SQL on the database, as this bypasses EF's need to load its entities.

Update
I might stand corrected, there is some support for conditional update logic, as seen here. However, it is unclear to me while the example still loads everything in memory and what the purpose of that conditional WHERE logic then is if you've already loaded it all in memory (why not use in-memory LINQ then?)

However, even if this works without loading the entities, it's still:

  • more limited (only equality checks are allowed, compared to SQL allowing any boolean condition that is valid SQL),
  • relatively complex (I don't like their syntax, maybe that's subjective)
  • and more costly (still a paid library)

compared to rolling your own raw SQL query. I would still suggest rolling your own raw SQL here, but that's just my opinion.

Retrieval answered 17/8, 2020 at 23:43 Comment(6)
yes bulkupdate extensions should be able to help with OP case. But your approach is valid and could be acceptable, it depends on OP's actual settings and perf requirements of course.Enforce
hi @Enforce he also brought up some points regarding "Doing this using the EF approach will require you to load all of these entities just to send them back (with an update or delete) to the database, and that's a waste of bandwidth and performance as you've already found."Cryptanalysis
@Artportraitdesign1: Unsure if that's what you meant or not, but note that what you quoted talks about EF itself, not EF Extensions (which I have no personal experience with). The EF extensions documentation seems to suggest that it can do so by avoiding loading the entities, but their own example then loads all the entities anyway. Maybe that's an issue in their documentation, or maybe it's not a feature of their library. I can't be sure.Retrieval
@Art yes, I agree with this part. EF logic is "load then update", and here you don't need this.Enforce
@Retrieval so it seems EF Extensions do have a proper SQL way to make an update without the dreaded EF round-trip after all (see my answer).Enforce
hi @Flater, I am upvoting your answer, and even though Im using Raw Sql which I believe is better/optimal route : ) , accepting Pac0, since he accepted premise of original question, both answers are greatCryptanalysis
E
0

I found the "proper" EF Extensions way to do a bulk update with a query-like condition:

var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == 'Electronics')
    .UpdateFromQuery( x => new Product { ProductBrand = "ABC Company" });

This should result in a proper SQL UPDATE ... SET ... WHERE, without the need to load entities first, as per the documentation:

Why UpdateFromQuery is faster than SaveChanges, BulkSaveChanges, and BulkUpdate?

UpdateFromQuery executes a statement directly in SQL such as UPDATE [TableName] SET [SetColumnsAndValues] WHERE [Key].

Other operations normally require one or multiple database round-trips which makes the performance slower.

You can check the working syntax on this dotnet fiddle example, adapted from their example of BulkUpdate.

Other considerations

  • No mention of batch operations for this, unfortunately.

  • Before doing a big update like this, it might be worth considering deactivating indexes you may have on this column, and rebuild them afterward. This is especially useful if you have many of them.

  • Careful about the condition in the Where, if it can't be translated as SQL by EF, then it will be done client side, meaning the "usual" terrible roundtrip "Load - change in memory - update"

Enforce answered 18/8, 2020 at 0:27 Comment(8)
thanks this is helpful, does this also update the SQL table in batches without locking the whole table? #35903875Cryptanalysis
also, I only need to update 1 column, wondering if new Product { will cover it, thanksCryptanalysis
About batches, unfortunately there is no mention of them in the documentation...Enforce
About the column, I think (and hope) that this method actually analyzes the expression to understand that only one column is to be updated, hence saving a lot of memory. As per the dotnet fiddle example, I suppose this is the case, but I don't have a real case with me to test this and check SQL generated. Also, they state it in their doc.Enforce
@Artportraitdesign1 I missed the point of your first comment. Yes, if you're willing to use direct SQL (which is ultimately a problem solver in these cases), the linked Q&A seems interesting. If your condition is simple enough, and your db robust enough, and you take care of details like indexes, you might get away with UpdateFromQuery with a simpler and clearer code.Enforce
I'm wondering if you could simply do batches with UpdateWithQuery by using .Take(10_000) after the Where and looping enough times... to be testedEnforce
I am upvoting this answer, although I may use @Retrieval answer for simplicity, this is greatCryptanalysis
opening another question here, #63462490Cryptanalysis

© 2022 - 2024 — McMap. All rights reserved.