Why is EF4 Code First so slow when storing objects?

I'm currently doing some research on using db4o as storage for my web application. I'm quite happy with how easily db4o works. So when I read about the Code First approach I kinda liked it, because working with EF4 Code First is quite similar to working with db4o: create your domain objects (POCOs), throw them at db4o, and never look back.

But when I did a performance comparison, EF 4 was horribly slow. And I couldn't figure out why.

I use the following entities:



public class Recipe
{
    private List<RecipePreparation> _RecipePreparations = new List<RecipePreparation>(); // initialised so AddPreparation works

    public int ID { get; set; }
    public String Name { get; set; }
    public String Description { get; set; }
    public List<String> Tags { get; set; }

    public ICollection<RecipePreparation> Preparations
    {
        get { return _RecipePreparations.AsReadOnly(); }
    }

    public void AddPreparation(RecipePreparation preparation) 
    {
        this._RecipePreparations.Add(preparation);
    }
}

public class RecipePreparation
{
    public String Name { get; set; }
    public String Description { get; set; }
    public int Rating { get; set; }
    public List<String> Steps { get; set; }
    public List<String> Tags { get; set; }
    public int ID { get; set; }
}

To test the performance I new up a Recipe and add 50,000 RecipePreparations to it. Then I stored the object in db4o like so:

IObjectContainer db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), @"RecipeDB.db4o");
db.Store(recipe1);
db.Close();

This takes around 13,000 ms.

I store the stuff with EF4 in SQL Server 2008 (Express, locally) like this:

cookRecipes.Recipes.Add(recipe1);
cookRecipes.SaveChanges();

And that takes around 200,000 ms.
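
For completeness, the 50,000-item test object is built up roughly like this before either store call (the exact loop isn't shown above, so treat this as a reconstruction with made-up sample values):

var recipe1 = new Recipe { Name = "Test recipe", Description = "Some description" };

for (int i = 0; i < 50000; i++)
{
    // plain POCOs, no db4o or EF types involved
    recipe1.AddPreparation(new RecipePreparation { Name = "Preparation " + i, Rating = i % 5 });
}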

Now how on earth is db4o 15(!!!) times faster than EF4/SQL? Am I missing a secret turbo button for EF4? I suspect db4o could even be made faster, since I don't initialize the database file and just let it grow dynamically.

Athiste answered 27/7, 2010 at 11:41 Comment(6)
My guess is that the overhead of many single insert-statements being executed is the largest portion of the difference. Is there a way to instruct EF4 to combine insert-statements to reduce that overhead?Seismoscope
@Lasse: Yes, there is. EF implements the unit of work pattern out of the box - see my answer.Irrelievable
I have done some profiling with Visual Studio. The cookRecipes.Recipes.Add(recipe1) call takes approx. 65% of the total time to store, and SaveChanges approx. 35% (duh... ;) ).Athiste
Not sure how much it matters but what CTP version of code-only did you use?Transcontinental
CTP 4 downloaded from here : microsoft.com/downloads/…Athiste
Code First (Code Only/CTP) has nothing to do with object storage - it is simply the part that creates the XML mapping for you.Pasquale

Did you call SaveChanges() inside the loop? No wonder it's slow! Try doing this:

foreach (var recipe in The50000Recipes)
{
    cookRecipes.Recipes.Add(recipe);
}
cookRecipes.SaveChanges();

EF expects you to make all the changes you want and then call SaveChanges once. That way it can optimize the database communication and SQL needed to move from the opening state to the saved state, ignoring all changes that you have undone. (For example, adding 50,000 records, then removing half of them, then hitting SaveChanges will only ever add 25,000 records to the database.)
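
As a rough illustration of that unit-of-work behaviour (reusing the cookRecipes context and the hypothetical The50000Recipes collection from above):

foreach (var recipe in The50000Recipes)
{
    cookRecipes.Recipes.Add(recipe);
}

// change your mind about half of them before saving (.Take needs System.Linq)
foreach (var recipe in The50000Recipes.Take(25000))
{
    cookRecipes.Recipes.Remove(recipe);   // still only tracked in memory, nothing hits the database
}

// a single SaveChanges issues inserts only for the 25,000 recipes that are still added
cookRecipes.SaveChanges();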

Irrelievable answered 27/7, 2010 at 11:47 Comment(2)
The loop runs before any data is stored in db4o or EF4/SQL. I first new up a Recipe, then in a loop I add the RecipePreparations. So I have one Recipe with 50,000 RecipePreparations attached to it, and I then store this in db4o or EF4/SQL: a single db.Store(recipe1) in db4o, and a single cookRecipes.Recipes.Add(recipe1); cookRecipes.SaveChanges() in EF4.Athiste
DB4O is saving to a file on your local machine right? And EF4 has to open a database connection locally. The connection would typically remain open so it's a once-per-launch cost. Try adding a line to get the first item in the database before your timing loop for inserts to get that connection time out of the equation.Pasquale
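
A minimal way to take that one-time connection cost out of the measurement, as the comment suggests (the warm-up query and stopwatch are of course not part of the original code):

// query something cheap first so the timed part doesn't also pay for opening the connection
var warmUp = cookRecipes.Recipes.FirstOrDefault();   // needs System.Linq

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
cookRecipes.Recipes.Add(recipe1);
cookRecipes.SaveChanges();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds + " ms");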

Perhaps you can disable change tracking while adding new objects; this can really improve performance.

context.Configuration.AutoDetectChangesEnabled = false;

see also for more info: http://coding.abel.nu/2012/03/ef-code-first-change-tracking/
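
A sketch of how that could look for the scenario in the question (assuming the cookRecipes context from the question; re-enabling detection afterwards is just a precaution):

cookRecipes.Configuration.AutoDetectChangesEnabled = false;
try
{
    // with change detection off, adding many tracked objects no longer
    // triggers a scan of the whole tracked graph on every Add
    cookRecipes.Recipes.Add(recipe1);
    cookRecipes.SaveChanges();
}
finally
{
    cookRecipes.Configuration.AutoDetectChangesEnabled = true;
}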

Bowling answered 12/2, 2013 at 15:17 Comment(0)

The EF excels at many things, but bulk loading is not one of them. If you want high-performance bulk loading, doing it directly through the DB server will be faster than any ORM. If your app's sole performance constraint is bulk loading, then you probably shouldn't use the EF.
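
For example, on SQL Server the preparations could be pushed in with SqlBulkCopy instead of the ORM; the table and column names below are assumptions, not something from the question:

// uses System.Data and System.Data.SqlClient
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Description", typeof(string));
table.Columns.Add("Rating", typeof(int));
table.Columns.Add("Recipe_ID", typeof(int));

foreach (var prep in recipe1.Preparations)
{
    table.Rows.Add(prep.Name, prep.Description, prep.Rating, recipe1.ID);
}

using (var bulkCopy = new SqlBulkCopy(connectionString))   // connectionString: your SQL Server connection
{
    bulkCopy.DestinationTableName = "RecipePreparations";  // assumed table name
    bulkCopy.WriteToServer(table);
}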

Ut answered 27/7, 2010 at 13:23 Comment(5)
Then I'm having doubts about the use of EF. When your app relies on a very complex model (domain objects) I'd prefer an OO database (like db4o). But when the data is mainly tabular you would use a traditional relational database, optionally with an OR/M like EF. But as you say, EF fails when you do heavy bulk loading/inserting/updating. So what I was afraid of is actually true: EF4 is only an option when doing very light database operations, and you're stuck with using a relational database.Athiste
Again, not just EF, but any ORM will be slower than a DB server's bulk loading features. The EF may be slower than ORMs which support bulk inserts, but even those won't be as fast as the streaming APIs behind a DB server's dedicated bulk loading features. Bulk loading is a corner case for most apps, but if it's your bread and butter then you'd do better to use something like SSIS than an ORM.Ut
I agree that most OR/Ms will be slower than just bulk loading. But I have to disagree that an OR/M couldn't also make use of the bulk loading features of a database; the code is very easy to generate. Still, I think something is going wrong with EF4 Code First, since just adding the Recipe entity to the DbContext takes a lot of time (130 seconds). Storing in the database (dbContext.SaveChanges) is not a speed demon either: 70 seconds for 50,001 rows, which translates into 50,000 / 70 ≈ 714 rows/s.Athiste
I didn't say that they couldn't use bulk loading features. I said they generally don't. That said, if it's taking 130 seconds to add 1 entity then you have something entirely different going on. That is not normal performance. Maybe you're creating the DB? Code-first can do that implicitly.Ut
Well, that's what I used to do, but now it should connect to an existing database. I still have to check whether it's just verifying that all the entities have the same structure as the tables in the database.Athiste

Just to add on to the other answers: db4o typically runs in-process, while EF abstracts an out-of-process (SQL) database. However, db4o is essentially single-threaded. So while it might be faster for this one example with one request, SQL will handle concurrency (multiple queries, multiple users) much better than a default db4o database setup.

Consistence answered 24/8, 2010 at 15:38 Comment(0)
