Optimisation tips when migrating data into Sitecore CMS
I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.

During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding information on optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.

The most useful information I found was in Mark Cassidy's blog post http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of the post he provides a few tips for running an import.

  • If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
  • Use BulkUpdateContext()
  • Don't forget your target language
  • If you can, make the fields shared and unversioned. This should help migration execution speed.

The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why: a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed at least tenfold!
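For anyone else who hasn't run into it, a minimal sketch of wrapping an import loop in BulkUpdateContext might look like the following. This is a sketch only: the template path, parent path, field mapping and the GetLegacyItems() source are hypothetical placeholders for your own Entity Framework query.

```csharp
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Items;

Database master = Factory.GetDatabase("master");
TemplateItem template = master.GetTemplate("User Defined/Imported Item"); // hypothetical template path
Item parent = master.GetItem("/sitecore/content/Imported");               // hypothetical parent item

using (new BulkUpdateContext()) // suppresses indexing and most event handlers while items are created
{
    foreach (var row in GetLegacyItems()) // hypothetical: rows from the external SQL database via EF
    {
        Item item = parent.Add(Sitecore.ItemUtil.ProposeValidItemName(row.Name), template);
        item.Editing.BeginEdit();
        item["Title"] = row.Title; // hypothetical field mapping
        item.Editing.EndEdit();
    }
}
```

Note that the code requires a Sitecore installation to run, so treat it as an outline of the shape rather than something to paste in as-is.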

The next thing I looked at was the first point, where he essentially suggests creating a version of web.config that contains only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect there are plenty of other things I could remove or disable to increase performance. Pipelines? Schedules?
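As an illustration, the kind of stripped-down configuration described above could comment out the item event handlers and the master database's history engine and index declarations. The element names below follow the Sitecore 6-era web.config layout; verify them against your own file rather than copying this verbatim:

```xml
<!-- Inside <events>: comment out item lifecycle handlers for the import -->
<!--
<event name="item:created">...</event>
<event name="item:saved">...</event>
<event name="item:deleted">...</event>
<event name="version:added">...</event>
-->

<!-- Inside <database id="master">: history engine and index declarations -->
<!--
<Engines.HistoryEngine.Storage>...</Engines.HistoryEngine.Storage>
<indexes hint="list:AddIndex">
  <index path="indexes/index[@id='system']" />
</indexes>
-->
```

Keep the original web.config alongside the stripped one so everything can be restored after the import.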

What optimization tips do you have?

Gunfire answered 22/3, 2011 at 1:29 Comment(0)

Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)

Since I wrote that post, I've added a few new things to my normal routines when doing imports.

  • Regularly shrink your databases; they tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After that, do a regular ShrinkDB on your SQL Server
  • Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
  • Try not to import into "master", however. You will usually find that imports into "web" are a lot faster, mostly because that database isn't (by default) connected to the HistoryManager or other gadgets

And if you're really adventurous, there's one more thing you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)

  • Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (such as updating the LinkDatabase). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay
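For reference, the one-off LinkDatabase rebuild after the import could be as small as this, assuming the Sitecore 6-era API (run it from an admin page or a similar one-shot context):

```csharp
using Sitecore.Configuration;
using Sitecore.Data;

// Rebuild the link database for "master" once the import has finished.
Database master = Factory.GetDatabase("master");
Sitecore.Globals.LinkDatabase.Rebuild(master);
```

Expect the rebuild itself to take a while on 200K items, but it only has to run once.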

Hope this helps a bit :-)

Siege answered 22/3, 2011 at 8:43 Comment(4)
Nice question, nice answer, very useful information :). Thanks! – Foltz
Import to web? Then what happens the next time he hits the Publish button...? – Close
@Close I think he means import into the web database while testing the import, then run it against the master database once you are happy with it – Gunfire
Why ShrinkDB? Unless disk space is an issue for you, you're just setting yourself up for some expensive file-growth operations. – Crushing

I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
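Combining this with the BulkUpdateContext tip from the accepted answer, the outer shell of the import might look like the sketch below, where RunImport() is a hypothetical stand-in for the actual EF-driven item-creation loop:

```csharp
using Sitecore.Data;
using Sitecore.SecurityModel;

using (new SecurityDisabler())   // bypass security checks during the import
using (new BulkUpdateContext())  // suppress indexing and most event handlers
{
    RunImport(); // hypothetical: the actual item-creation loop
}
```

Both classes are IDisposable context objects, so stacking the using blocks like this restores normal behavior automatically when the import finishes or throws.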

I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes? Is this going to be a regular occurrence?

Close answered 24/3, 2011 at 19:6 Comment(2)
Yes, it is already running inside a SecurityDisabler block. Sitecore performance is definitely something I worry about, but very early testing indicates that it will handle it. The import is meant to be run only once, to migrate from the old CMS, but I will be running it a few times as I tweak the templates. – Gunfire
200K items doesn't actually present a problem to a Sitecore installation, as long as the ground rules are observed: if possible, no more than 100 leaves on a branch, and so on. But even so, if the source CMS is based on something that would make deep branching difficult, slapping a Lucene index on top of it all would allow Sitecore to navigate the structure comfortably; just avoid deep Sitecore Query expressions on the imported content and try to stay away from XSLT – Siege

© 2022 - 2024 — McMap. All rights reserved.