Rails: how to migrate a large amount of data?
V

5

19

I have a Rails 3 app running an older version of Spree (an open source shopping cart). I am in the process of updating it to the latest version. This requires me to run numerous migrations on the database to make it compatible with the latest version. However, the app's current database is roughly 300 MB, and running the migrations on my local machine (Mac OS X 10.7, 4 GB RAM, 2.4 GHz Core 2 Duo) takes over three days to complete.

I was able to decrease this time to only 16 hours using an Amazon EC2 instance (a High I/O On-Demand instance, Quadruple Extra Large). But 16 hours is still too long, as I will have to take the site down to perform this update.

Does anyone have any other suggestions to lower this time? Or any tips to increase the performance of the migrations?

FYI: using Ruby 1.9.2, and Ubuntu on the Amazon instance.

Verney answered 20/7, 2012 at 23:48 Comment(8)
Can you upgrade it incrementally? i.e. not try to bump to the latest version in one go, but upgrade release by release and only run a few migrations at a time?Insusceptible
How many migrations are there? What is the current version of Spree you are using? Are you doing any of your own migrations? Are you moving data around in your migrations? Is Spree moving data around? What if you disable indexes and rebuild them after the migrations? (This could be risky for some indexes - e.g. unique indexes, so you could leave those in - and there could be other issues, so maybe not. But you could try this approach, see what times you get, and if it is promising, think of a robust solution to do it.) Surprising that it takes so long - 300 MB is really not much at all.Mango
@Beerlington Upgrading it version by version is an option, but this will take longer as I would have to make all the app customizations compatible with each version. And knowing my client, he would not want to pay my hourly rate for that. He would rather just have me shut the site down for maintenance for a whole day. I was just hoping there was a way to increase the speed of the migrations.Verney
@Mango Currently the app is running Spree 0.11.99. I am not doing any of my own migrations. Lots of data is being moved around; for example, this particular migration bit.ly/OOn6jQ generates an anonymous user for each order that was previously placed by a "guest" (someone who did not create an account). This migration alone takes over 3 hours to run. I will do some test runs with indexes disabled and then rebuild them after. I was shocked as well that it takes this long for 300 MB. Thanks for the suggestion.Verney
@akaDanPaul: which database backend are you using? From your test runs' database logs, are you able to identify which migrations (and which queries in those migrations, if applicable) are the performance hogs? How comfortable are you with tweaking those migrations? (Some migrations -- such as the one you quoted above -- can be rewritten in a couple of SQL queries that should take seconds or milliseconds, not hours, to run.) Beware that disabling/dropping indexes may actually severely harm your performance!Leonhard
About the migration you linked to, as an experiment, try wrapping it in an ActiveRecord::Base.transaction do .. end; it should ideally bring the time down from 3 hours. The downside of this is that in production the site will probably be "down". Also, like @Leonhard says about rebuilding indexes, you will want to take that into account. At any rate, this is a great learning experience. I feel your pain - hope something alleviates it asap! :) I'll keep an eye on this question.Mango
One way is to convert any lengthy migrations to direct SQL using execute. Rails will often run multiple queries when something could have been done in one. For example, adding 2 columns and an index in Rails will create 3 queries to the database, whereas direct SQL could do it in one (see the sketch after these comments).Deliadelian
Another way to optimize migrations where there are procedures happening on each result is to use find_each. I did a bunch of testing on a large migration and found that a batch size of ~100 (find_each(batch_size: 100) {}) was optimal. The migration ran many times faster than just using each. I would convert every instance of .where().each to use find_each and see how far you get.Limp
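As a rough illustration of the direct-SQL suggestion above, an execute call can fold several schema changes into one statement. This is only a sketch: the table, the columns, and the MySQL-style ADD INDEX syntax are assumptions for illustration, not part of the actual Spree upgrade.

class AddShippingFieldsToOrders < ActiveRecord::Migration
  def up
    # Two new columns and an index in a single ALTER TABLE round trip (MySQL syntax;
    # PostgreSQL would need a separate CREATE INDEX statement).
    execute <<-SQL
      ALTER TABLE orders
        ADD COLUMN shipped_at DATETIME,
        ADD COLUMN tracking_number VARCHAR(255),
        ADD INDEX index_orders_on_shipped_at (shipped_at)
    SQL
  end
end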
S
3
  • Dropping indices beforehand and adding them again afterwards is a good idea.

  • Also replacing .where(...).each with .find_each and perhaps adding transactions could help, as already mentioned (see the sketch after this list).

  • Replace .save! with .save(:validate => false); during the migrations you are not handling random user input, you should be making known-good updates, and validations account for much of the execution time. Using .update_attribute also skips validations when you only need to change one field.

  • Where possible, use fewer AR objects in a loop. Instantiating and later garbage collecting them takes CPU time and uses more memory.
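
Putting several of these points together, here is a minimal sketch of what a data-touching migration could look like with batching, per-batch transactions, and validations skipped. The Order model, the token column, and the batch size are assumptions for illustration, not taken from the real Spree migrations.

require 'securerandom'

class BackfillOrderTokens < ActiveRecord::Migration
  def up
    # Load 100 records at a time instead of instantiating the whole table at once.
    Order.where(:token => nil).find_in_batches(:batch_size => 100) do |batch|
      # One transaction per batch cuts the per-row commit overhead.
      Order.transaction do
        batch.each do |order|
          order.token = SecureRandom.hex(16)   # known-good value, no user input involved
          order.save(:validate => false)       # skip validations during the migration
        end
      end
    end
  end
end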

Sloth answered 2/8, 2012 at 13:45 Comment(0)
F
0

Maybe you have already considered this:

  1. Tell the db not to bother making sure everything is on disk (no WAL, no fsync, etc.); you now effectively have an in-memory db, which should make a very big difference. (Since you have taken the db offline, you can just restore from a backup in the unlikely event of power loss or similar.) Turn fsync/WAL back on when you are done. See the config sketch after this list.

  2. It is likely that you can do some of the migrations before you take the db offline. Test this in a staging environment, of course. That big user migration might very well be possible to do live. Make sure you don't do it in a transaction; you might need to modify the migrations a bit.
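
For point 1, here is a rough sketch of what those settings could look like, assuming PostgreSQL; the question does not say which database is in use, and MySQL/InnoDB has rough equivalents such as innodb_flush_log_at_trx_commit and sync_binlog.

# postgresql.conf -- temporary settings for the offline migration run only
fsync = off                # don't force each write out to disk
synchronous_commit = off   # don't wait for the WAL flush before acknowledging a commit
full_page_writes = off     # skip writing full-page images to the WAL

Restore the original values (and reload the server) once the migrations have finished.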

I'm not familiar with your exact situation, but I'm sure there are even more things you can do if this isn't enough.

Fransis answered 26/7, 2012 at 23:58 Comment(0)
N
0

This answer is more about approach than a specific technical solution. If your main criterion is minimum downtime (and data integrity, of course), then the best strategy is not to use Rails!

Instead you can do all the heavy work up front and leave just the critical "real time" data migration (I'm using "migration" in the non-Rails sense here) as a step during the switchover.

So you have your current app with its db schema and the production data. You also (presumably) have a development version of the app based on the upgraded Spree gems with the new db schema but no data. All you have to do is figure out a way of transforming the data between the two. This can be done in a number of ways, for example using pure SQL and temporary tables where necessary, or using SQL and Ruby to generate insert statements. These steps can be split up so that data that is fairly "static" (reference tables, products, etc.) can be loaded into the db ahead of time, and the data that changes more frequently (users, sessions, orders, etc.) can be done during the migration step.

You should be able to script this export-transform-import procedure so that it is repeatable, and have tests/checks after it's complete to ensure data integrity. If you can arrange access to the new production database during the switchover then it should be easy to run the script against it. If you're restricted to a release process (e.g. Webistrano) then you might have to shoe-horn it into a Rails migration, but you can run raw SQL using execute.
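
As a sketch of that last point, a data-moving step like the guest-order/user backfill mentioned in the comments could be collapsed into a single set-based statement inside a migration. The users and orders tables and their columns below are stand-ins, not the real Spree schema.

class CreateUsersForGuestOrders < ActiveRecord::Migration
  def up
    # One INSERT ... SELECT instead of instantiating an ActiveRecord object per order.
    execute <<-SQL
      INSERT INTO users (email, created_at, updated_at)
      SELECT DISTINCT o.email, NOW(), NOW()
      FROM orders o
      WHERE o.user_id IS NULL
        AND o.email IS NOT NULL
    SQL
  end
end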

Narrate answered 29/7, 2012 at 5:49 Comment(0)
O
0

Take a look at the activerecord-import gem: https://github.com/zdennis/activerecord-import/. It lets you build up records in memory and insert them all with a single multi-row INSERT instead of one query per record.

data = []
data << Order.new(:order_info => 'test order')   # build records in memory without saving
Order.import data                                 # a single bulk INSERT for the whole array
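
Depending on the version of the gem, import may also accept options such as :validate => false to skip per-record validations during the bulk insert; check the gem's README for what your version supports.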
Olomouc answered 5/8, 2012 at 22:2 Comment(0)
R
0

Unfortunately, the downvoted solution is the only one. What is really slow in Rails is the ActiveRecord models; they are not suited for tasks like this.

If you want a fast migration, you will have to do it in SQL.

There is another approach, but you will always have to rewrite most of the migrations...

Retsina answered 23/8, 2012 at 12:17 Comment(0)
