Is there a way to throttle the indexing of MySQL tables so overall performance is not impacted?
I need to load a large data set onto a production database.

15 files need to each be uploaded and inserted into a table. Each is about 500 MB.

I have two ID columns that need to be indexed. If I load the files with indexes in place, the upload takes around 3 hours. If I drop indexes, load data local infile, then re-add the indexes, the whole operation takes about 30 minutes.
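For concreteness, the faster path looks roughly like this (table, column, and file names here are made up for illustration):

```sql
-- Drop the two ID indexes before loading (names are hypothetical)
ALTER TABLE big_table DROP INDEX idx_id1, DROP INDEX idx_id2;

-- Load each of the 15 files
LOAD DATA LOCAL INFILE '/path/to/file01.csv'
  INTO TABLE big_table
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
-- ...repeat for the remaining files...

-- Re-add the indexes in a single pass; this is the step that
-- hammers the server
ALTER TABLE big_table ADD INDEX idx_id1 (id1), ADD INDEX idx_id2 (id2);
```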

The problem is, database responsiveness takes a big hit while indexing the freshly imported data. Is there a way to make the indexing run at a "low priority" so that other queries still get 95-100% speed and the indexing kind of chugs along in the background?

I'm using Amazon RDS, so I don't have the option of just loading on a different server then copying over the table files.

Adding a bounty to this as I still want to see if there is a way to get good performance while indexing on a specific box.

Colquitt answered 10/5, 2011 at 0:28 Comment(4)
Please run SHOW CREATE TABLE tblname\G for the table being loaded. We need to see which storage engine the table uses. Please also show us the /etc/my.cnf file.Push
I'm currently using MyISAM, but I have also tried InnoDB. I'm open to using either. Again, the big issue is trying to throttle the indexing, not trying to increase its performance.Colquitt
It's not clear how often you're performing this operation, but you seem to suggest it's a one-off. Out of curiosity, what is the issue with roughly 30 minutes of downtime? Couldn't that be done during low-use periods (nighttime/lunchtime)?Drily
The data loads need to happen at least once a week, but sometimes as often as once per day. I don't care if they take 6 hours, I just want to throttle them so they don't affect the rest of the queries.Colquitt

Well, I never found a way to throttle, but I did figure out a way to alleviate my problem. The solution was unique to my problem, but I'll post it in case someone else finds it useful.

I wrote a class named CautiousIndexer.

  1. First I stored the create table statement needed to recreate the table structure without indexes. I kept an array of the read-slave databases and looped through them, renaming the table holding the unindexed data to prevent_indexing_($name).
  2. Then I ran the create table statement on the slaves only. This effectively moved the data out of the way of indexing statements that would happen on the master.
  3. Then I ran the index query against the master. Read slaves had no performance impact while the master was indexing because the newly created tables were empty.
  4. When the master finished indexing, I took one of the slaves out of the production rotation, dropped its empty table, moved the full table back into place, and then indexed the table on that out-of-production slave.
  5. When that finished I put it back in production and repeated the slave indexing procedure on the remaining slaves.
  6. When all slaves were indexed, I put the table into production.
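A minimal SQL sketch of the table shuffling in steps 1–3 (the actual CautiousIndexer code isn't shown here; table and index names are invented):

```sql
-- On each read slave: move the unindexed data aside, then recreate
-- an empty copy of the structure so the slave keeps serving reads
RENAME TABLE big_table TO prevent_indexing_big_table;
CREATE TABLE big_table LIKE prevent_indexing_big_table;

-- On the master only: build the indexes; slaves replicate this
-- against their empty copies, so they feel no impact
ALTER TABLE big_table ADD INDEX idx_id1 (id1), ADD INDEX idx_id2 (id2);
```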

This was still acceptable in terms of efficiency, but write performance on the master was unacceptably slow while it was indexing. I'm still looking for a way to index with throttling.

Colquitt answered 18/5, 2011 at 21:30 Comment(1)
This solution helped, but I found it was insufficient. Certain queries still require direct access to the master DB for transactions, and those slow unacceptably during indexing.Colquitt

This is not exactly the solution you're looking for, but you can bring up a second mysqld instance as a slave on the same box and redirect SELECT queries to it as needed. MySQL Proxy can help you accomplish this without rewriting the client apps.

You can also gather some ideas from FriendFeed's usage of MySQL. They store the actual indexes in separate tables and use those for search. If you store a copy of your data in another table, even on another server, and build the indexes there, you can keep accessing the master data at full speed and get faster queries later from the other server.

It's as if you added indexes on a slave for search-type queries and ran only primary-key lookups on the master.
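The FriendFeed-style layout — "indexes" kept as ordinary tables that can live on another server — might look something like this (schema is purely illustrative):

```sql
-- Master holds opaque entities, looked up only by primary key
CREATE TABLE entities (
  id   BINARY(16) NOT NULL PRIMARY KEY,
  body MEDIUMBLOB NOT NULL
);

-- A separate "index" table, possibly on another server, built
-- asynchronously and used for search-type queries
CREATE TABLE index_user_id (
  user_id   INT        NOT NULL,
  entity_id BINARY(16) NOT NULL,
  PRIMARY KEY (user_id, entity_id)
);
```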

Kus answered 20/6, 2011 at 7:10 Comment(2)
This would be fine if I didn't need to process transactions also. I need the master up so that I can write to it. Writes are slowing down too much (on other tables) during indexing.Colquitt
I added a note about what they actually did at FriendFeed.Kus

A good solution to this is a script that performs a rolling update. You would apply the index to each slave in a non-replicating manner. A rough illustration:

# For each slave: pause replication, build the index without writing
# it to the binary log, then resume replicating.
for host in $hosts
do
    mysql -h $host -e "STOP SLAVE;\
      SET sql_log_bin=0;\
      FLUSH TABLE t;\
      ALTER TABLE t ADD INDEX a (b,c);\
      SET sql_log_bin=1;\
      START SLAVE;"
done

By turning off replication, the amount of disk activity should be reduced, which should increase the speed of the indexing operation. If you have replication-lag requirements for your slaves, you might want to de-pool each slave entirely and include logic to re-pool it once it returns to zero seconds of lag.

Apocalypse answered 20/6, 2011 at 23:34 Comment(4)
I currently do almost exactly this, but it still isn't sufficient. The data indexing on the master causes the .5% of queries that need master data to be too slow. I need a way to throttle the indexing so it can index while still being responsive on other tables.Colquitt
Zak, you have a great business case for buying more equipment! Another possibility is to do the indexing of the table on a slave, copy it to the master, and then rename it like SET sql_log_bin=0; flush table t; rename t to dugout_t, t_atbat to t; SET sql_log_bin=1;Apocalypse
I'm kind of in a catch 22. I moved to RDS to avoid sys admin costs, but pay more per machine hour. However, RDS won't let you move an index or table in place because you don't have direct access to the file system. If I up my DB size in RDS, I'm going to lose all my sysadmin cost savings as the multiAZ DB's I'm using really start to get expensive!Colquitt
This is also addressable by policy and customer messaging. Does your site have a particular SLA with customers? Consider posting a "site maintenance notification" on your customer login pages warning them of degraded service or an outage, then do the heavy lifting at the date and time you warned them about.Apocalypse

An idea not tried before, and not about index throttling: what if you make a backup table, update it using the faster method you mentioned, and then convert/rename the tables? I'm writing up my thoughts in case they point you to a way.
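If I understand the suggestion, the swap could be done along these lines (table names are hypothetical):

```sql
-- Build and index a staging copy off to the side
CREATE TABLE big_table_staging LIKE big_table;
-- ...load and index big_table_staging using the faster method...

-- Swap it into place; RENAME TABLE performs the pair of renames
-- as a single atomic operation
RENAME TABLE big_table TO big_table_old,
             big_table_staging TO big_table;
```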

Dissociable answered 18/6, 2011 at 18:10 Comment(0)

You can disable any non-unique indexes while inserting and re-enable them after you finish. Take a look at DISABLE KEYS / ENABLE KEYS. Note that this works only for non-unique indexes.

You can speed up the inserts as well if you use multi-row INSERT statements (INSERT INTO table (...) VALUES (...), (...), (...)).

By the way, LOAD DATA INFILE seems to be the fastest way to insert a lot of data into MySQL.
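In MySQL that combination looks like this (table and column names are invented):

```sql
ALTER TABLE big_table DISABLE KEYS;  -- suspend non-unique index updates (MyISAM)

-- Multi-row inserts amortize per-statement overhead
INSERT INTO big_table (id1, id2, val)
VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 30, 'c');

ALTER TABLE big_table ENABLE KEYS;   -- rebuild the non-unique indexes in one pass
```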

Sod answered 19/6, 2011 at 17:19 Comment(1)
Yes, I can load all the data in about 5 minutes with keys disabled. But when I enable keys, indexing happens! That's what's killing my DB performance.Colquitt

Have you tried bumping up your index-related settings for the import? That can increase import performance significantly. sort_buffer_size applies to any table type; myisam_sort_buffer_size is for MyISAM tables; innodb_buffer_pool_size is sort of your "key cache" for InnoDB. Bump the relevant ones up for the import, depending on your table type. What you are trying to do is avoid file sorting during index creation.
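For example (the values are illustrative — size them to your available RAM; on RDS these are set via a parameter group rather than my.cnf):

```sql
-- Per-session, so it only affects the import connection
SET SESSION sort_buffer_size        = 268435456;  -- 256 MB
SET SESSION myisam_sort_buffer_size = 268435456;  -- used for MyISAM index rebuilds

-- innodb_buffer_pool_size is global and, on MySQL of this era,
-- requires a server restart (on RDS, a parameter group change) to alter
```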

You may be able to get your import/index time down to 10-15 minutes or less. It's not throttling, but it will significantly shorten the impact period.

Or, if you are using MyISAM tables, maybe a MERGE table is an option? Create a new table, perform the import and indexing there, then add the new table to the MERGE table. There will be no impact on the database during the import, aside from the load of the task itself.
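A rough MERGE sketch, assuming a MyISAM setup where `data_all` is the MERGE table the application queries (all names invented):

```sql
-- Load and index this batch's data in a standalone MyISAM table
CREATE TABLE data_2011w20 LIKE data_2011w19;
-- ...LOAD DATA LOCAL INFILE into data_2011w20; indexes build here...

-- Splice it into the MERGE table; a quick metadata-only change
ALTER TABLE data_all UNION = (data_2011w19, data_2011w20);
```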

Parthenopaeus answered 23/6, 2011 at 5:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.