Slow INSERT into InnoDB table with random PRIMARY KEY column's value
Asked Answered
D

1

5

For my website I use the PHP API for Flickr ( http://www.flickr.com/services/api/ ). This API provides several useful methods to get photos around particular GPS positions.

The call to API methods looks like URL with specific parameters like latitude, longitude, API key, radius, sorting, etc. Say, it'll look like http://api.flickr.com/method?lat=0.0&lon=0.0&radius=10

My website makes over 200,000 calls to API to generate several pages with pictures from Flickr. It is a pritty hard push on API thus I created a results cache in mySQL database.

Simplified scheme of the InnoDB table with cache is:

char(32) request
datetime expires // 2-3 days
text     response // serialized data from API response

where request is a PRIMARY KEY and represents an MD5 hash of a request URI. Other fields are quite simple :)

The problem arises when the table becomes large enough, say over 100,000 rows. New INSERTs take up to 2 seconds (and up to 6 (!) second with 1,000,000 rows).

As far as I understand the problem is with PRIMARY INDEX and engine being InnoDB. Every time a new request is being inserted, InnoDB engine rebuilds the tree index and moves data around, because MD5(request) is a really random value.

So... The question is whether there is a better way to cache such requests? Or maybe I should switch to MyISAM engine? Or may be I should try pseudo-partitioning and create several tables to solve the problem? Or may be just use not a BTREE but HASH index?

Any ideas are welcome!

Edit:

Ok, I tried to alter table as Furicane and Johan suggested, but still no luck - INSERTs takes up to 3 seconds. Currently request field became a normal non-unique index and new id column has been added as PRIMARY KEY with auto increment. Also I tried to add 4 partitions on this table with same results.

I think that index on request field is still a bottleneck. The only way I currently see is to determine all possible parameters, add them as columns to a table and then create index on them.

Any other ideas? :)

Edit 2:

Salman A in comments below said that his similar table performs much better (~0.03 for insert). This the problem may be in IO load on system. Though I cannot any high load on it.

iostat results:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.94    0.71    8.42    8.50    0.00   59.43

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              38.01       151.04       114.32 1383655437 1047309046

iotop results:

Total DISK READ: 152.91 K/s | Total DISK WRITE: 197.67 K/s

With mySQL on top of the list both for writing and reading. Maybe my disks are almost dead? How can I check disks performance?

Desensitize answered 25/10, 2011 at 22:13 Comment(8)
Don't use that hash as primary key, use a surrogate primary key (auto_increment) and have your hash field defined as unique key so you avoid duplicates. With sequential keys and no btree rebalancing, InnoDB inserts are rather ok and scale well.Lira
@Furicane: Nope, still 2 seconds to insert. See original post for details :)Desensitize
What happens if you remove the text fields from your table. Do the inserts get any faster?Larisalarissa
I have this table CREATE TABLE bd (id char(32) NOT NULL, name varchar(100) DEFAULT NULL, birthday date NOT NULL, PRIMARY KEY (id), KEY birthday (birthday)) ENGINE=InnoDB; with 103608 rows and insert takes ~0.03 seconds.Larisalarissa
@WASD42: Also check this article for InnoDB settings: mysqlperformanceblog.com/2007/11/01/…Tonsorial
I cannot remove this column - the only purpose of this table is to save responses of API :/ Though I tried to remove it on copy of it - no change. Now I think that the problem may be in dead IO... iostats added to main question.Desensitize
@WASD42: I thought moving the lengthy columns from the main table to another one should have made the inserts faster. If this is not the case then I guess it is a dead end. BTW you can analyze MySQL profiling results to be sure that it is the disk IO that takes time. May be dump the data on another machine and see if the performance is better there.Larisalarissa
It's a possibility that your disk is dying, especially with that low rate of reading/writing. Seek rate is probably also terrible.Lira
K
3

InnoDB does not support hash keys, only Btree.

MyISAM is infamous for being unreliable.
I think your problem is that you use the MD5 value for the primary key.

The primary key is included in every secondary key. And the PK is forced to be a unique key.

Set an integer autoincrement primary key and set your MD5 value as a normal index.
It does not even need to be unique, because that's a huge part of what's slowing you down.

After this your inserts should run much faster.

Kinescope answered 25/10, 2011 at 22:19 Comment(2)
Nope, still 2 seconds to insert. See original post for details :)Desensitize
Yes, the isssue is that you have a clustered (primary) index that does not store sequential values but (more or less) random ones. An integer autoincremented as a primary key would result in no re-arrangments when new rows are Inserted.Tonsorial

© 2022 - 2024 — McMap. All rights reserved.