Why does MySQL decide on wrong index?

D

2

11

I have a partitioned table in MySQL that looks like this:

CREATE TABLE `table1` (
   `id` bigint(19) NOT NULL AUTO_INCREMENT,
   `field1` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
   `field2_id` int(11) NOT NULL,
   `created_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
   PRIMARY KEY (`id`,`created_at`),
   KEY `index1` (`field2_id`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=603221206 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/*!50100 PARTITION BY RANGE (to_days(created_at))
(PARTITION p_0 VALUES LESS THAN (730485) ENGINE = InnoDB,
  ..... lots more partitions .....
 PARTITION p_20130117 VALUES LESS THAN (735250) ENGINE = InnoDB) */;

And this is a typical SELECT query on the table:

 SELECT field1 from TABLE1 where field2_id = 12345 and id > 13314313;

Doing an explain on it, MySQL sometimes decides to use PRIMARY instead of index1. This seems to be pretty consistent when you do a first explain. However, after a few repeated explains, MySQL finally decides to use the index. The problem is, this table has millions of rows, and inserts and selects are hitting it on the order of several times per second. Choosing the wrong index was causing these SELECT queries to take up to ~40 seconds, instead of sub second times. Can't really schedule downtime, so I can't run an optimize on the table (because of the size, it would probably take a long time), and not sure it would help in this case anyway.

I fixed this by forcing the index, so it looks like this:

 SELECT field1 from TABLE1 FORCE INDEX (index1) WHERE field2_id = 12345 and id > 13314313;

We're running this on MySQL 5.1.63, which we can't move away from at the moment.

My question is, why is MySQL choosing the wrong index? And is there something that can be done to fix it, besides forcing the index on all queries? Is partitioning confusing the InnoDB engine? I've worked a lot with MySQL, and have never seen this behavior before. The query is as simple as can be, and the index is also a perfect match. We have a lot of queries that are assuming the DB layer will do the right thing, and I don't want to go through all of them forcing to use the correct index.

Update 1:

This is the typical explain, without the FORCE INDEX clause. Once that's put in, the possible keys column only show the forced index.

 id select_type   table     type    possible_keys    key     key_len  ref   rows
  1 SIMPLE        table1    range   PRIMARY,index1   index1  12       NULL  207

Durkheim answered 19/1, 2013 at 12:2 Comment(6)

Post an EXPLAIN of your statement. – Peasecod 20/1, 2013 at 15:25

possibly due to id being a secondary member of index1, but the first member of the PK. a > 13314313 check would be easier where the field in question is the primary member of a key, which'd be stored in consecutive order. with the index1 setup, it's a secondary member, so possibly mysql feels it might be more work to have to check every single "chunk" of index for the > lookup, even though it's already having to do so for the field2_id business anyways. – Bluebill 21/1, 2013 at 20:7

Is there merit in forming a PK on an auto-incrementing id AND a timestamp? Especially in an InnoDB environment where the id always increments, regardless of whether or not the timestamp does! – Prosaic 24/1, 2013 at 17:13

@Prosaic very valid point kind of redundant. – Backing 25/1, 2013 at 8:41

I believe Range conditions (id > 13314313) can be applied to the first field of the composite index, that's why the optimizer prefers the PRIMARY key, which is a composite that starts with the <i>id</i> field. More or less what @MarcB commented. – Andraandrade 26/1, 2013 at 7:35

btw, any index implicitly contains the primary key at the end, that is your index1 declaration is redundant – Batik 26/1, 2013 at 13:5

C

4

I'm not 100% sure, but i think this sounds logic:

You partition your table BY RANGE (to_days(created_at)). the created_at field is part of the primary_key. Your select-queries are using the other part of the primary-key. This way the server optimization engine thinks this would be the speediest index - using the partition and the id-primary-part.

i suggest (without knowing the real cause that lead to your choice) to change your partition-range to the id and change the order of your index1-key.

for more information on partitioning have a look

Civil answered 28/1, 2013 at 12:34 Comment(0)

A

1

I'm not sure why the engine would pick the incorrect index. I would think that an index that has an EQUALITY test would supersede that of one with a >, < or range. However, another option that might help force the correct index would be to force a "computed" value on the other id column so the engine might not be able to do a direct correlation to the index... Something like

WHERE field2_id = 12345 and id > 13314313

changed to

WHERE field2_id = 12345 and id + 0 > 13314313

Audrieaudris answered 28/1, 2013 at 12:21 Comment(0)

Recommended topics

Hot tags