How to optimize COUNT(*) performance on InnoDB by using index
I have a largish but narrow InnoDB table with ~9m records. Doing count(*) or count(id) on the table is extremely slow (6+ seconds):

DROP TABLE IF EXISTS `perf2`;

CREATE TABLE `perf2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `channel_id` int(11) DEFAULT NULL,
  `timestamp` bigint(20) NOT NULL,
  `value` double NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
  KEY `IDX_CHANNEL_ID` (`channel_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

RESET QUERY CACHE;
SELECT COUNT(*) FROM perf2;

While the statement is not run too often, it would be nice to optimize it. According to http://www.cloudspace.com/blog/2009/08/06/fast-mysql-innodb-count-really-fast/ this should be possible by forcing InnoDB to use an index:

SELECT COUNT(id) FROM perf2 USE INDEX (PRIMARY);

The explain plan seems fine:

id  select_type table   type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      perf2   index   NULL            PRIMARY 4       NULL    8906459 Using index

Unfortunately the statement is as slow as before. Following "SELECT COUNT(*)" is slow, even with where clause, I've also tried optimizing the table, without success.

Is there a way to optimize COUNT(*) performance on InnoDB?

Marcionism answered 9/10, 2013 at 9:5 Comment(8)
Changing to MyISAM can do miracles - and it costs only a single click in phpMyAdmin ;)Apotropaic
@Apotropaic It also costs enforcing foreign keys and using transactions.Clarinda
@Jim, now I understand; I thought you meant transactions or foreign keys were mandatory for MyISAM, which I think they are not. I misunderstood "enforcing".Apotropaic
@davidkonrad: true, but not the question asked :/Marcionism
The index on channel_id is redundant with the one starting with that column; drop the former.Squash
@rick james that's the way Doctrine ORM creates the indexes.Marcionism
(I'll add that to my list of reasons to avoid 3rd party packages.)Squash
Oh... RESET QUERY CACHE is never needed for getting the current value of a SELECT. It is only useful for avoiding misleading timing. (I prefer to add SQL_NO_CACHE.)Squash

For the time being I've solved the problem by using this approximation:

EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)

The approximate number of rows can be read from the rows column of the explain plan when using InnoDB, as shown above. When using MyISAM this column will remain empty, as the table reference is being optimized away; so if it is empty, fall back to a traditional SELECT COUNT(*) instead.
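Both of these numbers come from InnoDB's sampled index statistics, so the estimate can also be refreshed and read without running EXPLAIN at all. A sketch (assuming the table name data, as in the query above):

```sql
-- Refresh the sampled index statistics the row estimate is based on
ANALYZE TABLE data;

-- The same estimate appears in the Rows column of the table status
SHOW TABLE STATUS LIKE 'data';
```

ANALYZE TABLE is cheap on InnoDB (it samples index pages rather than scanning the table), but the resulting figure is still only an estimate.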

Marcionism answered 9/10, 2013 at 18:44 Comment(4)
Remember that the "approximation" here is far from accurate, and for a table with 1M rows might return anywhere between 100K and 10M. SHOW TABLE STATUS is similar. It's really unreliable. I think a lot of the performance problems are fixed in MySQL 5.7, though.Tager
This approximation is so incredibly unreliable that I can't think of a situation where I'd use it. For a large table, I would rather execute the full COUNT(*) periodically and use that value until the next count.Middling
The Auto_increment field within SHOW TABLE STATUS is much more accurate for a count of the whole table, with much greater speed.Gemperle
@SamDoidge be careful: if records have been deleted from the table, then Auto_increment will be an over-estimate.Jock

As of MySQL 5.1.6 you can use the Event Scheduler and insert the count into a stats table regularly.

First create a table to hold the count:

CREATE TABLE stats (
  `key` varchar(50) NOT NULL PRIMARY KEY,
  `value` varchar(100) NOT NULL
);

Then create an event to update the table:

CREATE EVENT update_stats
ON SCHEDULE
  EVERY 5 MINUTE
DO
  INSERT INTO stats (`key`, `value`)
  VALUES ('data_count', (select count(id) from data))
  ON DUPLICATE KEY UPDATE value=VALUES(value);

It's not perfect, but it offers a self-contained solution (no cronjob or queue) that can easily be tuned to run as often as the count needs to be fresh.
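Note that the Event Scheduler is disabled by default; it has to be switched on before the event will fire. A minimal sketch, reusing the stats table from above:

```sql
-- Enable the scheduler (requires the SUPER privilege on older versions)
SET GLOBAL event_scheduler = ON;

-- Reading the cached count is then a cheap primary-key lookup
SELECT `value` FROM stats WHERE `key` = 'data_count';
```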

Equilibrate answered 19/12, 2013 at 3:57 Comment(2)
Didn't know about the events, thank you. Would still be a killer for my smallish RasPi running the DB :(Marcionism
This is nice and usefulHowerton

Based on @Che's code, you can also use triggers on INSERT and on DELETE on perf2 in order to keep the value in the stats table up to date in real time.

CREATE TABLE stats (
 `key`   varchar(50)  NOT NULL PRIMARY KEY,
 `value` varchar(100) NOT NULL
);

Then:

CREATE TRIGGER `count_up` AFTER INSERT   ON `perf2` FOR EACH ROW UPDATE `stats`
SET   `stats`.`value` = `stats`.`value` + 1 
WHERE `stats`.`key` = 'perf2_count';

CREATE TRIGGER `count_down` AFTER DELETE ON `perf2` FOR EACH ROW UPDATE `stats`
SET   `stats`.`value` = `stats`.`value` - 1 
WHERE `stats`.`key` = 'perf2_count';

So the number of rows in the perf2 table can be read using this query, in realtime:

SELECT `value` FROM `stats` WHERE `key` = 'perf2_count';

This would have the advantage of eliminating the performance issue of performing a COUNT(*) and would only be executed when data changes in perf2.
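One caveat: the triggers only adjust a row that already exists, so the counter has to be seeded once before they can keep it current. A sketch, assuming perf2 already contains data:

```sql
-- Seed the counter with the current row count; the triggers
-- maintain it incrementally from then on
INSERT INTO `stats` (`key`, `value`)
SELECT 'perf2_count', COUNT(*) FROM `perf2`
ON DUPLICATE KEY UPDATE `value` = VALUES(`value`);
```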

Nonscheduled answered 19/12, 2013 at 4:38 Comment(5)
This answer is pretty interesting, efficient and real-time. If you think this answer is confusing, it's just because it's the only one on this page using backticks in the right way.Preemption
@ValerioBozz this is actually not efficient, as triggers cause insert/deletes to take much longer compared to no triggers. Especially for large table inserts.Eloign
@JohnC If you have a large set of table inserts, I think that a START TRANSACTION; ....; COMMIT can help. I think it's easy to say that running a SELECT COUNT(*) on 9 million records is surely not efficient compared to this solution. Feel free to post another, even more efficient solution.Preemption
@ValerioBozz starting a transaction will not change anything with the trigger; triggers still execute as normal within a transaction. We do SELECT COUNT on tables with 156 million records all the time. If the correct index is created, it takes milliseconds. This is the worst way to keep track of a table count. Try to avoid triggers at all costs; they take up too many resources.Eloign
Hi @JohnC - I'm trying to understand how this answer can be improved, but I don't see how an index can help a COUNT(*) obtain the number "156000000" in a table of 156 million records efficiently (that is, without a full table scan). Please expand your comment. Thank you for the clarification. Without this info, a trigger is still a decent and scalable idea. Better than an FTS.Preemption

© 2022 - 2024 — McMap. All rights reserved.