I gotta add another Answer -- I have many corrections/additions to the comments and Answers so far.
For MyISAM, SELECT COUNT(*)
without WHERE
is dead-reckoned -- very fast. All other situations (include the InnoDB in the Question) must count through either the data's BTree or an index's BTree to get the answer. So we need to see how much to count through.
InnoDB caches data and index blocks (16KB each). But when the table's data or index BTree is bigger than innodb_buffer_pool_size
, you are guaranteed to hit the disk. Hitting the disk is almost always the slowest part of any SQL.
The Query Cache, when involved, usually results in query times of about 1 millisecond; this does not seem to be an issue with any of the timings quoted. So I won't dwell on it.
But... Runing the same query twice in a row will often exhibit:
- First run: 10 seconds
- Second run: 1 second
This is symptomatic of the first run having to fetch most of the blocks from disk, while the second found it all in RAM (the buffer_pool). I suspect that some of the timings listed are bogus because of not realizing this caching issue. (16 sec vs 0.6 sec may be explained by this.)
I will harp on "disk hits" or "blocks needed to be touched" as the real metric of which SQL is faster.
COUNT(x)
checks x
for IS NOT NULL
before tallying. This adds a tiny amount of processing, but does not change the number of disk hits.
The proffered table has a PK and a second column. I wonder if that is the real table?? It makes a difference --
- If the Optimizer decides to read the data -- that is, scan in
PRIMARY KEY
order -- it will be reading the data BTree, which is usually (but not in this lame example) much wider than secondary index BTrees.
- If the Optimizer decides to read a secondary index (but not need to do a sort), there will be fewer blocks to touch. Hence, faster.
Comments on the original queries:
SELECT COUNT(*) FROM table WHERE fk = 1 => 16 seconds
-- INDEX(fk) is optimal, but see below
SELECT COUNT(*) FROM table WHERE fk = 1 LIMIT 5000 => 16 seconds
-- the LIMIT does nothing, since there is only one row in the result
SELECT primary FROM table WHERE fk = 1 => 0.6 seconds
-- Again INDEX(fk), but see below
WHERE fk = 1
begs for INDEX(fk, ...)
, preferably just INDEX(fk)
. Note that in InnoDB, each secondary index contains a copy of the pk. That is, INDEX(fk)
is effectively INDEX(fk, primary)
. Hence, the 3rd query can use that as "covering" and not need to touch the data.
If the table is truly just the two columns then probably the secondary index BTree will be fatter than the data BTree. But in realistic tables, the secondary index will be smaller. Hence an index scan will be faster (fewer blocks to touch) than a table scan.
The third query is also delivering a large resultset; this could cause the query to take a long time -- but it won't be included in the quoted "time"; it is network time, not query time.
innodb_buffer_pool_size = 25,769,803,776
I would guess that the table and its secondary index (from the FK) are each about 3-4GB. So, any timing might first have to load a lot of stuff. Then a second run would be entirely cached. (Of course, I don't know how many rows have fk=1
; presumably less than all the rows?)
But... At 600M rows, the table and its index are each approaching the 25GB buffer_pool. So, the day may come soon that it becomes I/O bound -- this will make you wish to get back to 16 (or 25) seconds; yet you won't be able to. We can then talk about alternatives to doing the COUNT
.
SELECT 1 FROM tbl WHERE fk = 1 LIMIT 5000,1
-- Let's analyze this. It will scan the index, but it will stop after 5000 rows. Of all you need is "more than 5K", that is the best way to get it. It will be consistently fast (touching only a dozen blocks), regardless of total number of rows in the table. (It is still subject to buffer_pool_size and cache characteristics of the system. But a dozen blocks takes much less than a second, even with a cold cache.)
MariaDB's LIMIT ROWS_EXAMINED
may be worth looking into. Without that, you could do
SELECT COUNT(*) AS count_if_less_than_5K
FROM ( SELECT 1 FROM tbl WHERE fk = 1 LIMIT 5000 );
It may be faster than delivering the rows to the client; it will have to collect the rows internally in a tmp table, but deliver only the COUNT
.
A side note: 640K rows inserted per day -- this approaches the limit for single-row INSERTs
in MySQL with your current settings on a HDD (not SDD). If you need to discuss the potential disaster, open another Question.
Bottom line:
- Be sure to avoid the Query cache. (by using
SQL_NO_CACHE
or turning the QC off)
- Run any timing query twice; use the second time.
- Understand the structure and size of the BTree(s) involved.
- Don't use
COUNT(x)
unless you need the null check.
- Do not use PHP's
mysql_*
interface; switch to mysqli_*
or PDO
.
COUNT()
, as such:SELECT COUNT(fk) FROM table WHERE fk = 1
– JohnjohnaCOUNT(*)
is optimized to return very quickly if theSELECT
retrieves from one table, no other columns are retrieved, and there is noWHERE
clause. Indeed, the blog to which you linked suggests thatCOUNT(*)
is faster thanCOUNT(column)
. – CristiecristinSELECT COUNT(*) FROM table WHERE fk = 1
needs 16 seconds? Do you have an index onfk
? – AerodyneSELECT SQL_NO_CACHE COUNT(fk) FROM table WHERE fk = 1
and you have an index defined infk
? – SwamplandSELECT COUNT(fk) > 5000 FROM table WHERE fk = 1
perform? – FeculentSELECT COUNT(1) FROM table WHERE fk = 1
and let me know how much seconds it takes?? – TubuleSHOW CREATE TABLE table;
andSHOW VARIABLES LIKE '%innodb%';
so we can see the necessary info first hand. – Gymnasiarch