A literally insane answer, but if you have some kind of replication system set up (for a system with a billion rows, I hope you do), you can use a rough estimator (like MAX(pk)), divide that value by the number of slaves you have, and run several queries in parallel.
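To sketch the estimate itself (MySQL syntax; t, pk, and the slave count of 4 are placeholders for whatever your setup actually looks like):
-- Rough upper bound on the row count; assumes an auto-increment pk without huge gaps
SELECT MAX(pk) FROM t
-- Say that comes back around 1000000000 and you have 4 slaves:
-- 1000000000 / 4 = 250000000 rows per slave, the chunk size used below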
For the most part, you'd partition the queries across slaves based on the best key (or the primary key, I guess), something like this (we're going to use 250000000 as our Rows / Slaves):
-- First slave
SELECT COUNT(pk) FROM t WHERE pk < 250000000
-- Ith slave, where 2 <= I <= N - 1 (note the I-1: using I here would skip
-- the chunk [250000000, 500000000) entirely and double-cover the top end)
SELECT COUNT(pk) FROM t WHERE pk >= (I-1)*250000000 AND pk < I*250000000
-- Last slave (>= rather than >, so the boundary row isn't dropped)
SELECT COUNT(pk) FROM t WHERE pk >= (N-1)*250000000
But you need SQL only. What a bust. Ok, so let's say you're a sadomasochist.
On the master (or closest slave) you'd most likely need to create a table for this:
-- bigint rather than integer, since pk values around a billion flirt with the 32-bit limit
CREATE TABLE counter_table (minpk bigint, maxpk bigint, cnt bigint, slaveid integer)
So instead of just running the selects on your slaves, you'd have to wrap each one in an insert, akin to this:
INSERT INTO counter_table VALUES ((I-1)*250000000, I*250000000, (SELECT COUNT(pk) FROM ... ), @@server_id)
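Concretely, on the second slave (whose chunk is [250000000, 500000000)) that works out to something like:
INSERT INTO counter_table
VALUES (250000000, 500000000,
        (SELECT COUNT(pk) FROM t WHERE pk >= 250000000 AND pk < 500000000),
        @@server_id)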
You may run into issues with slaves writing to a table on the master. You may need to get even more sadis- I mean, creative:
-- A table per slave!
INSERT INTO counter_table_slave_I VALUES (...)
In the end, you should have a slave that sits last in the path the replication graph traverses, relative to the first slave. That slave should now have every other slave's counter values as well as its own. But by the time you've finished, rows have probably been added, so you'd have to insert one more row to compensate for the difference between the max pk recorded in your counter_table and the current max pk.
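A rough sketch of that compensating insert (MySQL user-variable syntax; this assumes all the counter rows have replicated to this slave, and that the last slave recorded the actual max pk it counted as its maxpk, so anything above that is new):
SET @prev_max := (SELECT MAX(maxpk) FROM counter_table)
INSERT INTO counter_table
SELECT @prev_max, MAX(pk), COUNT(pk), @@server_id FROM t WHERE pk > @prev_max
-- If nothing new came in, this records a zero count (with a NULL maxpk), which the final SUM shrugs off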
At that point, you'd run an aggregate query to figure out the total row count, but that's easy, since you'd be running it over at most "the number of slaves you have, and change" rows.
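With a single counter_table, that aggregate is just:
SELECT SUM(cnt) FROM counter_table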
If you're in the situation where you have separate tables per slave, you can UNION ALL them to get all the rows you need:
SELECT SUM(cnt) FROM (
SELECT cnt FROM counter_table_slave_1
UNION ALL -- ALL matters: plain UNION would collapse slaves that happen to report identical counts
SELECT cnt FROM counter_table_slave_2
UNION ALL
...
) AS all_counts -- MySQL insists derived tables get an alias
Or you know, be a bit less insane and migrate your data to a distributed processing system, or maybe use a Data Warehousing solution (which will give you awesome data crunching in the future too).
Do note, this does depend on how well your replication is set up. Since the primary bottleneck will most likely be persistent storage, if you have cruddy storage or poorly segregated data stores with heavy neighbor noise, this will probably be slower than just waiting for a single SELECT COUNT(*) ...
But if you have good replication, then your speed gains should be directly related to the number of slaves. In fact, if it takes 10 minutes to run the counting query alone, and you have 8 slaves, you'd cut your time to under a couple of minutes (10 minutes / 8 is about 1.25, plus coordination overhead). Maybe an hour to iron out the details of this solution.
Of course, you'd never really get an amazingly accurate answer, since this distributed solving introduces a window in which rows can be deleted and inserted, but you can try to get a distributed lock on the rows at the same instant and get a precise count of the rows in the table for a particular moment in time.
Actually, this seems impossible, since you're basically stuck with an SQL-only solution, and I don't think you're provided a mechanism to run a sharded and locked query across multiple slaves, instantly. Maybe if you had control of the replication log file... which means you'd literally be spinning up slaves for this purpose, which is no doubt slower than just running the count query on a single machine anyway.
So there's my two 2013 pennies.
Count(*). However, I don't know if a simple index on the "primary key" is enough, in general, across all DBMSs, so I won't add this as an answer. – Expatriate

If an insert trigger is too expensive but a delete trigger is affordable, see my answer https://mcmap.net/q/108032/-fastest-way-to-count-exact-number-of-rows-in-a-very-large-table for an approach for tables with an auto-increment id: keep track of the count and the last id counted each day, and have the delete trigger decrement the count when appropriate. – Bennett