Fastest way to count exact number of rows in a very large table?

27

278

I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

I have a table that might contain billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows in a table?

Please consider the following before your answer:

  • I am looking for a database vendor independent solution. It is OK if it covers MySQL, Oracle, MS SQL Server. But if there is really no database vendor independent solution then I will settle for different solutions for different database vendors.

  • I cannot use any external tool to do this. I am mainly looking for a SQL-based solution.

  • I cannot normalize my database design any further. It is already in 3NF and moreover a lot of code has already been written around it.

Pennell answered 20/5, 2011 at 8:18 Comment(9)
(A) It's very unlikely to have something that will work identically or optimally on every single database possible and (B) few if any are knowledgeable on every database to be able to comment.Semipalatinsk
Wouldn't we all hope that this particular construct has been optimized by our database vendor?Towle
@Swaranga, can you elucidate a bit more on what this database maintenance purpose is that must know the exact number of rows in the table? I can't imagine. And as Kevin says, if there was a quicker way than COUNT(*) then the DBMS vendor would (should) surely re-implement COUNT(*) to use it...Rover
I assume the appropriate answer should be to add some sort of index to the table, and then just use Count(*). However, I don't know if a simple index on the "primary key" is enough, in general, across all DBMSs, so I won't add this as an answer.Expatriate
Surely if the table is being written to often then your exact count will only be exact for a particular point in time, and may not even be accurate if other processes are writing to the table, unless you put a table lock on the query.Denudate
You could use insert and delete triggers to keep a rolling count?Luigiluigino
If insert trigger is too expensive, but delete trigger is affordable, see my answer https://mcmap.net/q/108032/-fastest-way-to-count-exact-number-of-rows-in-a-very-large-table for an approach, for tables with auto-increment id: keep track of count and last id counted each day, have delete trigger decrement count when appropriate.Bennett
See also Faster alternative in Oracle to SELECT COUNT(*) FROM sometableCaller
Presumably a "very large table" has data inserted into it so quickly that if you run your query a few minutes (or seconds) apart you'd get different counts. And in that case I'm not sure why an exact row count would matter so much. An efficient very close estimate would likely be preferred over exactness (if it's slower) in most use cases, for a very large table.Bradleigh
287

Simple answer:

  • Database vendor independent solution = use the standard = COUNT(*)
  • There are approximate SQL Server solutions that don't use COUNT(*), but they are not exact = out of scope

Notes:

COUNT(1) = COUNT(*) = COUNT(PrimaryKey), just in case (this holds because a primary key can never be NULL; COUNT of a nullable column can return fewer rows)
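
A quick illustration of that last point, on a hypothetical three-row table t with a nullable nickname column (names are placeholders, not from the question):

-- COUNT(*) counts rows; COUNT(col) counts non-NULL values of col
-- Hypothetical table: t(pk INT PRIMARY KEY, nickname VARCHAR(50) NULL), 3 rows, one with nickname = NULL
SELECT COUNT(*)        FROM t;  -- 3
SELECT COUNT(pk)       FROM t;  -- 3 (a primary key is never NULL)
SELECT COUNT(nickname) FROM t;  -- 2 (NULLs are not counted)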

Edit:

SQL Server example (1.4 billion rows, 12 columns)

SELECT COUNT(*) FROM MyBigtable WITH (NOLOCK)
-- NOLOCK here is for me only to let me test for this answer: no more, no less

1 run, 5:46 minutes, count = 1,401,659,700

-- Note: sp_spaceused uses this DMV internally
SELECT
   Total_Rows = SUM(st.row_count)
FROM
   sys.dm_db_partition_stats st
WHERE
   object_name(object_id) = 'MyBigtable' AND (index_id < 2)

2 runs, both under 1 second, count = 1,401,659,670

The second one returned fewer rows = not exact. Depending on writes it could just as well have been the same or more (deletes are done out of hours here)

Lovelovebird answered 20/5, 2011 at 8:23 Comment(24)
Nope, COUNT(*) = COUNT(key). This is just wrong. If there is no NOT NULL constraint - then they can be not equal (in results as well as in execution plan).Keslie
@zerkmsby: For COUNT(key) I meant COUNT(primarykey) which should be non nullable. I'll clarifyLovelovebird
then probably PK or primarykey name would be more appropriate, imho ;-)Keslie
@Lovelovebird What should the query be if we have some joins and a where condition? Please specify; my table also has a few million records but I want the count based on some condition.Electrostriction
@KamalDeepSingh: you have to use COUNT then as normalLovelovebird
with (NOLOCK) is not something that allows it to run on production, and it can lead to an inaccurate count. When you use that hint, sure it prevents locks but the side effects on a production box are that you can count rows twice in some situations or skip rows in other situations. NOLOCK is better to use on a table that is not being written to because it allows "dirty reads". Don't advise people to use that hint unless they fully understand the consequencesCardwell
@Cardwell - I agree with the point you make about understanding the ramifications of NOLOCK before using it, but you say: "NOLOCK is better to use on a table that is not being written to because it allows "dirty reads"" - sounds confusing?! If the table is not being written to, what benefit is a dirty read? There would be nothing making the table "dirty"?Strachey
@Lovelovebird +1 for a very thorough explanation! IMHO DMVs are gems that more people should be made aware of. Looks like the DMV query outperforms COUNT(*) hands down. Could you care to elaborate on "The second one has less rows = wrong..."?Strachey
@Davos: the NOLOCK here was for my use onlyLovelovebird
@mishrsud the benefit of NOLOCK is that it won't lock rows/tables/pages that it normally would. Fine on a table that is experiencing only reads e.g. multiple reporting queries because there will be less contention between those queries as they don't wait for locks to be released. There are always some locks though - NOLOCK won't avoid all of them, but it can make the query quicker because it doesn't have to wait. It's more important that you don't use NOLOCK on a table being updated/inserted/deleted, unless accuracy is not a concern.Cardwell
@Cardwell Ok, so what you mean is that using the DMV query would be more accurate as NOLOCK may include rows that may not be committed (as it does a dirty read)?Strachey
@mishrsud The only accurate query is the SELECT COUNT(*), but it's slow. You can either have exact & slow, or rough and quick. What you do will depend on what's more important for the purpose you need the count for. NO LOCK might include or indeed exclude rows that are mid-transaction or moving pages for whatever reason.Cardwell
I prefer the way Pinal Dave shows it ;) blog.sqlauthority.com/2007/01/10/… You don't need all the rights on the database.Dentilingual
@Lovelovebird very nice solution, can you tell what is use of index_id < 2?Beckman
@commit: it is the "main" index for that table. See technet.microsoft.com/en-us/library/ms187737.aspxLovelovebird
hi, can I use this query to get the count from a #temp table?Abele
If the goal is performance (vs. accuracy), i would add that object_id = object_id('Table') is [ever slightly] more efficient than object_name(object_id) = 'Table'Nimocks
@Lovelovebird I need to count the records while using multiple join statements, mainly inner joins. How can I incorporate the joins with the current solution? Is there a way to do it?Keeleykeelhaul
@sotn: this groups by object_id, ignore partition_id. See learn.microsoft.com/en-us/sql/relational-databases/…Lovelovebird
SHOW TABLE STATUS is infinitely fasterFiliano
@MartijnScheffer OK, try it in Microsoft SQL Server and report back. It is not a "database vendor independent solution", which COUNT clearly is for all platforms as per the ANSI SQL standardLovelovebird
@MartijnScheffer: ah yes, dev.mysql.com/doc/refman/5.7/en/show-table-status.html it's approximate as well. Read the question...Lovelovebird
@MartijnScheffer did you read the question? Especially this part?: "Is there a better way to get the EXACT count of the number of rows of a table?" (the bold as it is from the OP)Sarajane
yes, I realise now that "show table status" doesn't work at all! sorryFiliano
34

The fastest way by far on MySQL is:

SHOW TABLE STATUS;

You will instantly get all your tables with the row count (which is the total) along with plenty of extra information if you want.
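
For a single table, the output can be filtered (the table name below is a placeholder). Note that for InnoDB tables the Rows column is only an estimate, so this does not satisfy the question's "exact count" requirement:

SHOW TABLE STATUS LIKE 'MyBigTable';
-- For InnoDB, the Rows value is an approximation, not an exact count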

Miniaturist answered 25/3, 2012 at 23:41 Comment(5)
Smart way..with this you can get row count of multiple tables in 1 query.Edelman
did you run this on a db having tables with ~billion entries like @Lovelovebird did, and notice the time?Sloat
which value is the total row count for all tables in database? And these are approximate - what if you want exact row count values?Incomer
this doesn't work at all, on INNODB for example, the storage engine reads a few rows and extrapolates to guess the number of rowsFiliano
it's good to get a total count without query condition. but what if I want the result with any query condition?Deoxygenate
31

I got this script from another StackOverflow question/answer:

SELECT SUM(p.rows) FROM sys.partitions AS p
  INNER JOIN sys.tables AS t
  ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s
  ON s.[schema_id] = t.[schema_id]
  WHERE t.name = N'YourTableNameHere'
  AND s.name = N'dbo'
  AND p.index_id IN (0,1);

My table has 500 million records and the above returns in less than 1ms. Meanwhile,

SELECT COUNT(id) FROM MyTable

takes a full 39 minutes, 52 seconds!

They yield the exact same number of rows (in my case, exactly 519326012).

I do not know if that would always be the case.

Sommerville answered 31/5, 2018 at 15:9 Comment(5)
Can you add a parameter to get the row count with this query? Example: Select COUNT(1) FROM TABLENAME WHERE ColumnField = '1' with your query?Munson
That is the count - the number of rows (records) is the "count" in this case. "500 million records" was an approximate number, and "519326012" was the exact number of rows, or count. Rows = records = count.Sommerville
When I do a count on table (select count(*) from table) vs this solution, the count of the latter is 11 less.Crenation
Hi, is there some way to do the same with two tables related by an inner join?Earlineearls
Is there any way to find the row count with a where condition in a large database in less than a second?Noise
12

You can try this sp_spaceused (Transact-SQL)

Displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.
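
A minimal sketch of the call (the table name is a placeholder); the rows column of the first result set is the count it reports:

EXEC sys.sp_spaceused @objname = N'dbo.MyBigTable';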

Bird answered 20/5, 2011 at 8:26 Comment(2)
Won't sp_spaceused give me an approximated count?Pennell
FYI: This uses sys.dm_db_partition_stats internallyLovelovebird
10

I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

That depends on the database. Some speed up counts, for instance by keeping track of whether rows are live or dead in the index, allowing for an index only scan to extract the number of rows. Others do not, and consequently require visiting the whole table and counting live rows one by one. Either will be slow for a huge table.

Note that you can generally extract a good estimate by using query optimization tools, table statistics, etc. In the case of PostgreSQL, for instance, you could parse the output of explain count(*) from yourtable and get a reasonably good estimate of the number of rows. Which brings me to your second question.
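
A rough PostgreSQL sketch of that idea (the table name is a placeholder); this only reports the planner's current estimate, not an exact count:

EXPLAIN SELECT COUNT(*) FROM yourtable;
-- Look for "rows=<estimate>" on the scan node for yourtable in the plan output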

I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?

Seriously? :-) You really mean the exact count from a table with billions of rows? Are you really sure? :-)

If you really do, you could keep a trace of the total using triggers, but mind concurrency and deadlocks if you do.

Etiolate answered 20/5, 2011 at 8:31 Comment(9)
It's a lucky thing that Google managers are more reasonable than your boss... Picture how slow it would be if it returned the exact number of search results for each of your queries instead of sticking to an estimate number.Etiolate
At least you empathize with me. How about an only Oracle solution? That will reduce my issue to an extent. Presently the customer is using Oracle; so if I come up with a workaround only for Oracle, that will do [for the time being]. :)Pennell
Well, you could always keep a counter updated using a trigger. Mind the concurrency if you do, though. :-)Etiolate
"Yes Denis, the exact count is required. :(" - well I can only speculate. Does the db maintenance process find out that there are 42,123,876 rows in table A and then create 42,123,876 empty rows in table B, and then loop through table A and update the rows in table B...? Or is it crazier than that? ;-)Rover
@Denis, using triggers will be a terrible idea here. It will slow down your inserts and could affect performance.Brokaw
@OsaE, how can it slow down inserts? The after-insert-trigger should simply increase the counter by the number of inserted rows, and the after-delete-trigger should decrease it.Wherein
@Wherein because it creates convoy syndrome for all updaters that need to update the count.Contrived
@ErwinSmout can you explain what a convoy syndrome is in this context?Wherein
Transaction 2 cannot begin before transaction 1 has committed. Without the "counts table" update, many update transactions could run in parallel. With the "counts table", each transaction has to "obtain a ticket" for updating its count. So transactions start queueing up at the ticket machine (the scheduler deciding who will be the next to get a lock on the counts table).Contrived
10

Is there a better way to get the EXACT count of the number of rows of a table?

To answer your question simply, No.

If you need a DBMS independent way of doing this, the fastest way will always be:

SELECT COUNT(*) FROM TableName

Some DBMS vendors may have quicker ways which will work for their systems only. Some of these options are already posted in other answers.

COUNT(*) should be optimized by the DBMS (at least any PROD worthy DB) anyway, so don't try to bypass their optimizations.

On a side note:
I am sure many of your other queries also take a long time to finish because of your table size. Any performance concerns should probably be addressed by thinking about your schema design with speed in mind. I realize you said that it is not an option to change but it might turn out that 10+ minute queries aren't an option either. 3rd NF is not always the best approach when you need speed, and sometimes data can be partitioned in several tables if the records don't have to be stored together. Something to think about...

Costplus answered 26/5, 2011 at 19:46 Comment(0)
10

I found this good article, SQL Server – HOW-TO: quickly retrieve accurate row count for table, from martijnh1, which gives a good recap of each scenario.

I need this expanded to where I can provide a count based on a specific condition; when I figure that part out, I'll update this answer further.

In the meantime, here are the details from article:

Method 1:

Query:

SELECT COUNT(*) FROM Transactions 

Comments:

Performs a full table scan. Slow on large tables.

Method 2:

Query:

SELECT CONVERT(bigint, rows) 
FROM sysindexes 
WHERE id = OBJECT_ID('Transactions') 
AND indid < 2 

Comments:

Fast way to retrieve the row count. Depends on statistics, so it can be inaccurate.

Run DBCC UPDATEUSAGE(Database) WITH COUNT_ROWS to correct the counts, which can take significant time for large tables.

Method 3:

Query:

SELECT CAST(p.rows AS float) 
FROM sys.tables AS tbl 
INNER JOIN sys.indexes AS idx ON idx.object_id = tbl.object_id and
idx.index_id < 2 
INNER JOIN sys.partitions AS p ON p.object_id=CAST(tbl.object_id AS int) 
AND p.index_id=idx.index_id 
WHERE ((tbl.name=N'Transactions' 
AND SCHEMA_NAME(tbl.schema_id)='dbo')) 

Comments:

The way SQL Server Management Studio counts rows (look at Table Properties, Storage, Row Count). Very fast, but still an approximate number of rows.

Method 4:

Query:

SELECT SUM (row_count) 
FROM sys.dm_db_partition_stats 
WHERE object_id=OBJECT_ID('Transactions')    
AND (index_id=0 or index_id=1); 

Comments:

Quick (although not as fast as method 2) operation and equally important, reliable.

Sefton answered 6/4, 2018 at 14:41 Comment(2)
Thanks! Really useful tip. I do not have permission to view system tables, so method 4 is not for me. However, method 3 is good enough.Robinett
Just a note for Method 3. It should be the SUM(CAST(p.rows AS FLOAT)) otherwise in partitioned tables we get n rows in output.Maximin
6

If SQL Server edition is 2005/2008, you can use DMVs to calculate the row count in a table:

-- Shows all user tables and row counts for the current database 
-- Remove is_ms_shipped = 0 check to include system objects 
-- i.index_id < 2 indicates a clustered index (1) or a heap (0) 
SELECT o.name, 
 ddps.row_count 
FROM sys.indexes AS i 
 INNER JOIN sys.objects AS o ON i.OBJECT_ID = o.OBJECT_ID 
 INNER JOIN sys.dm_db_partition_stats AS ddps ON i.OBJECT_ID = ddps.OBJECT_ID 
 AND i.index_id = ddps.index_id 
WHERE i.index_id < 2 
 AND o.is_ms_shipped = 0 
ORDER BY o.NAME 

For SQL Server 2000 database engine, sysindexes will work, but it is strongly advised to avoid using it in future editions of SQL Server as it may be removed in the near future.

Sample code taken from: How To Get Table Row Counts Quickly And Painlessly

Zabaglione answered 20/5, 2011 at 9:1 Comment(2)
This is approximate not exact: see my answer pleaseLovelovebird
Do you know an example where this is not accurate? AFAIK, it does not depend on updated statistics.Zabaglione
5

I use

select /*+ parallel(a) */  count(1) from table_name a;
Pekan answered 3/7, 2011 at 8:9 Comment(1)
select /*+ parallel(a) */ count(1) from table_name aPekan
5

I'm nowhere near as expert as others who have answered, but I was having an issue with a procedure I was using to select a random row from a table (not overly relevant), where I needed to know the number of rows in my reference table to calculate the random index. The traditional Count(*) or Count(1) works, but I was occasionally seeing my query take up to 2 seconds to run. So instead (for my table named 'tbl_HighOrder') I am using:

Declare @max int

Select @max = Row_Count
From sys.dm_db_partition_stats
Where Object_Name(Object_Id) = 'tbl_HighOrder'

It works great and query times in Management Studio are zero.
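
One caveat worth hedging: sys.dm_db_partition_stats returns a row per index and per partition, so on a table with nonclustered indexes or multiple partitions the single-value assignment above may pick up the wrong row. A safer sketch (same hypothetical table name) sums only the heap/clustered-index partitions:

Declare @max bigint

Select @max = SUM(row_count)
From sys.dm_db_partition_stats
Where object_id = Object_Id('dbo.tbl_HighOrder')
  And index_id IN (0, 1)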

Motile answered 30/6, 2013 at 1:43 Comment(1)
FWIW, you should mention WHICH database vendor you are using; I think the statement would be slightly different depending on vendor.Bennett
5

Well, late by five years and unsure if it helps:

I was trying to count the number of rows in a SQL Server table using SQL Server Management Studio and ran into an overflow error, so I used the below:

select count_big(1) FROM [dbname].[dbo].[FactSampleValue];

The result:

24296650578 rows

Mute answered 19/5, 2016 at 9:1 Comment(0)
4

Not exactly a DBMS-agnostic solution, but at least your client code won't see the difference...

Create another table T with just one row and one integer field N¹, and create an INSERT TRIGGER that just executes:

UPDATE T SET N = N + 1

Also create a DELETE TRIGGER that executes:

UPDATE T SET N = N - 1

A DBMS worth its salt will guarantee the atomicity of the operations above², and N will contain the accurate count of rows at all times, which is then super-quick to get by simply:

SELECT N FROM T

While triggers are DBMS-specific, selecting from T isn't and your client code won't need to change for each supported DBMS.

However, this can have some scalability issues if the table is INSERT or DELETE-intensive, especially if you don't COMMIT immediately after INSERT/DELETE.
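
A minimal sketch of the idea in SQL Server syntax, purely for illustration (all names are hypothetical; note that SQL Server triggers fire once per statement, so the counter is adjusted by the number of affected rows):

CREATE TABLE RowTally (N bigint NOT NULL);
INSERT INTO RowTally (N) SELECT COUNT_BIG(*) FROM dbo.BigTable;  -- seed the counter once
GO
CREATE TRIGGER trg_BigTable_ins ON dbo.BigTable AFTER INSERT AS
    UPDATE RowTally SET N = N + (SELECT COUNT_BIG(*) FROM inserted);
GO
CREATE TRIGGER trg_BigTable_del ON dbo.BigTable AFTER DELETE AS
    UPDATE RowTally SET N = N - (SELECT COUNT_BIG(*) FROM deleted);
GO
-- The exact count is then a trivial read:
SELECT N FROM RowTally;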


¹ These names are just placeholders - use something more meaningful in production.

² I.e. N cannot be changed by a concurrent transaction between reading and writing to N, as long as both reading and writing are done in a single SQL statement.

Johnsiejohnson answered 12/6, 2013 at 9:49 Comment(1)
This can be a great hack, especially in the long run.Propagandist
3

I don't think there is a general always-fastest solution: some RDBMSs/versions have a specific optimization for SELECT COUNT(*) that uses faster options, while others simply table-scan. You'd need to go to the documentation/support sites for the second set, which will probably need some more specific query to be written, usually one that hits an index in some way.

EDIT:

Here's a thought that might work, depending on your schema and distribution of data: do you have an indexed column that references an increasing value, a numeric increasing ID, say, or even a timestamp or date? Then, assuming deletes don't happen, it should be possible to store the count up to some recent value (yesterday's date, highest ID value at some recent sample point) and add the count beyond that, which should resolve very quickly in the index. Very dependent on values and indices, of course, but applicable to pretty much any version of any DBMS.

Carmody answered 20/5, 2011 at 8:25 Comment(2)
I would very much hope that any decent DBMS would use an index for SELECT COUNT(*). Even MySQL apparently does it... .Ironstone
assuming deletes don't happen - seriously?? ;pBennett
3

I am late to this question, but here is what you can do with MySQL (as I use MySQL). I am sharing my observations here:

1) SELECT COUNT(*) AS TOTAL_ROWS FROM <TABLE_NAME>

Result
Row Count: 508534
Console output: Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.125 sec.
Takes a while for a table with large number of rows, but the row count is very exact.

2) SHOW TABLE STATUS or SHOW TABLE STATUS WHERE NAME="<TABLE_NAME>"

Result
Row count: 511235
Console output: Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.250 sec.
Summary: Row count is not exact.

3) SELECT * FROM information_schema.tables WHERE table_schema = DATABASE();

Result
Row count: 507806
Console output: Affected rows: 0 Found rows: 48 Warnings: 0 Duration for 1 query: 1.701 sec.
Row count is not exact.

I am not a MySQL or database expert, but I have found that for very large tables, you can use option 2 or 3 and get a 'fair idea' of how many rows are present.

I needed to get these row counts for displaying some stats on the UI. With the above queries, I knew that the total rows were more than 500,000, so I came up with showing stats like "More than 500,000 rows" without showing exact number of rows.

Maybe I have not really answered the OP's question, but I am sharing what I did in a situation where such statistics were needed. In my case, showing the approximate rows was acceptable and so the above worked for me.

Preclude answered 18/9, 2017 at 16:40 Comment(0)
3

In a very large table for me,

SELECT COUNT(1) FROM TableLarge 

takes 37 seconds whereas

SELECT COUNT_BIG(1) FROM TableLarge

takes 4 seconds.

Rockefeller answered 16/8, 2020 at 14:20 Comment(0)
2

A literally insane answer, but if you have some kind of replication system set up (for a system with a billion rows, I hope you do), you can use a rough estimator (like MAX(pk)), divide that value by the number of slaves you have, and run several queries in parallel.

For the most part, you'd partition the queries across slaves based on the best key (or the primary key I guess), in such a way (we're going to use 250000000 as our Rows / Slaves):

-- First slave
SELECT COUNT(pk) FROM t WHERE pk < 250000000
-- Ith slave, where 2 <= I <= N - 1
SELECT COUNT(pk) FROM t WHERE pk >= (I-1)*250000000 and pk < I*250000000
-- Last slave
SELECT COUNT(pk) FROM t WHERE pk >= (N-1)*250000000

But you need SQL only. What a bust. Ok, so let's say you're a sadomasochist. On the master (or closest slave) you'd most likely need to create a table for this:

CREATE TABLE counter_table (minpk integer, maxpk integer, cnt integer, slaveid integer)

So instead of only having the selects running in your slaves, you'd have to do an insert, akin to this:

INSERT INTO counter_table VALUES ((I-1)*250000000, I*250000000, (SELECT COUNT(pk) FROM ... ), @@SLAVE_ID)

You may run into issues with slaves writing to a table on master. You may need to get even more sadis- I mean, creative:

-- A table per slave!
INSERT INTO counter_table_slave_I VALUES (...)

You should in the end have a slave that exists last in the path traversed by the replication graph, relative to the first slave. That slave should now have all other counter values, and should have its own values. But by the time you've finished, there probably are rows added, so you'd have to insert another one compensating for the recorded max pk in your counter_table and the current max pk.

At that point, you'd have to do an aggregate function to figure out what the total rows are, but that's easier since you'd be running it on at most the "number of slaves you have and change" rows.

If you're in the situation where you have separate tables in the slaves, you can UNION to get all the rows you need.

SELECT SUM(cnt) FROM (
    SELECT * FROM counter_table_slave_1
      UNION
    SELECT * FROM counter_table_slave_2
      UNION
    ...
  )

Or you know, be a bit less insane and migrate your data to a distributed processing system, or maybe use a Data Warehousing solution (which will give you awesome data crunching in the future too).

Do note, this does depend on how well your replication is set up. Since the primary bottleneck will most likely be persistent storage, if you have cruddy storage or poorly segregated data stores with heavy neighbor noise, this will probably run you slower than just waiting for a single SELECT COUNT(*) ...

But if you have good replication, then your speed gains should be directly related to the number of slaves. In fact, if it takes 10 minutes to run the counting query alone, and you have 8 slaves, you'd cut your time to less than a couple minutes. Maybe an hour to iron out the details of this solution.

Of course, you'd never really get an amazingly accurate answer since this distributed solving introduces a bit of time where rows can be deleted and inserted, but you can try to get a distributed lock of rows at the same instance and get a precise count of the rows in the table for a particular moment in time.

Actually, this seems impossible, since you're basically stuck with an SQL-only solution, and I don't think you're provided a mechanism to run a sharded and locked query across multiple slaves, instantly. Maybe if you had control of the replication log file... which means you'd literally be spinning up slaves for this purpose, which is no doubt slower than just running the count query on a single machine anyway.

So there's my two 2013 pennies.

Dykes answered 17/9, 2013 at 8:36 Comment(0)
2

If an insert trigger is too expensive to use, but a delete trigger is affordable, and there is an auto-increment id, then after counting the entire table once, and remembering the count as last-count along with the last-counted-id,

each day you only need to count for id > last-counted-id, add that to last-count, and store the new last-counted-id.

The delete trigger would decrement last-count whenever the id of the deleted record is <= last-counted-id.
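
A rough sketch of what that could look like (generic SQL, all names hypothetical; mind the small race between the COUNT and the MAX if rows are still being inserted while the job runs):

-- One-time setup: a single-row book-keeping table
-- CREATE TABLE row_tally (last_count bigint, last_counted_id bigint);
-- INSERT INTO row_tally SELECT COUNT(*), MAX(id) FROM big_table;

-- Daily job: only count the rows added since the last run
UPDATE row_tally
SET last_count      = last_count + (SELECT COUNT(*) FROM big_table
                                    WHERE big_table.id > row_tally.last_counted_id),
    last_counted_id = (SELECT MAX(id) FROM big_table);

-- Delete trigger (per deleted row): if the deleted id <= last_counted_id then
--     UPDATE row_tally SET last_count = last_count - 1;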

Bennett answered 2/9, 2016 at 14:50 Comment(1)
.. sorry don't have time to show the SQL that would be used (my SQL is rusty). If anyone wants to edit my answer to add SQL, that would be great!Bennett
1
select rows from sysindexes
where id = Object_ID('TableName') and indid <2
Propagable answered 3/11, 2011 at 9:0 Comment(0)
1

If you have a typical table structure with an auto-incrementing primary key column in which rows are never deleted, the following will be the fastest way to determine the record count and should work similarly across most ANSI compliant databases:

SELECT TOP(1) <primarykeyfield> FROM <table> ORDER BY <primarykeyfield> DESC;

I work with MS SQL tables containing billions of rows that require sub-second response times for data, including record counts. A similar SELECT COUNT(*) would take minutes to process by comparison.

Woozy answered 13/11, 2013 at 4:22 Comment(3)
Not entirely true - what if an INSERT transaction is rolled back? That primary key value would be absent, so the actual record count would be one less than the maximum value.Vedette
The could be gaps in sequence. Usually a result of rollbacks.Brokaw
Actually, there is a modification of this answer that might be significantly faster than count(*), if a database vendor has not sufficiently optimized count(*): Each day keep track of the last auto-index and its corresponding count, then ask for a count of records past that. Can also handle deletes if add a trigger on delete that decrements the previous total, if deleted record id <= that last auto-index.Bennett
1

For SQL Server, try this:

SELECT T.name, 
       I.rows AS [ROWCOUNT] 
FROM   sys.tables AS T 
       INNER JOIN sys.sysindexes AS I 
               ON T.object_id = I.id AND I.indid < 2 
WHERE T.name = 'Your_Table_Name'
ORDER  BY I.rows DESC 
Rapp answered 22/9, 2017 at 6:31 Comment(0)
1

Use COUNT_BIG() to fetch the count of records in a very large table.

SELECT COUNT_BIG(*) FROM TABLENAME;
Augite answered 2/3, 2022 at 7:48 Comment(0)
0

If you are using Oracle, how about this (assuming the table stats are updated):

select table_name, num_rows, last_analyzed from user_tables where table_name = '<TABLE_NAME>'

last_analyzed will show the time when stats were last gathered.
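
If the statistics might be stale, they can be refreshed first (schema and table names below are placeholders; gathering stats on a billion-row table can itself take a while):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'MY_TABLE');
END;
/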

Kaifeng answered 26/5, 2011 at 8:30 Comment(0)
0

Put an index on some column. That should allow the optimizer to perform a full scan of the index blocks, instead of a full scan of the table. That will cut your IO costs way down. Look at the execution plan before and after. Then measure wall clock time both ways.
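
A sketch of the idea (hypothetical names): a narrow index on a NOT NULL column gives the optimizer much smaller blocks to scan when counting.

CREATE INDEX ix_bigtable_somecol ON big_table (some_not_null_column);

-- Compare the execution plan before and after: the count should now scan the index
SELECT COUNT(*) FROM big_table;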

Cranach answered 30/6, 2013 at 2:5 Comment(1)
If a table has billions of rows without an index on any column, then there will be widespread performance issues, far beyond the need expressed in original question .. but good that you mention that (assume nothing!) :)Bennett
0

With PostgreSQL:

SELECT reltuples AS approximate_row_count FROM pg_class WHERE relname = 'table_name'
Lockout answered 30/6, 2018 at 21:21 Comment(0)
0

With SQL Server 2019, you can use APPROX_COUNT_DISTINCT, which:

returns the approximate number of unique non-null values in a group

and from the docs:

APPROX_COUNT_DISTINCT is designed for use in big data scenarios and is optimized for the following conditions:

  • Access of data sets that are millions of rows or higher and
  • Aggregation of a column or columns that have many distinct values

Also, the function

  • implementation guarantees up to a 2% error rate within a 97% probability
  • requires less memory than an exhaustive COUNT DISTINCT operation
  • given the smaller memory footprint is less likely to spill memory to disk compared to a precise COUNT DISTINCT operation.

The algorithm behind the implementation is HyperLogLog.
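
As a sketch of how it could stand in for a row count (assuming id is a unique, NOT NULL key; table and column names are hypothetical):

-- Approximate distinct count of a unique key ~= approximate row count
SELECT APPROX_COUNT_DISTINCT(id) AS approx_rows FROM dbo.MyBigTable;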

Rutan answered 12/11, 2020 at 6:43 Comment(0)
-2

In SQL Server 2016, I can just check the table properties and then select the 'Storage' tab - this gives me the row count, disk space used by the table, index space used, etc.

Anemophilous answered 8/5, 2019 at 19:45 Comment(1)
He was looking for a database vendor independent solution. Also, this requires a GUI and cannot be automated. Also, it is not faster than COUNT(*)Wayne
-3

Maybe a bit late, but this might help others using MSSQL:

;WITH RecordCount AS (
    SELECT ROW_NUMBER() OVER (ORDER BY COLUMN_NAME) AS [RowNumber]
    FROM TABLE_NAME
)
SELECT MAX(RowNumber) FROM RecordCount
Gorcock answered 30/8, 2013 at 14:46 Comment(2)
This is significantly WORSE than COUNT(), unless we're VERY lucky and the optimizer manages to optimize it to a COUNT() - why ask it to SORT on a random column?!?Salep
Replace MAX() with LIMIT 1Inenarrable
