SQL, Auxiliary table of numbers
For certain types of SQL queries, an auxiliary table of numbers can be very useful. It can be created either as a table with as many rows as a particular task requires, or as a user-defined function that returns the number of rows needed by each query.

What is the optimal way to create such a function?

Nibble answered 14/8, 2008 at 9:1 Comment(5)
Could you explain why you'd do this rather than use a table pre-filled with numbers? – Whatnot
To fill such a table, for example. – Nibble
Not all DBAs and/or 3rd-party apps will allow the addition of a permanent table. – Labannah
Vote for a built-in virtual numbers table feature that does not waste memory and IO at https://feedback.azure.com/forums/908035-sql-server/suggestions/32890519-add-a-built-in-table-of-numbers – Directed
@LouisSomers - it is coming – Ehf

Heh... sorry I'm so late responding to an old post. And, yeah, I had to respond because the most popular answer on this thread (at the time, the recursive CTE answer with the link to 14 different methods) is, ummm... performance-challenged at best.

First, the article with the 14 different solutions is fine for seeing the different methods of creating a Numbers/Tally table on the fly, but as pointed out in the article and in the cited thread, there's a very important quote...

"suggestions regarding efficiency and performance are often subjective. Regardless of how a query is being used, the physical implementation determines the efficiency of a query. Therefore, rather than relying on biased guidelines, it is imperative that you test the query and determine which one performs better."

Ironically, the article itself contains many subjective statements and "biased guidelines" such as "a recursive CTE can generate a number listing pretty efficiently" and "This is an efficient method of using WHILE loop from a newsgroup posting by Itzik Ben-Gan" (which I'm sure he posted just for comparison purposes). C'mon folks... Just mentioning Itzik's good name may lead some poor slob into actually using that horrible method. The author should practice what (s)he preaches and should do a little performance testing before making such ridiculously incorrect statements, especially in the face of any scalability.

With the thought of actually doing some testing before making any subjective claims about what any code does or what someone "likes", here's some code you can do your own testing with. Set up Profiler for the SPID you're running the test from and check it out for yourself... just do a "Search'n'Replace" of the number 1000000 for your "favorite" number and see...

--===== Test for 1000000 rows ==================================
GO
--===== Traditional RECURSIVE CTE method
   WITH Tally (N) AS 
        ( 
         SELECT 1 UNION ALL 
         SELECT 1 + N FROM Tally WHERE N < 1000000 
        ) 
 SELECT N 
   INTO #Tally1 
   FROM Tally 
 OPTION (MAXRECURSION 0);
GO
--===== Traditional WHILE LOOP method
 CREATE TABLE #Tally2 (N INT);
    SET NOCOUNT ON;
DECLARE @Index INT;
    SET @Index = 1;
  WHILE @Index <= 1000000 
  BEGIN 
         INSERT #Tally2 (N) 
         VALUES (@Index);
            SET @Index = @Index + 1;
    END;
GO
--===== Traditional CROSS JOIN table method
 SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
   INTO #Tally3
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2;
GO
--===== Itzik's CROSS JOINED CTE method
   WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT N
   INTO #Tally4
   FROM cteTally
  WHERE N <= 1000000;
GO
--===== Housekeeping
   DROP TABLE #Tally1, #Tally2, #Tally3, #Tally4;
GO

While we're at it, here are the numbers I get from SQL Profiler for the values of 100, 1000, 10000, 100000, and 1000000...

SPID TextData                                 Dur(ms) CPU   Reads   Writes
---- ---------------------------------------- ------- ----- ------- ------
  51 --===== Test for 100 rows ==============       8     0       0      0
  51 --===== Traditional RECURSIVE CTE method      16     0     868      0
  51 --===== Traditional WHILE LOOP method CR      73    16     175      2
  51 --===== Traditional CROSS JOIN table met      11     0      80      0
  51 --===== Itzik's CROSS JOINED CTE method        6     0      63      0
  51 --===== Housekeeping   DROP TABLE #Tally      35    31     401      0

  51 --===== Test for 1000 rows =============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method      47    47    8074      0
  51 --===== Traditional WHILE LOOP method CR      80    78    1085      0
  51 --===== Traditional CROSS JOIN table met       5     0      98      0
  51 --===== Itzik's CROSS JOINED CTE method        2     0      83      0
  51 --===== Housekeeping   DROP TABLE #Tally       6    15     426      0

  51 --===== Test for 10000 rows ============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method     434   344   80230     10
  51 --===== Traditional WHILE LOOP method CR     671   563   10240      9
  51 --===== Traditional CROSS JOIN table met      25    31     302     15
  51 --===== Itzik's CROSS JOINED CTE method       24     0     192     15
  51 --===== Housekeeping   DROP TABLE #Tally       7    15     531      0

  51 --===== Test for 100000 rows ===========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method    4143  3813  800260    154
  51 --===== Traditional WHILE LOOP method CR    5820  5547  101380    161
  51 --===== Traditional CROSS JOIN table met     160   140     479    211
  51 --===== Itzik's CROSS JOINED CTE method      153   141     276    204
  51 --===== Housekeeping   DROP TABLE #Tally      10    15     761      0

  51 --===== Test for 1000000 rows ==========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method   41349 37437 8001048   1601
  51 --===== Traditional WHILE LOOP method CR   59138 56141 1012785   1682
  51 --===== Traditional CROSS JOIN table met    1224  1219    2429   2101
  51 --===== Itzik's CROSS JOINED CTE method     1448  1328    1217   2095
  51 --===== Housekeeping   DROP TABLE #Tally       8     0     415      0

As you can see, the recursive CTE method is second worst only to the WHILE loop for duration and CPU, and has 8 times the memory pressure in the form of logical reads compared to the WHILE loop. It's RBAR on steroids and should be avoided, at all costs, for any single-row calculations, just as a WHILE loop should be avoided. There are places where recursion is quite valuable, but this ISN'T one of them.

As a sidebar, Mr. Denny is absolutely spot on... a correctly sized permanent Numbers or Tally table is the way to go for most things. What does correctly sized mean? Well, most people use a Tally table to generate dates or to do splits on VARCHAR(8000). If you create an 11,000-row Tally table with the correct clustered index on "N", you'll have enough rows to create more than 30 years' worth of dates (I work with mortgages a fair bit, so 30 years is a key number for me) and certainly enough to handle a VARCHAR(8000) split. Why is "right sizing" so important? If the Tally table is used a lot, it easily fits in cache, which makes it blazingly fast without much pressure on memory at all.
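A minimal sketch of building such a right-sized permanent table (the table name, the IDENTITY trick, and the constraint name are illustrative, not from the answer; for a one-off build, any of the methods above would do):

```sql
--===== Build a permanent 11,000-row Tally table once
--      (illustrative sketch; object names are hypothetical)
 SELECT TOP (11000)
        IDENTITY(INT, 1, 1) AS N
   INTO dbo.Tally
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2;

--===== The clustered index on "N" is what lets the table sit
--      compactly in cache and be read very quickly
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N);
```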

Last but not least, everyone knows that if you create a permanent Tally table, it doesn't much matter which method you use to build it, because 1) it's only going to be made once and 2) if it's something like an 11,000-row table, all of the methods are going to run "good enough". So why all the indignation on my part about which method to use???

The answer is that some poor guy/gal who doesn't know any better, and just needs to get his or her job done, might see something like the recursive CTE method and decide to use it for something much larger and much more frequently used than building a permanent Tally table. I'm trying to protect those people, the servers their code runs on, and the company that owns the data on those servers. Yeah... it's that big a deal. It should be for everyone else, as well. Teach the right way to do things instead of "good enough". Do some testing before posting or using something from a post or book... the life you save may, in fact, be your own, especially if you think a recursive CTE is the way to go for something like this. ;-)

Thanks for listening...

Socinian answered 18/4, 2010 at 17:44 Comment(4)
I really, really wish more people had your sense of social responsibility. That said, apart from the one-time need to populate a Numbers table for all kinds of stuff, if needed for some reason, it seems SELECT INTO with IDENTITY is faster than a CTE. – Intuitional
Thank you for the very kind feedback, Andre. – Socinian
You don't know that CTEs are dreadfully slow, that they are rendered using temp tables? And that this requirement can be fulfilled without a CTE? Of course, without that knowledge, CTEs are marvellous. – Niggling
Actually, I do know that they're dreadfully slow. I wrote an article about it a long time ago. Please feel free to read it. sqlservercentral.com/articles/… Now, that was back in the day of less modern machines and, while newer machines have gotten faster, rCTEs still use 8X more reads than a WHILE loop, and a WHILE loop in a transaction will still beat it for speed. Like I said, "dreadfully" slow and hardly ever "marvelous". Learn the right way to do such things. – Socinian

The optimal approach is to use a table instead of a function. Using a function causes extra CPU load to create the values for the data being returned, especially if the values being returned cover a very large range.

Hartwell answered 2/9, 2008 at 9:48 Comment(4)
I think it depends on your situation. Between the two best-performing options, you can trade between IO and CPU costs, depending on which is more expensive for you. – Hayward
IO will almost always be cheaper than CPU, especially as this table would be small and probably already in the buffer pool. – Hartwell
@Hartwell I/O is always way more expensive and slower than CPU. SSDs have changed this somewhat in recent years, but in most production architectures those SSDs have a network link between them and the CPUs. The only databases I see that are truly CPU-bound are running untuned ORM-only apps or heavy machine learning. – Shirleneshirley
@Shirleneshirley except if the table is used often enough for us to care, it will almost certainly be in memory, and memory is cheaper to upgrade and usually doesn't impact licensing the way adding CPU cores can. SQL Server Enterprise Edition is going to be in the ballpark of a five-digit number PER CORE, i.e. adding cores will probably cost you more in licensing alone than the entire out-the-door cost of throwing more RAM in the server. – Greenshank

This article gives 14 different possible solutions with a discussion of each. The important point is that:

suggestions regarding efficiency and performance are often subjective. Regardless of how a query is being used, the physical implementation determines the efficiency of a query. Therefore, rather than relying on biased guidelines, it is imperative that you test the query and determine which one performs better.

I personally liked:

WITH Nbrs ( n ) AS (
    SELECT 1 UNION ALL
    SELECT 1 + n FROM Nbrs WHERE n < 500 )
SELECT n FROM Nbrs
OPTION ( MAXRECURSION 500 )
Blanketyblank answered 25/9, 2009 at 19:50 Comment(1)
Proven wrong by the accepted answer? It is not 'optimal', though it looks handsome. – Hayward

This view is super fast and contains all positive int values.

CREATE VIEW dbo.Numbers
WITH SCHEMABINDING
AS
    WITH Int1(z) AS (SELECT 0 UNION ALL SELECT 0)
    , Int2(z) AS (SELECT 0 FROM Int1 a CROSS JOIN Int1 b)
    , Int4(z) AS (SELECT 0 FROM Int2 a CROSS JOIN Int2 b)
    , Int8(z) AS (SELECT 0 FROM Int4 a CROSS JOIN Int4 b)
    , Int16(z) AS (SELECT 0 FROM Int8 a CROSS JOIN Int8 b)
    , Int32(z) AS (SELECT TOP 2147483647 0 FROM Int16 a CROSS JOIN Int16 b)
    SELECT ROW_NUMBER() OVER (ORDER BY z) AS n
    FROM Int32
GO
Kevenkeverian answered 4/7, 2011 at 12:24 Comment(4)
0 is often useful. And I would probably convert the final column to int. Also, you should know that the method is basically included in the accepted answer (without the 0 or the conversion to int) under the name of Itzik's CROSS JOINED CTE method. – Lem
Any particular reason to add WITH SCHEMABINDING in the view? – Coercive
Adding 'WITH SCHEMABINDING' can make queries faster. It helps the optimizer know that no data is accessed. (See blogs.msdn.com/b/sqlprogrammability/archive/2006/05/12/…) – Kevenkeverian
I wonder if @AnthonyFaull can back this up with some measurements. – Hayward

From SQL Server 2022 you will be able to do:

SELECT Value
FROM GENERATE_SERIES(START = 1, STOP = 100, STEP=1)

In the public preview of SQL Server 2022 (CTP2.0) there are some very promising elements and other less so. Hopefully the negative aspects can be addressed before the actual release.

Execution time for number generation

The below generates 10,000,000 numbers in 700 ms on my test VM (assigning to a variable removes any overhead from sending results to the client):

DECLARE @Value INT 

SELECT @Value =[value]
FROM GENERATE_SERIES(START=1, STOP=10000000)

Cardinality estimates

It is simple to calculate how many numbers will be returned from the operator and SQL Server takes advantage of this as shown below.

[execution plan screenshot]

Unnecessary Halloween Protection

The plan for the insert below has a completely unnecessary spool - presumably because SQL Server does not currently have logic to determine that the source of the rows cannot be the destination.

CREATE TABLE dbo.NumberHeap(Number INT);

INSERT INTO dbo.NumberHeap
SELECT [value]
FROM GENERATE_SERIES(START=1, STOP=10);

When inserting into a table with a clustered index on Number, the spool may be replaced by a sort instead (which also provides the phase separation).

[execution plan screenshot]

Unnecessary sorts

The below will return the rows in order anyway, but SQL Server apparently does not yet have the plan properties set to guarantee this and take advantage of it in the execution plan.

SELECT [value]
FROM GENERATE_SERIES(START=1, STOP=10)
ORDER BY [value] 

[execution plan screenshot]

Re: this last point, Aaron Bertrand indicates that this is not a box currently ticked, but that it may be forthcoming.
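As a quick usage sketch (using the CTP2.0 named-parameter syntax shown above; the dates and column alias are illustrative), the series converts directly into a run of dates:

```sql
-- First 30 days of 2022, one row per day
SELECT DATEADD(DAY, [value], '20220101') AS TheDate
FROM GENERATE_SERIES(START = 0, STOP = 29);
```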

Ehf answered 10/3, 2022 at 15:47 Comment(0)

On SQL Server 2016+, you can use OPENJSON to generate a numbers table:

-- range from 0 to @max - 1
DECLARE @max INT = 40000;

SELECT rn = CAST([key] AS INT) 
FROM OPENJSON(CONCAT('[1', REPLICATE(CAST(',1' AS VARCHAR(MAX)),@max-1),']'));

LiveDemo


Idea taken from How can we use OPENJSON to generate series of numbers?
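As an illustrative follow-on (not part of the original answer), the same trick can drive a character-by-character split of a string:

```sql
-- Split @s into one row per character, using OPENJSON's [key]
-- (0-based) as the position; names here are illustrative
DECLARE @s VARCHAR(100) = 'abc';

SELECT rn = CAST(j.[key] AS INT) + 1,                   -- 1-based position
       ch = SUBSTRING(@s, CAST(j.[key] AS INT) + 1, 1)  -- character at that position
FROM OPENJSON(CONCAT('[1',
         REPLICATE(CAST(',1' AS VARCHAR(MAX)), LEN(@s) - 1), ']')) AS j;
```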
Upholsterer answered 2/5, 2016 at 16:29 Comment(2)
Nice. I guess one could have used XML similarly to this if position() had been fully supported in SQL Server's XQuery. – Lem
Sorry for the late comment, but that code uses 11.4 times more CPU and infinitely more logical reads (2,000,023) than Itzik's cascading CTE method. – Socinian

edit: see Conrad's comment below.

Jeff Moden's answer is great ... but I find on Postgres that the Itzik method fails unless you remove the E32 row.

Slightly faster on Postgres (40 ms vs 100 ms) is another method I found on here, adapted for Postgres:

WITH 
    E00 (N) AS ( 
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ),
    E01 (N) AS (SELECT a.N FROM E00 a CROSS JOIN E00 b),
    E02 (N) AS (SELECT a.N FROM E01 a CROSS JOIN E01 b ),
    E03 (N) AS (SELECT a.N FROM E02 a CROSS JOIN E02 b 
        LIMIT 11000  -- end record  11,000 good for 30 yrs dates
    ), -- max is 100,000,000, starts slowing e.g. 1 million 1.5 secs, 2 mil 2.5 secs, 3 mill 4 secs
    Tally (N) as (SELECT row_number() OVER (ORDER BY a.N) FROM E03 a)

SELECT N
FROM Tally

As I am moving from the SQL Server world to Postgres, I may have missed a better way to do tally tables on that platform... INTEGER()? SEQUENCE()?
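(As the comments below point out, Postgres has a built-in set-returning function for exactly this; a minimal sketch:)

```sql
-- Postgres: 11,000 numbers with no base tables and no recursion
SELECT n
FROM generate_series(1, 11000) AS t(n);
```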

Apostolic answered 20/1, 2011 at 9:28 Comment(6)
"may have missed a better way to do tally tables on [postgres]" — Yeah, you did: generate_series. – Bouldon
@Conrad Frix, apologies for the very late question (more than 5 years late), but have you done any performance testing to compare that great built-in tool with other methods? – Socinian
@JeffModen Sorry, no, but it's easy to test. Take Ruskin's query and compare it to a call to generate_series. – Bouldon
@Conrad Frix, since you made the claim of performance, and you have access to both environments (which I don't), and you do also claim it's easy to test, I was hoping you'd take the time to test it. ;-) – Socinian
@JeffModen PostgreSQL is free and you seem to be quite a capable perf tester. It's also not clear to me why you are asking me; I just wrote a link to a doc in a comment. ;) Also, not surprisingly, generate_series is faster sqlfiddle.com/#!17/9eecb/20843 but this is hardly the in-depth testing you've done. – Bouldon
@Conrad Frix, heh... you already have it set up and you can't take 5 minutes to test your own claim of performance. NP. Moving on. – Socinian

Still much later, I'd like to contribute a slightly different 'traditional' CTE, one that does not touch base tables to get the volume of rows:

--===== Hans CROSS JOINED CTE method
WITH Numbers_CTE (Digit)
AS
(
 SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
 SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
SELECT HundredThousand.Digit * 100000
     + TenThousand.Digit * 10000
     + Thousand.Digit * 1000
     + Hundred.Digit * 100
     + Ten.Digit * 10
     + One.Digit AS Number
  INTO #Tally5
  FROM Numbers_CTE AS One
 CROSS JOIN Numbers_CTE AS Ten
 CROSS JOIN Numbers_CTE AS Hundred
 CROSS JOIN Numbers_CTE AS Thousand
 CROSS JOIN Numbers_CTE AS TenThousand
 CROSS JOIN Numbers_CTE AS HundredThousand;

This CTE performs more reads than Itzik's CTE but fewer than the traditional CTE. However, it consistently performs fewer writes than the other queries. As you know, writes are consistently much more expensive than reads.

The duration depends heavily on the number of cores (MAXDOP) but, on my 8-core machine, it consistently runs quicker (lower duration in ms) than the other queries.

I am using:

Microsoft SQL Server 2012 - 11.0.5058.0 (X64) 
May 14 2014 18:34:29 
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

on Windows Server 2012 R2, 32 GB, Xeon X3450 @2.67Ghz, 4 cores HT enabled.

Suspension answered 22/10, 2014 at 10:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.