What is the best way to create and populate a numbers table?
Asked Answered
D

12

86

I've seen many different ways to create and populate a numbers table. However, what is the best way to create and populate one? With "best" being defined from most to least important:

  • Table created with optimal indexing
  • rows generated fastest
  • simple code used to create and populate

If you don't know what a numbers table is, look here: Why should I consider using an auxiliary numbers table?

Duotone answered 8/9, 2009 at 13:12 Comment(6)
turns out this was more-or-less a duplicate of #11319, as far as I can tellYsabel
The best way by far is voting for a built in implementation of a virtual table that does not need to be physically allocated. This can currently be done here: https://feedback.azure.com/forums/908035-sql-server/suggestions/32890519-add-a-built-in-table-of-numbersPrimo
@LouisSomers, I like that approach. however, If you need your program to work long before they get around to adding that feature, you will need to resort to building your own.Duotone
@KM Haha, that's true, I'm in the same boat, I'm just trying to raise more votes for a feature that I think is more important than a dark theme in management studio....Primo
@LouisSomers, tried to follow your link to add my vote but those bums at M$ are moving away from that platform.Aestivation
Coming in SQL Server 2022Marcus
D
152

Here are some code examples taken from the web and from answers to this question.

For each method, I have modified the original code so each uses the same table and column: NumbersTest and Number, with 10,000 rows or as close to that as possible. Also, I have provided links to the place of origin.

METHOD 1 here is a very slow looping method from here
avg 13.01 seconds
ran 3 times removed highest, here are times in seconds: 12.42, 13.60

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1)) 
SET NOCOUNT ON
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN 
    INSERT dbo.NumbersTest DEFAULT VALUES 
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 2 here is a much faster looping one from here
avg 1.1658 seconds
ran 11 times removed highest, here are times in seconds: 1.117, 1.140, 1.203, 1.170, 1.173, 1.156, 1.203, 1.153, 1.173, 1.170

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
    SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 3 Here is a single INSERT based on code from here
avg 488.6 milliseconds
ran 11 times removed highest, here are times in milliseconds: 686, 673, 623, 686,343,343,376,360,343,453

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<10000
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 4 here is a "semi-looping" method from here avg 348.3 milliseconds (it was hard to get good timing because of the "GO" in the middle of the code, any suggestions would be appreciated)
ran 11 times removed highest, here are times in milliseconds: 356, 360, 283, 346, 360, 376, 326, 373, 330, 373

DROP TABLE NumbersTest
DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest

METHOD 5 here is a single INSERT from Philip Kelley's answer
avg 92.7 milliseconds
ran 11 times removed highest, here are times in milliseconds: 80, 96, 96, 93, 110, 110, 80, 76, 93, 93

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  --I removed Pass5, since I'm only populating the Numbers table to 10,000
  Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
        (Number)
    SELECT Number
        FROM Tally
        WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 6 here is a single INSERT from Mladen Prajdic answer
avg 82.3 milliseconds
ran 11 times removed highest, here are times in milliseconds: 80, 80, 93, 76, 93, 63, 93, 76, 93, 76

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)  
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over (order by t1.number) as N
FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 7 here is a single INSERT based on the code from here
avg 56.3 milliseconds
ran 11 times removed highest, here are times in milliseconds: 63, 50, 63, 46, 60, 63, 63, 46, 63, 46

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO NumbersTest
    FROM sys.objects s1       --use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

After looking at all these methods, I really like Method 7, which was the fastest and the code is fairly simple too.

Duotone answered 10/9, 2009 at 20:17 Comment(11)
Saw this post years later. I'd be interested in timing for 1 million rows or more. I may try that some day, but 10000 is probably as many as would ever reasonably be needed.Arenaceous
While interesting, the timing doesn't seem that important to me. Specifically because if I ever have need of a numbers table, I'm going to create it once and use it over and over.Fariss
Many thanks! I know this is old, but for those landing here I'd suggest creating a Numbers table of 100,000 so that you can use it in combination with dates.Halfprice
Method 7 created a table with 9604 rows in it.Bridgeman
The options that rely on sys.objects will produce essentially random results for large requests from small databases. It's only useful for very large databases and relatively small number requests!Precess
@Dave, method 7 won't produce random results, possibly fewer rows than needed. If you need more rows, add additional CROSS JOIN sys.objects s3 as necessary.Duotone
Essentially random to people visiting this page who have different database objects (as seen in comments). I'd suggest updating to use sys.columns which is more likely to provide the needed numbers for most requests.Precess
@Dave, HA, one comment said they 9604 rows four years after answering it! Six years after answering it you say it gives random results. Random results implies that that you get random values. You will always get continuous integer values starting at 1, possibly fewer than the 10,000, if you have few rows in sys.objects. I tried method 7 on a fresh database (76 rows in sys.objects) and it can create 5,776 rows (76*76). If you add the CROSS JOIN sys.objects s3 as suggested in the previous comment, you will get 438,976 rows (76*76*76).Duotone
@KM Hi I used Method 7 and wanted to generate 999999 numbers, however the MAX number it generates up until is 91809, why is this? Cheers.Nichols
@Lynchie, add another line in the query: CROSS JOIN sys.objects s3 you didn't have enough rows in sys.objectsDuotone
@Duotone just to add my 2 cents, Itzik number table has an edge not in terms of the total time elapsed but it doesn't depend on any system tables and no physical reads at all, everything happens in memory. Also, he made an enhancement by adding a TOP operator here. Hope this will help others in futureMew
W
67

I use this which is extremely fast:

insert into Numbers(N)
select top 1000000 row_number() over (order by t1.number) as N
from   master..spt_values t1 
       cross join master..spt_values t2
Windblown answered 8/9, 2009 at 13:39 Comment(1)
Just a note to say this isn't supported in Azure SQL Database.Rolon
A
21

If you're just doing this in either SQL Server Management Studio or sqlcmd.exe, you can use the fact that the batch separator allows you to repeat the batch:

CREATE TABLE Number (N INT IDENTITY(1,1) PRIMARY KEY NOT NULL);
GO

INSERT INTO Number DEFAULT VALUES;
GO 100000

This will insert 100000 records into the Numbers table using the default value of the next identity.

It's slow. It compares to METHOD 1 in @KM.'s answer, which is the slowest of the examples. However, it's about as code light as it gets. You could speed it up somewhat by adding the primary key constraint after the insert batch.

Amelioration answered 12/1, 2015 at 19:34 Comment(3)
@Bacon Bits, can I insert only (a) specific column(s)?Selfsown
@Selfsown You can use this method as long as you can write a single INSERT statement which, when executed repeatedly, will create the data for every row. All batch repeater does is tell the client (either SSMS or sqlcmd.exe) to repeat the exact same query N times. You can leverage T-SQL do that any number of ways, but I suspect it would quickly get to be anything but code light.Amelioration
short code, simple, runs on any database. Love it!Folketing
A
15

I start with the following template, which is derived from numerous printings of Itzik Ben-Gan's routine:

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number from Tally where Number <= 1000000

The "WHERE N<= 1000000" clause limits the output to 1 to 1 million, and can easily be adjusted to your desired range.

Since this is a WITH clause, it can be worked into an INSERT... SELECT... like so:

--  Sample use: create one million rows
CREATE TABLE dbo.Example (ExampleId  int  not null)  

DECLARE @RowsToCreate int
SET @RowsToCreate = 1000000

--  "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources)
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT Example (ExampleId)
 select Number
  from Tally
  where Number <= @RowsToCreate

Indexing the table after it's built will be the fastest way to index it.

Oh, and I'd refer to it as a "Tally" table. I think this is a common term, and you can find loads of tricks and examples by Googling it.

Arenaceous answered 8/9, 2009 at 14:7 Comment(1)
This is the best answer in my opinion as it doesn't include any system tables and can be used on Azure SQL Database in 2023 despite the answers being so old I probably had colour in my hair when it was written.Gasify
A
6

Here is a short and fast in-memory solution that I came up with utilizing the Table Valued Constructors introduced in SQL Server 2008:

It will return 1,000,000 rows, however you can either add/remove CROSS JOINs, or use TOP clause to modify this.

;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))

SELECT N FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1 
    CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums

Note that this could be quickly calculated on the fly, or (even better) stored in a permanent table (just add an INTO clause after the SELECT N segment) with a primary key on the N field for improved efficiency.

Articulator answered 9/10, 2016 at 1:7 Comment(3)
I like this idea if you want an on the fly numbers table. It is slower than others when you use it to generate an actual table.Duotone
@Duotone I just tested it out on my setup, it took less than a second. But let's say hypothetically that it took 10 seconds, vs another that only took 1 second (to set up the permanent table). IMO, this is still minuscule considering, you will only ever have to set up the permanent table once. Other factors, like code verbosity, would be more important to me. 1 minute vs 1 second? that would be a little different, but my query isn't that slow.Articulator
This is the way I have always done it. It's not the fastest but it is readable in that each new cross join adds another 0, 10, 100,1000, 10000 rows etc.Gasify
S
5

For anyone looking for an Azure solution

SET NOCOUNT ON;
CREATE TABLE Numbers (n bigint PRIMARY KEY);
GO    

DECLARE @numbers table(number int);  

WITH numbers(number) as  (   
    SELECT 1 AS number   
    UNION all   
    SELECT number+1 FROM numbers WHERE number<10000  
)  
INSERT INTO @numbers(number)  
SELECT number FROM numbers OPTION (maxrecursion 10000);

INSERT INTO Numbers(n) SELECT number FROM @numbers;

Sourced from the sql azure team blog http://azure.microsoft.com/blog/2010/09/16/create-a-numbers-table-in-sql-azure/

Selfcommand answered 12/7, 2014 at 9:43 Comment(0)
M
3

Here is a couple of extra methods:
Method 1

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO

DECLARE @i int = 1;
INSERT INTO dbo.Numbers (Number) 
VALUES (1),(2);

WHILE 2*@i < 1048576
BEGIN
    INSERT INTO dbo.Numbers (Number) 
    SELECT Number + 2*@i
    FROM dbo.Numbers;
    SET @i = @@ROWCOUNT;
END
GO

SELECT COUNT(*) FROM Numbers AS RowCownt --1048576 rows

Method 2

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY);
GO

DECLARE @i INT = 0; 
INSERT INTO dbo.Numbers (Number) 
VALUES (1);

WHILE @i <= 9
BEGIN
    INSERT INTO dbo.Numbers (Number)
    SELECT N.Number + POWER(4, @i) * D.Digit 
    FROM dbo.Numbers AS N
        CROSS JOIN (VALUES(1),(2),(3)) AS D(Digit)
    ORDER BY D.Digit, N.Number
    SET @i = @i + 1;
END
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows

Method 3

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int identity NOT NULL PRIMARY KEY, T bit NULL);

WITH
    T1(T) AS (SELECT T FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS T(T)) --10 rows
   ,T2(T) AS (SELECT A.T FROM T1 AS A CROSS JOIN T1 AS B CROSS JOIN T1 AS C) --1,000 rows
   ,T3(T) AS (SELECT A.T FROM T2 AS A CROSS JOIN T2 AS B CROSS JOIN T2 AS C) --1,000,000,000 rows

INSERT INTO dbo.Numbers(T)
SELECT TOP (1048576) NULL
FROM T3;

ALTER TABLE Numbers
    DROP COLUMN T; 
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows

Method 4, taken from Defensive Database Programming book by Alex Kuznetsov

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO

DECLARE @i INT = 1 ; 
INSERT INTO dbo.Numbers (Number) 
VALUES (1);

WHILE @i < 524289 --1048576
BEGIN; 
    INSERT INTO dbo.Numbers (Number) 
    SELECT Number + @i 
    FROM dbo.Numbers; 
    SET @i = @i * 2 ; 
END
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --1048576 rows

Method 5, taken from Arrays and Lists in SQL Server 2005 and Beyond article by Erland Sommarskog

IF OBJECT_ID('dbo.Numbers', 'U') IS NOT NULL
    DROP TABLE dbo.Numbers
GO

CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);
GO

WITH digits (d) AS (
   SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
   SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
   SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
   SELECT 0)
INSERT INTO Numbers (Number)
   SELECT Number
   FROM   (SELECT i.d + ii.d * 10 + iii.d * 100 + iv.d * 1000 +
                  v.d * 10000 + vi.d * 100000 AS Number
           FROM   digits i
           CROSS  JOIN digits ii
           CROSS  JOIN digits iii
           CROSS  JOIN digits iv
           CROSS  JOIN digits v
           CROSS  JOIN digits vi) AS Numbers
   WHERE  Number > 0
GO

SELECT COUNT(*) FROM dbo.Numbers AS RowCownt --999999 rows

Summary:
Among those 5 methods, method 3 seems to be the fastest.

Manwell answered 7/3, 2015 at 9:45 Comment(0)
S
3

I know this thread is old and answered, but there is a way to squeeze a little extra performance out of Method 7:

Instead of this (essentially method 7 but with some ease of use polish):

DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
  DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO #TALLY
    FROM sys.objects s1       --use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 --use sys.co
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'

Try this:

DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
  DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO #TALLY
    FROM        (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T32]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T64]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T128]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T256]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T512]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T1024]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2048]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4096]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8192]
    CROSS JOIN  (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16384]
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'

On my server this takes ~10 ms as opposed to the ~16-20 ms when selecting from sys.objects. It also has the added benefit of not being dependent on how many objects are in sys.objects. While it's pretty safe, it's technically a dependency and the other one goes faster anyway. I think the speed boost is down to using BITs if you change:

DECLARE @BIT AS BIT = 0

to:

DECLARE @BIT AS BIGINT = 0

It adds ~8-10 ms to the total time on my server. That said, when you scale up to 1,000,000 records BIT vs BIGINT doesn't appreciably affect my query anymore, but it still runs around ~680ms vs ~730ms from sys.objects.

Stagner answered 8/12, 2017 at 21:12 Comment(0)
Z
2

I use numbers tables for primarily dummying up reports in BIRT without having to fiddle around with dynamic creation of recordsets.

I do the same with dates, having a table spanning from 10 years in the past to 10 years in the future (and hours of the day for more detailed reporting). It's a neat trick to be able to get values for all dates even if your 'real' data tables don't have data for them.

I have a script which I use to create these, something like (this is from memory):

drop table numbers; commit;
create table numbers (n integer primary key); commit;
insert into numbers values (0); commit;
insert into numbers select n+1 from numbers; commit;
insert into numbers select n+2 from numbers; commit;
insert into numbers select n+4 from numbers; commit;
insert into numbers select n+8 from numbers; commit;
insert into numbers select n+16 from numbers; commit;
insert into numbers select n+32 from numbers; commit;
insert into numbers select n+64 from numbers; commit;

The number of rows doubles with each line so it doesn't take a lot to produce truly huge tables.

I'm not sure I agree with you that it's important to be created fast since you only create it once. The cost of that is amortized over all the accesses to it, rendering that time fairly insignificant.

Zonnya answered 8/9, 2009 at 13:35 Comment(2)
each commits; results in Msg 3902, Level 16, State 1, Line 1 The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.Duotone
@KM, the first point is easily fixed by beginning a transaction (DB/2, my DBMS of choice, is usually configured to autostart transactions). And, if you want more rows, you just have to add more inserts. Each one doubles the range so it's very easy to get up to large numbers if you want. I also prefer to give generic SQL solutions where possible rather than limit solutions to specific vendors.Zonnya
H
1

Some of the suggested methods are basing on system objects (for example on the 'sys.objects'). They are assuming these system objects contain enough records to generate our numbers.

I would not base on anything which does not belong to my application and over which I do not have full control. For example: the content of these sys tables may change, the tables may not be valid anymore in new version of SQL etc.

As a solution, we can create our own table with records. We then use that one instead these system related objects (table with all numbers should be fine if we know the range in advance otherwise we could go for the one to do the cross join on).

The CTE based solution is working fine but it has limits related to the nested loops.

Horbal answered 20/7, 2012 at 19:7 Comment(0)
A
0

This is a repackaging of the accepted answer - but in a way that lets you compare them all to each other for yourself - the top 3 algorithms are compared (and comments explain why other methods are excluded) and you can run against your own setup to see how they each perform with the size of sequence that you desire.

SET NOCOUNT ON;

--
-- Set the count of numbers that you want in your sequence ...
--
DECLARE @NumberOfNumbers int = 10000000;
--
--  Some notes on choosing a useful length for your sequence ...
--      For a sequence of  100 numbers -- winner depends on preference of min/max/avg runtime ... (I prefer PhilKelley algo here - edit the algo so RowSet2 is max RowSet CTE)
--      For a sequence of   1k numbers -- winner depends on preference of min/max/avg runtime ... (Sadly PhilKelley algo is generally lowest ranked in this bucket, but could be tweaked to perform better)
--      For a sequence of  10k numbers -- a clear winner emerges for this bucket
--      For a sequence of 100k numbers -- do not test any looping methods at this size or above ...
--                                        the previous winner fails, a different method is need to guarantee the full sequence desired
--      For a sequence of  1MM numbers -- the statistics aren't changing much between the algorithms - choose one based on your own goals or tweaks
--      For a sequence of 10MM numbers -- only one of the methods yields the desired sequence, and the numbers are much closer than for smaller sequences

DECLARE @TestIteration int = 0;
DECLARE @MaxIterations int = 10;
DECLARE @MethodName varchar(128);

-- SQL SERVER 2017 Syntax/Support needed
DROP TABLE IF EXISTS #TimingTest
CREATE TABLE #TimingTest (MethodName varchar(128), TestIteration int, StartDate DateTime2, EndDate DateTime2, ElapsedTime decimal(38,0), ItemCount decimal(38,0), MaxNumber decimal(38,0), MinNumber decimal(38,0))

--
--  Conduct the test ...
--
WHILE @TestIteration < @MaxIterations
BEGIN
    -- Be sure that the test moves forward
    SET @TestIteration += 1;

/*  -- This method has been removed, as it is BY FAR, the slowest method
    -- This test shows that, looping should be avoided, likely at all costs, if one places a value / premium on speed of execution ...

    --
    -- METHOD - Fast looping
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'FastLoop';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    DECLARE @i INT = 1;
    WHILE @i <= @NumberOfNumbers
    BEGIN
        INSERT INTO [Numbers].[Test](Number) VALUES (@i);
        SELECT @i = @i + 1;
    END;

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration
*/

/*  -- This method requires GO statements, which would break the script, also - this answer does not appear to be the fastest *AND* seems to perform "magic"
    --
    -- METHOD - "Semi-Looping"
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'SemiLoop';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm 
    INSERT [Numbers].[Test] values (1);
--    GO --required

    INSERT [Numbers].[Test] SELECT Number + (SELECT COUNT(*) FROM [Numbers].[Test]) FROM [Numbers].[Test]
--    GO 14 --will create 16384 total rows

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration
*/
    --
    -- METHOD - Philip Kelley's algo 
    --          (needs tweaking to match the desired length of sequence in order to optimize its performance, relies more on the coder to properly tweak the algorithm)
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'PhilKelley';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    ; WITH
    RowSet0 as (select 1 as Item union all select 1),              --          2 rows   -- We only have to name the column in the first select, the second/union select inherits the column name
    RowSet1 as (select 1 as Item from RowSet0 as A, RowSet0 as B), --          4 rows
    RowSet2 as (select 1 as Item from RowSet1 as A, RowSet1 as B), --         16 rows
    RowSet3 as (select 1 as Item from RowSet2 as A, RowSet2 as B), --        256 rows
    RowSet4 as (select 1 as Item from RowSet3 as A, RowSet3 as B), --      65536 rows (65k)
    RowSet5 as (select 1 as Item from RowSet4 as A, RowSet4 as B), -- 4294967296 rows (4BB)
    -- Add more RowSetX to get higher and higher numbers of rows    
    -- Each successive RowSetX results in squaring the previously available number of rows
    Tally   as (select row_number() over (order by Item) as Number from RowSet5) -- This is what gives us the sequence of integers, always select from the terminal CTE expression
    -- Note: testing of this specific use case has shown that making Tally as a sub-query instead of a terminal CTE expression is slower (always) - be sure to follow this pattern closely for max performance
    INSERT INTO [Numbers].[Test] (Number)
    SELECT o.Number
    FROM Tally o
    WHERE o.Number <= @NumberOfNumbers

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    --
    -- METHOD - Mladen Prajdic answer
    --

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);

    -- Method information
    SET @MethodName = 'MladenPrajdic';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    INSERT INTO [Numbers].[Test](Number)
    SELECT TOP (@NumberOfNumbers) row_number() over(order by t1.number) as N
    FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    --
    -- METHOD - Single INSERT
    -- 

    -- Prep for the test
    DROP TABLE IF EXISTS [Numbers].[Test];
    -- The Table creation is part of this algorithm ...

    -- Method information
    SET @MethodName = 'SingleInsert';

    -- Record the start of the test
    INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
    SELECT @MethodName, @TestIteration, GETDATE()

    -- Run the algorithm
    SELECT TOP (@NumberOfNumbers) IDENTITY(int,1,1) AS Number
    INTO [Numbers].[Test]
    FROM sys.objects s1       -- use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 -- use sys.columns if you don't get enough rows returned to generate all the numbers you need

    ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)

    -- Record the end of the test
    UPDATE tt
        SET 
            EndDate = GETDATE()
    FROM #TimingTest tt
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration

    -- And the stats about the numbers in the sequence
    UPDATE tt
        SET 
            ItemCount = results.ItemCount,
            MaxNumber = results.MaxNumber,
            MinNumber = results.MinNumber
    FROM #TimingTest tt
    CROSS JOIN (
        SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
    ) results
    WHERE tt.MethodName = @MethodName
    and tt.TestIteration = @TestIteration
END

-- Calculate the timespan for each of the runs
UPDATE tt
    SET
        ElapsedTime = DATEDIFF(MICROSECOND, StartDate, EndDate)
FROM #TimingTest tt

--
-- Report the results ...
--
SELECT 
    MethodName, AVG(ElapsedTime) / AVG(ItemCount) as TimePerRecord, CAST(AVG(ItemCount) as bigint) as SequenceLength,
    MAX(ElapsedTime) as MaxTime, MIN(ElapsedTime) as MinTime,
    MAX(MaxNumber) as MaxNumber, MIN(MinNumber) as MinNumber
FROM #TimingTest tt
GROUP by tt.MethodName
ORDER BY TimePerRecord ASC, MaxTime ASC, MinTime ASC
Azpurua answered 12/2, 2019 at 21:8 Comment(0)
H
0

This would do it. Advantages of this approach:

  1. More control over lower and upper values. If at any point you had to cancel the running query, you can modify the lower range to start the process again.
  2. No Primary Key or Identity constraints that could impact query run time.
CREATE TABLE Numbers(N INT);

--

DECLARE @lower_range INT= 1;
DECLARE @upper_range INT= 10000;

--

WHILE(@lower_range <= @upper_range)
    BEGIN
        INSERT INTO Numbers(N)
    VALUES(@lower_range);
        SET @lower_range = @lower_range + 1;
    END;

--

SELECT *
FROM Numbers
ORDER BY N;
Handicap answered 11/1, 2022 at 14:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.