How can I fill a column with random numbers in SQL? I get the same value in every row

Asked 15/2, 2011 at 11:26 Answered 13/12, 2017 at 8:35

104

UPDATE CattleProds
SET SheepTherapy=(ROUND((RAND()* 10000),0))
WHERE SheepTherapy IS NULL

If I then do a SELECT I see that my random number is identical in every row. Any ideas how to generate unique random numbers?

Opia answered 15/2, 2011 at 11:26 Comment(0)

198

Instead of rand(), use newid(), which is recalculated for each row in the result. The usual way is to use the modulo of the checksum. Note that checksum(newid()) can produce -2,147,483,648 and cause integer overflow on abs(), so we need to use modulo on the checksum return value before converting it to absolute value.

UPDATE CattleProds
SET    SheepTherapy = abs(checksum(NewId()) % 10000)
WHERE  SheepTherapy IS NULL

This generates a random number between 0 and 9999.
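To see why the order of abs() and % matters, here is a minimal Python sketch of the arithmetic only (it does not touch SQL Server). It assumes checksum() returns a signed 32-bit INT and that % on integers keeps the sign of the dividend; the helper names tsql_mod, tsql_abs_int, abs_then_mod and mod_then_abs are made up for this illustration.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def tsql_mod(a: int, b: int) -> int:
    """Remainder that keeps the sign of the dividend (assumed T-SQL % behaviour)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

def tsql_abs_int(x: int) -> int:
    """ABS on a 32-bit INT: fail when the result does not fit, like an overflow error."""
    result = abs(x)
    if result > INT_MAX:
        raise OverflowError("arithmetic overflow: result does not fit in INT")
    return result

def abs_then_mod(checksum: int, n: int = 10000) -> int:
    # Risky order: abs() sees the raw checksum, so INT_MIN overflows.
    return tsql_mod(tsql_abs_int(checksum), n)

def mod_then_abs(checksum: int, n: int = 10000) -> int:
    # Safe order: the modulo first shrinks the value to -(n-1)..(n-1).
    return tsql_abs_int(tsql_mod(checksum, n))

for value in (INT_MIN, -1, 0, INT_MAX):
    print(value, "->", mod_then_abs(value))
```

With the modulo applied first, every possible checksum value, including -2,147,483,648, lands in 0 to 9999.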

Namtar answered 15/2, 2011 at 11:28 Comment(5)

This question/answer may also be helpful: https://mcmap.net/q/86207/-how-do-i-generate-a-random-number-for-each-row-in-a-t-sql-select – Havelock 29/4, 2013 at 20:35

This is not working for me at all. Does the column have to be INT? Error #1064 every time. Reaching for the crazy pills... – Quintie 9/7, 2014 at 14:45

This is a thing of beauty! Well done. Love it. A tiny bit slow performance, but still great. – Lolly 14/8, 2018 at 3:17

@ColinR.Turner - Error 1064 is MySQL. This question is tagged SQL Server – Zarla 22/9, 2022 at 9:59

And how do we get a random number with a fixed character length? I would like random 6 digit numbers. – Alessi 6/9, 2023 at 17:56

If you are on SQL Server 2008 or later you can also use

 CRYPT_GEN_RANDOM(2) % 10000

Which seems somewhat simpler (it is also evaluated once per row as newid is - shown below)

DECLARE @foo TABLE (col1 FLOAT)

INSERT INTO @foo SELECT 1 UNION SELECT 2

UPDATE @foo
SET col1 =  CRYPT_GEN_RANDOM(2) % 10000

SELECT *  FROM @foo

Returns (2 random probably different numbers)

col1
----------------------
9693
8573

Mulling over the unexplained downvote, the only legitimate reason I can think of is this: the random number generated is between 0 and 65,535, and those 65,536 possible values do not divide evenly by 10,000, so some numbers will be slightly overrepresented. A way around this would be to wrap it in a scalar UDF that throws away any number of 60,000 or more and calls itself recursively to get a replacement number.

CREATE FUNCTION dbo.RandomNumber()
RETURNS INT
AS
  BEGIN
      DECLARE @Result INT

      SET @Result = CRYPT_GEN_RANDOM(2)

      RETURN CASE
               WHEN @Result < 60000
                     OR @@NESTLEVEL = 32 THEN @Result % 10000
               ELSE dbo.RandomNumber()
             END
  END
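The size of that skew can be counted exactly. The Python sketch below is only an illustration of the arithmetic: it enumerates all 65,536 two-byte values instead of calling CRYPT_GEN_RANDOM, and it ignores the @@NESTLEVEL = 32 fallback in the function above, which should almost never be reached. The function names are invented for the example.

```python
from collections import Counter

def plain_modulo_counts(n_values: int = 65536, modulus: int = 10000) -> Counter:
    """How many of the two-byte values map to each result with a bare % modulus."""
    return Counter(v % modulus for v in range(n_values))

def rejection_counts(n_values: int = 65536, cutoff: int = 60000,
                     modulus: int = 10000) -> Counter:
    """Same count, but values at or above the cutoff are thrown away first."""
    return Counter(v % modulus for v in range(n_values) if v < cutoff)

plain = plain_modulo_counts()
rejected = rejection_counts()

# Results 0..5535 are produced by 7 source values each, 5536..9999 by only 6.
print(plain[0], plain[5535], plain[5536], plain[9999])
# After rejection every result is produced by the same number of source values.
print(set(rejected.values()))
```

So without the rejection step the low results are about 17% more likely (7 chances versus 6) than the high ones; with it, all 10,000 results are equally likely.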

Zarla answered 15/2, 2011 at 11:48 Comment(7)

@downvoter - Any particular reason? Maybe you meant to hit the up arrow this answer works fine! – Zarla 15/2, 2011 at 12:14

What everyone seems to be missing is that this method is MUCH MUCH MUCH better for performance. I've been looking for an alternative to NEWID() and this is spot on, thanks! – Desiderata 26/9, 2013 at 14:15

Any desired range is easily dealt with. For example ABS(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%10001) yields a number from 0-10000 which is the range the OP's code would have generated if it had worked the way they hoped. – Catch 28/11, 2018 at 20:47

Which 'same' issue? The formula does generate new values per row (op's problem solved) and the result is within the range but they won't be skewed because there are 64 bits of seed and only 14 bits of result so any potential skew would be undetectable. Even if you generated 10^15 results any skew you might think you are detecting would still be within the margin of error. Meaning you'd need to generate 2^19 results to prove that skew actually existed. – Catch 28/11, 2018 at 20:59

What is 2\? % 10000? – Noreen 16/11, 2021 at 11:16

@Noreen Why are you asking about 2? That isn't in my answer. CRYPT_GEN_RANDOM(2) gives two bytes of random data. The % 10000 implicitly casts that to integer (which will give a value between 0 to 65,535) and returns the remainder when dividing that by 10000. Probably it would be best to use CRYPT_GEN_RANDOM(3) here actually in retrospect to reduce the slight bias that small numbers will receive with this method - or use the revised method that accounts for this – Zarla 16/11, 2021 at 12:27

@MartinSmith You're right, I was asking about CRYPT_GEN_RANDOM(2). Like what does 2 do? Why did you chose 2? etc. It's great to include this information in the answer. – Noreen 16/11, 2021 at 12:40

While I do love using CHECKSUM, I feel that a better way to go is using NEWID() to seed RAND(), just because you don't have to go through complicated math to generate simple numbers.

ROUND( 1000 *RAND(convert(varbinary, newid())), 0)

You can replace the 1000 with whichever number you want to set as the limit, and you can always use a plus sign to create a range. Let's say you want a random number between 100 and 200; you can do something like:

100 + ROUND( 100 *RAND(convert(varbinary, newid())), 0)

Putting it together in your query:

UPDATE CattleProds 
SET SheepTherapy= ROUND( 1000 *RAND(convert(varbinary, newid())), 0)
WHERE SheepTherapy IS NULL
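One thing to be aware of with ROUND here: the two end values of the range come up only half as often as the values in between, because only half a unit of the scaled interval rounds to each end. The Python sketch below illustrates this with evenly spaced stand-ins for RAND() on [0, 1) and a limit of 10. It assumes ROUND(x, 0) rounds halves away from zero; the helper names are made up, and it says nothing about how random the seeded RAND() itself is.

```python
import math
from collections import Counter

def round_half_up(x: float) -> int:
    """Round a non-negative value with halves going up (assumed ROUND(x, 0) behaviour)."""
    return math.floor(x + 0.5)

def bucket_counts(limit: int, samples: int, use_round: bool) -> Counter:
    counts = Counter()
    for k in range(samples):
        u = (k + 0.5) / samples      # evenly spaced stand-in for RAND() in [0, 1)
        x = limit * u
        counts[round_half_up(x) if use_round else math.floor(x)] += 1
    return counts

rounded = bucket_counts(limit=10, samples=20000, use_round=True)
floored = bucket_counts(limit=10, samples=20000, use_round=False)

print(sorted(rounded.items()))   # 0 and 10 get half the count of 1..9
print(sorted(floored.items()))   # 0..9 all get the same count
```

If an even spread matters, taking the floor of the scaled value (FLOOR instead of ROUND in the query) gives limit equally likely values from 0 to limit - 1.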

Nadbus answered 12/5, 2013 at 5:46 Comment(1)

very good, and this lets you set the range easily – Ockham 14/5, 2021 at 18:34

I tested two set-based randomization methods against RAND() by generating 100,000,000 rows with each. To level the field, the output is a float between 0 and 1 to mimic RAND(). Most of the code is testing infrastructure, so I summarize the algorithms here:

-- Try #1 used
(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
-- Try #2 used
RAND(Checksum(NewId()))
-- and to have a baseline to compare output with I used
RAND() -- this required executing 100000000 separate insert statements

Using CRYPT_GEN_RANDOM was clearly the most random: by the birthday approximation there is only about a 0.5% chance of seeing even 1 duplicate when plucking 10^8 numbers from a set of 10^18 numbers. IOW we should not have seen any duplicates, and this had none! This set took 44 seconds to generate on my laptop.

Cnt     Pct
-----   ----
 1      100.000000  --No duplicates

SQL Server Execution Times: CPU time = 134795 ms, elapsed time = 39274 ms.
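As a rough cross-check of the duplicate odds for this run, here is a back-of-the-envelope birthday estimate in Python. It assumes the 10^8 values are independent, uniform draws from about 10^18 equally likely values; p_any_duplicate is a name made up for this sketch.

```python
import math

def p_any_duplicate(draws: int, space: int) -> float:
    """Birthday approximation: P(at least one duplicate) ~ 1 - exp(-n(n-1) / (2N))."""
    return -math.expm1(-draws * (draws - 1) / (2 * space))

p = p_any_duplicate(10**8, 10**18)
print(f"chance of at least one duplicate: {p:.4%}")
```

Under those assumptions the estimate comes out at roughly half a percent, so a run with no duplicates at all is the expected outcome.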

IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE #T0;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 (CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
  INTO #T0
  FROM L3;

 WITH x AS (
     SELECT Val,COUNT(*) Cnt
      FROM #T0
     GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T0) Pct
  FROM X
 GROUP BY x.Cnt;

At almost 15 orders of magnitude less random, Try #2 (RAND(Checksum(NewId()))) was not quite twice as fast, taking only 23 seconds to generate 100M numbers.

Cnt  Pct
---- ----
1    95.450254    -- only 95% unique is absolutely horrible
2    02.222167    -- If this line were the only problem I'd say DON'T USE THIS!
3    00.034582
4    00.000409    -- 409 numbers appeared 4 times
5    00.000006    -- 6 numbers actually appeared 5 times

SQL Server Execution Times: CPU time = 77156 ms, elapsed time = 24613 ms.
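One possible reading of these percentages, offered as a hedged sketch and not a statement about SQL Server internals: they are close to what uniform draws from roughly 2^31 distinct values would produce. That 2^31 figure is an assumption inferred from the numbers above, not something taken from the documentation. The Python below uses a Poisson approximation to compute the expected Pct column; expected_pct is a made-up helper.

```python
import math

def expected_pct(draws: int, space: int, k: int) -> float:
    """Expected Pct for values seen exactly k times when drawing uniformly
    from `space` equally likely values (Poisson approximation)."""
    lam = draws / space
    # space * P(count == k) distinct values, divided by draws / 100 as Pct is.
    return 100 * math.exp(-lam) * lam ** (k - 1) / math.factorial(k)

# Assumption: the seeded RAND() can only return about 2**31 different values.
for k in range(1, 6):
    print(k, f"{expected_pct(10**8, 2**31, k):.6f}")
```

If that assumption holds, the duplicates come mostly from the small number of distinct outputs and not from any pattern in NEWID(), which would also explain why the method cannot improve however the seed is produced.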

IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 RAND(Checksum(NewId())) AS Val
  INTO #T1
  FROM L3;

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T1
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T1) Pct
  FROM X
 GROUP BY x.Cnt;

RAND() alone is useless for set-based generation so generating the baseline for comparing randomness took over 6 hours and had to be restarted several times to finally get the right number of output rows. It also seems that the randomness leaves a lot to be desired although it's better than using checksum(newid()) to reseed each row.

Cnt  Pct
---- ----
1    99.768020
2    00.115840
3    00.000100  -- at least there were comparatively few values returned 3 times

Because of the restarts, execution time could not be captured.

IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
GO
CREATE TABLE #T2 (Val FLOAT);
GO
SET NOCOUNT ON;
GO
INSERT INTO #T2(Val) VALUES(RAND());
GO 100000000

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T2
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T2) Pct
  FROM X
 GROUP BY x.Cnt;

Catch answered 6/7, 2017 at 19:30 Comment(2)

P.S. Thinking that the restarts could have accounted for some of the duplicates I quickly tested just 3M rows which took almost 6-1/2 minutes. I got 2101 dups and 2 values appeared 3 times (.07% and .000067% respectively) indicating restarts probably played a part but randomness is still far from stellar. – Catch 6/7, 2017 at 19:50

Having noticed one other answer just seeded with newid converted to varbinary so I tried that too. Not only is it no faster than using checksum but one value appear 8 times in that test. To be fair, it was still 95.447319% unique which is only barely worse than RAND(Checksum(NewId()))'s 95.450254% in my test. A second execution yielded a worst case of 3 numbers appearing 5 times and 95.452929% distinct so YMMV even when testing 100M rows. – Catch 6/7, 2017 at 20:27

-4

require_once('db/connect.php');

//rand(1000000 , 9999999);

$products_query = "SELECT id FROM products";
$products_result = mysqli_query($conn, $products_query);
$ids_array = [];

// Collect every product id; a plain while loop also copes with an empty table
while ($products_row = mysqli_fetch_array($products_result))
{
    array_push($ids_array, $products_row['id']);
}

/*
echo '<pre>';
print_r($ids_array);
echo '</pre>';
*/
$row_counter = count($ids_array);

for ($i=0; $i < $row_counter; $i++)
{ 
    $current_row = $ids_array[$i];
    $rand = rand(1000000 , 9999999);
    mysqli_query($conn , "UPDATE products SET code='$rand' WHERE id='$current_row'");
}

Inclinable answered 13/12, 2017 at 8:35 Comment(2)

Maybe it's not the most correct or the easiest way, but it works ))) – Inclinable 13/12, 2017 at 8:36

Please read the question carefully before you start answering. By the way, sending an UPDATE query for each and every row separately is a VERY, VERY BAD IDEA when one has to UPDATE even a modest number of rows. – Deliciadelicious 28/5, 2019 at 16:6
