How can I fill a column with random numbers in SQL? I get the same value in every row

UPDATE CattleProds
SET SheepTherapy=(ROUND((RAND()* 10000),0))
WHERE SheepTherapy IS NULL

If I then do a SELECT I see that my random number is identical in every row. Any ideas how to generate unique random numbers?

Opia answered 15/2, 2011 at 11:26 Comment(0)

RAND() without a seed is evaluated once per query, so every row receives the same value. Use NEWID() instead, which is recalculated for each row in the result. The usual way is to take the checksum modulo the size of the range. Note that CHECKSUM(NEWID()) can produce -2,147,483,648, and ABS() of that value causes an integer overflow, so apply the modulo to the checksum return value before converting it to an absolute value.

UPDATE CattleProds
SET    SheepTherapy = abs(checksum(NewId()) % 10000)
WHERE  SheepTherapy IS NULL

This generates a random integer between 0 and 9999, inclusive.
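As a rough sketch of how the same pattern extends to other ranges (for example fixed-width six-digit values), the modulus sets the width of the range and an added offset sets its lower bound. The variable names @lo and @hi below are illustrative only and are not part of the original answer.

```sql
-- Six-digit values, 100000 through 999999.
-- CHECKSUM(NEWID()) % 900000 lies in -899999..899999, so ABS() cannot overflow.
UPDATE CattleProds
SET    SheepTherapy = 100000 + ABS(CHECKSUM(NEWID()) % 900000)
WHERE  SheepTherapy IS NULL;

-- General form for an inclusive range [@lo, @hi] (hypothetical variable names).
DECLARE @lo INT = 100, @hi INT = 200;
SELECT @lo + ABS(CHECKSUM(NEWID()) % (@hi - @lo + 1)) AS Val;
```

The modulo is still applied before ABS(), for the overflow reason given above.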

Namtar answered 15/2, 2011 at 11:28 Comment(5)
This question/answer may also be helpful: https://mcmap.net/q/86207/-how-do-i-generate-a-random-number-for-each-row-in-a-t-sql-select - Havelock
This is not working for me at all. Does the column have to be INT? Error #1064 every time. Reaching for the crazy pills... - Quintie
This is a thing of beauty! Well done. Love it. Performance is a tiny bit slow, but still great. - Lolly
@ColinR.Turner - Error 1064 is MySQL. This question is tagged SQL Server. - Zarla
And how do we get a random number with a fixed character length? I would like random 6 digit numbers. - Alessi

If you are on SQL Server 2008 or later you can also use

 CRYPT_GEN_RANDOM(2) % 10000

This seems somewhat simpler. Like NEWID(), it is evaluated once per row, as shown below:

DECLARE @foo TABLE (col1 FLOAT)

INSERT INTO @foo SELECT 1 UNION SELECT 2

UPDATE @foo
SET col1 =  CRYPT_GEN_RANDOM(2) % 10000

SELECT *  FROM @foo

This returns two random, probably different, numbers:

col1
----------------------
9693
8573

Mulling over the unexplained downvote, the only legitimate reason I can think of is bias. The random number generated is between 0 and 65,535, a range of 65,536 values that is not evenly divisible by 10,000, so some numbers are slightly over-represented: remainders 0 to 5,535 can each be produced by 7 source values, while remainders 5,536 to 9,999 can each be produced by only 6. A way around this would be to wrap it in a scalar UDF that throws away any number of 60,000 or more and calls itself recursively to get a replacement number.

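CREATE FUNCTION dbo.RandomNumber()
RETURNS INT
AS
  BEGIN
      DECLARE @Result INT

      -- Two random bytes, implicitly converted to an int in the range 0 to 65,535
      SET @Result = CRYPT_GEN_RANDOM(2)

      -- Accept values below 60,000 (six full cycles of 0-9999); otherwise retry.
      -- At @@NESTLEVEL = 32, the nesting limit, accept the value anyway rather than fail.
      RETURN CASE
               WHEN @Result < 60000
                     OR @@NESTLEVEL = 32 THEN @Result % 10000
               ELSE dbo.RandomNumber()
             END
  END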
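
To make the size of that bias concrete, here is a small sketch that enumerates every possible two-byte value and counts how many of them map to each remainder. It uses sys.all_columns cross-joined with itself purely as a convenient row source, on the assumption that this yields at least 65,536 rows (it normally does on a standard installation).

```sql
-- Enumerate 0..65535 and count how many source values land on each remainder.
WITH n AS (
    SELECT TOP (65536) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS v
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
)
SELECT hits, COUNT(*) AS remainders
FROM (
    SELECT v % 10000 AS r, COUNT(*) AS hits
    FROM n
    GROUP BY v % 10000
) AS t
GROUP BY hits
ORDER BY hits;
-- hits = 6 for 4464 remainders (5536-9999), hits = 7 for 5536 remainders (0-5535)
```

Rejecting values of 60,000 or more, as the function above does, leaves exactly 6 source values for every remainder.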
Zarla answered 15/2, 2011 at 11:48 Comment(7)
@downvoter - Any particular reason? Maybe you meant to hit the up arrow; this answer works fine! - Zarla
What everyone seems to be missing is that this method is MUCH MUCH MUCH better for performance. I've been looking for an alternative to NEWID() and this is spot on, thanks! - Desiderata
Any desired range is easily dealt with. For example ABS(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%10001) yields a number from 0-10000, which is the range the OP's code would have generated if it had worked the way they hoped. - Catch
Which 'same' issue? The formula does generate new values per row (OP's problem solved) and the result is within the range, but they won't be skewed because there are 64 bits of seed and only 14 bits of result, so any potential skew would be undetectable. Even if you generated 10^15 results, any skew you might think you are detecting would still be within the margin of error. Meaning you'd need to generate 2^19 results to prove that skew actually existed. - Catch
What is 2? % 10000? - Noreen
@Noreen Why are you asking about 2? That isn't in my answer. CRYPT_GEN_RANDOM(2) gives two bytes of random data. The % 10000 implicitly casts that to integer (which gives a value between 0 and 65,535) and returns the remainder when dividing that by 10000. In retrospect it would probably be best to use CRYPT_GEN_RANDOM(3) here to reduce the slight bias that small numbers receive with this method, or to use the revised method that accounts for this. - Zarla
@MartinSmith You're right, I was asking about CRYPT_GEN_RANDOM(2). Like what does 2 do? Why did you choose 2? etc. It's great to include this information in the answer. - Noreen

While I do love using CHECKSUM, I feel that a better way to go is to seed RAND() with NEWID(), just because you don't have to go through complicated math to generate simple numbers.

ROUND( 1000 *RAND(convert(varbinary, newid())), 0)

You can replace the 1000 with whichever number you want to set as the limit, and you can always use a plus sign to create a range. Say you want a random number between 100 and 200; you can do something like:

100 + ROUND( 100 *RAND(convert(varbinary, newid())), 0)

Putting it together in your query:

UPDATE CattleProds 
SET SheepTherapy= ROUND( 1000 *RAND(convert(varbinary, newid())), 0)
WHERE SheepTherapy IS NULL
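
One caveat with ROUND() here: the two endpoints of the range are hit only about half as often as the interior values, because each endpoint collects only a half-width slice of RAND()'s output. If an even spread over the integers matters, a possible variation (a sketch, not part of the original answer) is to use FLOOR() with the number of integers in the range:

```sql
-- Integer in [100, 200]: 101 possible values, each covering an equal slice of RAND()'s output.
SELECT 100 + FLOOR(101 * RAND(CONVERT(varbinary, NEWID()))) AS Val;

-- The same idea applied to the UPDATE, for values 0 through 1000.
UPDATE CattleProds
SET    SheepTherapy = FLOOR(1001 * RAND(CONVERT(varbinary, NEWID())))
WHERE  SheepTherapy IS NULL;
```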
Nadbus answered 12/5, 2013 at 5:46 Comment(1)
Very good, and this lets you set the range easily. - Ockham

I tested 2 set-based randomization methods against RAND() by generating 100,000,000 rows with each. To level the field, the output is a float between 0 and 1 to mimic RAND(). Most of the code is testing infrastructure, so I summarize the algorithms here:

-- Try #1 used
(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
-- Try #2 used
RAND(Checksum(NewId()))
-- and to have a baseline to compare output with I used
RAND() -- this required executing 100000000 separate insert statements

Using CRYPT_GEN_RANDOM was clearly the most random: by the birthday approximation n^2/(2N), there is only about a 0.5% chance of seeing even 1 duplicate when plucking n = 10^8 numbers from a set of N = 10^18 numbers. In other words, we should not have seen any duplicates, and this run had none! This set took 44 seconds to generate on my laptop.

Cnt     Pct
-----   ----
 1      100.000000  --No duplicates

SQL Server Execution Times: CPU time = 134795 ms, elapsed time = 39274 ms.

IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE #T0;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 (CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
  INTO #T0
  FROM L3;

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T0
    GROUP BY Val
)
-- Multiply by 1.0 to avoid integer division, matching the other two tests
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T0) Pct
  FROM X
 GROUP BY x.Cnt;

At almost 15 orders of magnitude less random this method was not quite twice as fast, taking only 23 seconds to generate 100M numbers.

Cnt  Pct
---- ----
1    95.450254    -- only 95% unique is absolutely horrible
2    02.222167    -- If this line were the only problem I'd say DON'T USE THIS!
3    00.034582
4    00.000409    -- 409 numbers appeared 4 times
5    00.000006    -- 6 numbers actually appeared 5 times 

SQL Server Execution Times: CPU time = 77156 ms, elapsed time = 24613 ms.

IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 RAND(Checksum(NewId())) AS Val
  INTO #T1
  FROM L3;

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T1
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T1) Pct
  FROM X
 GROUP BY x.Cnt;
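
As a rough cross-check on those percentages, the following estimate treats the seeded RAND() outputs as uniform draws from about $2^{31}$ distinct values. That figure is an assumption inferred from the results, not a documented property of RAND(). With $n = 10^8$ rows and $N = 2^{31}$ values, the number of times a given value is drawn is approximately Poisson with mean $\lambda$, and the Pct column (distinct values seen exactly $k$ times, per 100 rows) works out as:

```latex
\lambda = \frac{n}{N} = \frac{10^{8}}{2^{31}} \approx 0.04657

\mathrm{Pct}_k \approx 100 \cdot \frac{N \,\lambda^{k} e^{-\lambda} / k!}{n}
             = 100 \cdot \frac{\lambda^{k-1} e^{-\lambda}}{k!}

\mathrm{Pct}_1 \approx 100\, e^{-\lambda} \approx 95.45
\qquad
\mathrm{Pct}_2 \approx 100\, \frac{\lambda e^{-\lambda}}{2} \approx 2.22
\qquad
\mathrm{Pct}_3 \approx 100\, \frac{\lambda^{2} e^{-\lambda}}{6} \approx 0.034
```

Under that assumption, the measured 95.45%, 2.22%, and 0.035% are about what a uniform generator limited to roughly $2^{31}$ outputs would be expected to give, so the duplicates may reflect the small output space more than any flaw in the distribution.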

RAND() alone is useless for set-based generation so generating the baseline for comparing randomness took over 6 hours and had to be restarted several times to finally get the right number of output rows. It also seems that the randomness leaves a lot to be desired although it's better than using checksum(newid()) to reseed each row.

Cnt  Pct
---- ----
1    99.768020
2    00.115840
3    00.000100  -- at least there were comparatively few values returned 3 times

Because of the restarts, execution time could not be captured.

IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
GO
CREATE TABLE #T2 (Val FLOAT);
GO
SET NOCOUNT ON;
GO
INSERT INTO #T2(Val) VALUES(RAND());
GO 100000000

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T2
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T2) Pct
  FROM X
 GROUP BY x.Cnt;
Catch answered 6/7, 2017 at 19:30 Comment(2)
P.S. Thinking that the restarts could have accounted for some of the duplicates, I quickly tested just 3M rows, which took almost 6-1/2 minutes. I got 2101 dups and 2 values appeared 3 times (.07% and .000067% respectively), indicating restarts probably played a part, but randomness is still far from stellar. - Catch
I noticed that one other answer just seeds with newid converted to varbinary, so I tried that too. Not only is it no faster than using checksum, but one value appeared 8 times in that test. To be fair, it was still 95.447319% unique, which is only barely worse than RAND(Checksum(NewId()))'s 95.450254% in my test. A second execution yielded a worst case of 3 numbers appearing 5 times and 95.452929% distinct, so YMMV even when testing 100M rows. - Catch
require_once('db/connect.php');

// Collect every product id first.
$products_query = "SELECT id FROM products";
$products_result = mysqli_query($conn, $products_query);
$ids_array = [];

// A while loop (instead of do/while) avoids pushing a null id when the table is empty.
while ($products_row = mysqli_fetch_array($products_result))
{
    array_push($ids_array, $products_row['id']);
}

/*
echo '<pre>';
print_r($ids_array);
echo '</pre>';
*/

// Give each row its own random 7-digit code, one UPDATE per id.
foreach ($ids_array as $current_row)
{
    $rand = rand(1000000 , 9999999);
    mysqli_query($conn , "UPDATE products SET code='$rand' WHERE id='$current_row'");
}
Inclinable answered 13/12, 2017 at 8:35 Comment(2)
Maybe it's not the correct or easiest way, but it works :) - Inclinable
Please read the question carefully before you start answering. By the way, sending an UPDATE query for each and every row separately is a VERY, VERY BAD IDEA when one has to UPDATE even a modest number of rows. - Deliciadelicious
