I have a non-critical query that uses NEWID() and joins many tables; it returns about 10k rows in roughly 3 seconds. So NEWID() can be acceptable when the query isn't performance-sensitive and the cost has no real impact, but it is a bad choice for large tables.
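For context, this is roughly the pattern I mean, assuming the usual ORDER BY NEWID() shuffle; the tables and columns here are placeholders, not my actual query:

/* Hypothetical example: shuffling a joined result set with ORDER BY NEWID().
   dbo.Orders and dbo.Customers are placeholder tables. */
SELECT TOP (10000) o.OrderId, c.Name
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
ORDER BY NEWID();  /* every row gets a random GUID and the whole set is sorted by it */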
Here is the explanation from Brent Ozar's blog: https://www.brentozar.com/archive/2018/03/get-random-row-large-table/.
Below is a summary of the methods you can use to get a random row; read the blog for more details.
4 ways to get a random row from a large table:
- Method 1, Bad: ORDER BY NEWID() > bad performance!
- Method 2, Better but Strange: TABLESAMPLE > many gotchas, and it is not really random! (see the sketch after this list)
- Method 3, Best but Requires Code: random primary key > fastest, but won't work with negative numbers.
- Method 4, OFFSET-FETCH (2012+) > only performs properly with a clustered index. (see the sketch after this list)
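For reference, here are minimal sketches of methods 2 and 4 against the same dbo.Users table the blog uses; the sample percentage and the ORDER BY column are my assumptions, not code from the blog:

/* Method 2: TABLESAMPLE returns whole random 8 KB pages, not random rows,
   so the "random" rows come clustered from whichever pages were picked */
SELECT TOP 1 *
FROM dbo.Users TABLESAMPLE (1 PERCENT);

/* Method 4: OFFSET-FETCH (SQL Server 2012+) skips a random number of rows;
   it still scans past the skipped rows, so it only behaves with a clustered index */
DECLARE @rowcount BIGINT = (SELECT COUNT_BIG(*) FROM dbo.Users);
DECLARE @skip BIGINT = ABS(CHECKSUM(NEWID())) % @rowcount;
SELECT *
FROM dbo.Users
ORDER BY Id
OFFSET @skip ROWS FETCH NEXT 1 ROWS ONLY;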
More on method 3:
Find the table's top ID, generate a random number no larger than it, and look for the first row at or above that ID. For N random rows, either run the code below N times or generate N random numbers and use them in an IN clause (see the sketch after the code).
/* Get a random number smaller than the table's top ID */
DECLARE @rand BIGINT;
DECLARE @maxid INT = (SELECT MAX(Id) FROM dbo.Users);
SELECT @rand = ABS((CHECKSUM(NEWID()))) % @maxid;
/* Get the first row around that ID */
SELECT TOP 1 *
FROM dbo.Users AS u
WHERE u.Id >= @rand;
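And a rough sketch of the IN-clause variant for several rows at once (three here, just as an example). One caveat I am adding myself, not from the blog: a plain IN only matches IDs that actually exist, so gaps in the Id column can return fewer rows than requested.

/* Reusing @maxid from the snippet above; compute each random ID exactly once */
DECLARE @r1 BIGINT = ABS(CHECKSUM(NEWID())) % @maxid;
DECLARE @r2 BIGINT = ABS(CHECKSUM(NEWID())) % @maxid;
DECLARE @r3 BIGINT = ABS(CHECKSUM(NEWID())) % @maxid;

/* Any random ID that falls into an identity gap simply won't match a row */
SELECT *
FROM dbo.Users
WHERE Id IN (@r1, @r2, @r3);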