MySQL select 10 random rows from 600K rows fast
Asked Answered

29

554

How can I best write a query that selects 10 rows randomly from a total of 600k?

Alisun answered 1/12, 2010 at 21:35 Comment(2)
Here are 8 techniques; perhaps one will work well in your case.Steiger
(That's actually 5 techniques -- some were not improvements.)Steiger
450

A great post handling several cases, from simple, to gaps, to non-uniform with gaps.

http://jan.kneschke.de/projects/mysql/order-by-rand/

For the most general case, here is how you do it:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

This supposes that the distribution of ids is uniform, and that there can be gaps in the id list. See the article for more advanced examples.
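
As the comments below note, ids just after large gaps get picked disproportionately often, and any id range below the smallest id only ever maps to the first row. A sketch of the tweak mentioned in the comments, drawing the random value between MIN(id) and MAX(id) instead of between 0 and MAX(id) (assuming the same table and integer ids):

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     ((SELECT MAX(id) FROM random) -
                      (SELECT MIN(id) FROM random)) +
                     (SELECT MIN(id) FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1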

Mac answered 1/12, 2010 at 21:41 Comment(16)
Yes, if you have potentially big gaps in IDs then the chance of your lowest IDs being picked randomly is much lower than that of your high IDs. In fact, the chance of the first ID after the biggest gap being picked is the highest. Therefore this isn't random by definition.Hoof
How do you get 10 different random rows? Do you have to set limit to 10 and then iterate 10 times with mysqli_fetch_assoc($result) ? Or are those 10 results not necessarily distinguishable?Peres
Random requires an equal chance for any result, in my mind. ;)Hoof
The full article addresses issues like unequal distributions and repeated results.Twirp
Specifically, if you have a gap at the start of your IDs, the first one will get picked roughly MIN/(MAX-MIN) of the time. For that case a simple tweak is (MAX()-MIN()) * RAND() + MIN(), which is not too slow.Condemnatory
Is it possible to use this method for an update? I've tried switching to update but that throws Incorrect usage of UPDATE and ORDER BY Bagasse
Why didn't you use ORDER BY RAND()?Canescent
Please look at this example: I have 600k records in one table named 'sample'. The first row has id = 1, the second row has id = 10000, and each subsequent record increments the id by 1. With this solution I almost always get the second row when picking randomly :(. Does anyone have a solution?Albano
When running this with a limit of 5 it does not consistently return 5 results.Jussive
the resulting rows are consecutive.Loewi
Most of the posts in that link do table scans. Test your technique this way: FLUSH STATUS; SELECT ... ; SHOW SESSION STATUS LIKE 'Handler%';. If you see numbers like the row count of the table, not good. If you see only numbers like the row count of the resultset, good.Steiger
I'm using PHP and I wound up just using shuffle php.net/manual/en/function.shuffle.php to randomize the array instead of doing it via MySQL.Ameliaamelie
This can also lead to selecting the same key twice.Deegan
That's incredibly faster than simply reordering by a random number:-)Sicyon
This sometimes gives an empty result depending on the WHERE condition. For example, WHERE r1.id >= r2.id AND r1.id NOT IN(6,43,21,35,77) AND status_code IN(1,2,3,4,5) (consider that there is a column called status_code) may or may not give an empty result.Wortham
@Hoof Technically, random doesn't mean uniformly at random, although as used casually it does, to be fair.Trabue
437
SELECT column FROM table
ORDER BY RAND()
LIMIT 10

Not an efficient solution, but it works.
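
One of the comments below shows how to check the cost with the session handler counters; Handler_read_% totals near the table's row count indicate a full scan (placeholders as in the query above):

FLUSH STATUS;
SELECT column FROM table ORDER BY RAND() LIMIT 10;
SHOW SESSION STATUS LIKE 'Handler%';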

Saez answered 13/10, 2012 at 6:43 Comment(7)
ORDER BY RAND() is relatively slowMcgowen
Mateusz - proof please: SELECT words, transcription, translation, sound FROM vocabulary WHERE menu_id=$menuId ORDER BY RAND() LIMIT 10 takes 0.0010s; without LIMIT 10 it took 0.0012s (that table has 3500 words).Seagraves
@zeusakm 3500 words is not that much; the problem is that it explodes past a certain point because MySQL has to actually sort ALL records after reading each one; once that operation hits the hard disc you can feel the difference.Sportswear
I don't want to repeat myself, but again, that's a full table scan. On a large table it's very time- and memory-consuming and might cause the creation of, and operation on, a temporary table on disk, which is very slow.Stepaniestepbrother
When I was interviewing with Facebook back in 2010, they asked me how to select a random record from a huge file of unknown size, in one reading. Once you come up with an idea, it is easy to generalize it for selecting multiple records. So yes, sorting the entire file is ridiculous. At the same time, it is very handy. I just used this approach to pick 10 random rows from a table with 1,000,000+ rows. Sure, I had to wait a bit; but I just wanted to get an idea of what typical rows in this table look like...Vespertine
Is this uniform random sampling, or does it emit output only until the LIMIT is reached? Are you sure all rows are first assigned a random number and then ordered?Angst
On a database with 1M entries, this takes about 2 seconds.Loewi
73

Simple query that has excellent performance and works with gaps:

SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id

This query on a 200K table takes 0.08s and the normal version (SELECT * FROM tbl ORDER BY RAND() LIMIT 10) takes 0.35s on my machine.

This is fast because the sort phase only uses the indexed ID column. You can see this behaviour in the explain:

SELECT * FROM tbl ORDER BY RAND() LIMIT 10: (EXPLAIN output shown as a screenshot in the original post)

SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id: (EXPLAIN output shown as a screenshot in the original post)

Weighted Version: https://mcmap.net/q/66434/-mysql-select-random-entry-but-weight-towards-certain-entries

Telescope answered 11/1, 2017 at 0:53 Comment(2)
The derived table still has to scan and sort the entire table.Steiger
For me it's much faster on millions of rows.Weinman
17

I am getting fast queries (around 0.5 seconds) with a slow CPU, selecting 10 random rows from a non-cached, 2GB MySQL database with 400K records. See my code here: Fast selection of random rows in MySQL

$time= microtime_float();

$sql='SELECT COUNT(*) FROM pages';
$rquery= BD_Ejecutar($sql);
list($num_records)=mysql_fetch_row($rquery);
mysql_free_result($rquery);

$sql="SELECT id FROM pages WHERE RAND()*$num_records<20
   ORDER BY RAND() LIMIT 0,10";
$rquery= BD_Ejecutar($sql);
$id_in=''; // initialize to avoid an undefined-variable notice on the first iteration
while(list($id)=mysql_fetch_row($rquery)){
    if($id_in) $id_in.=",$id";
    else $id_in="$id";
}
mysql_free_result($rquery);

$sql="SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery= BD_Ejecutar($sql);
while(list($id,$url)=mysql_fetch_row($rquery)){
    logger("$id, $url",1);
}
mysql_free_result($rquery);

$time= microtime_float()-$time;

logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time segundos</b>",1);
Kohler answered 2/8, 2011 at 0:30 Comment(8)
Given my table of over 14 million records, this is as slow as ORDER BY RAND().Erskine
@Kohler In your case - 400k rows - you can use a simple ORDER BY RAND(). Your trick with 3 queries is unnecessary. You can rewrite it as "SELECT id, url FROM pages WHERE id IN (SELECT id FROM pages ORDER BY rand() LIMIT 10)"Westerly
Your technique still does a table scan. Use FLUSH STATUS; SELECT ...; SHOW SESSION STATUS LIKE 'Handler%'; to see it.Steiger
Also try running that query on a webpage doing 200 req/s. Concurrency will kill you.Ryals
@RomanPodlinov The benefit of this over plain ORDER BY RAND() is that it sorts only the ids (not full rows), so the temp table is smaller, but it still has to sort all of them.Ryals
@Ryals My query sorts only the ids (not full rows) and then selects the full rows by the selected ids.Westerly
The killer is actually the SELECT COUNT(*) at the start. If you collect it from the information schema or get the highest id (assuming it is the primary key), it will be less accurate but much faster.Pleinair
You may want to replace microtime_float() with microtime(true).Wangle
14

From the book SQL Antipatterns:

Choose a Random Row Using an Offset

Still another technique that avoids problems found in the preceding alternatives is to count the rows in the data set and return a random number between 0 and the count. Then use this number as an offset when querying the data set.

$rand = "SELECT ROUND(RAND() * (SELECT COUNT(*) FROM Bugs))";
$offset = $pdo->query($rand)->fetch(PDO::FETCH_ASSOC);
$sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
$stmt = $pdo->prepare($sql);
$stmt->execute( $offset );
$rand_bug = $stmt->fetch();

Use this solution when you can’t assume contiguous key values and you need to make sure each row has an even chance of being selected.

Daiseydaisi answered 26/6, 2015 at 6:7 Comment(2)
for very large tables, SELECT count(*) becomes slow.Darcidarcia
OFFSET must step over that many rows. So this 'solution' costs an average of 1.5*N where N is the number of rows in the table.Steiger
13

It's a very simple, single-line query.

SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;
Handedness answered 13/1, 2016 at 14:20 Comment(4)
FYI, order by rand() is very slow if the table is largeHeadsail
Sometimes SLOW is acceptable if I want to keep it SIMPLE.Depressant
Indexing should be applied on the table if it's large.Handedness
Indexing won't help here. Indexes are helpful for very specific things, and this query is not one of them.Centroclinal
11

Well, if you have no gaps in your keys and they are all numeric, you can calculate random numbers and select those rows. But this will probably not be the case.

So one solution would be the following:

SELECT * FROM table WHERE key >= FLOOR(RAND() * (SELECT MAX(key) FROM table)) ORDER BY key LIMIT 1

This basically ensures that you get a random number within the range of your keys and then select the next key that is greater or equal. You have to do this 10 times.

However, this is NOT really random, because your keys will most likely not be distributed evenly.

It's really a big problem and not easy to solve while fulfilling all the requirements; MySQL's RAND() is the best you can get if you really want 10 random rows.

There is, however, another solution which is fast but has a trade-off when it comes to randomness; it may suit you better. Read about it here: How can i optimize MySQL's ORDER BY RAND() function?

The question is how random you need it to be.

Can you explain a bit more, so I can give you a good solution?

For example, a company I worked with had a solution where they needed absolute randomness extremely fast. They ended up pre-populating the database with random values, selecting rows in descending order of those values, and resetting them to fresh random values afterwards.

If you hardly ever update, you could also maintain an incrementing id with no gaps and just calculate random keys before selecting... It depends on the use case!
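
A sketch of that pre-population scheme (assuming an indexed column rnd pre-filled with random values; the table and column names are illustrative):

-- serve 10 rows by their pre-assigned random values
SELECT * FROM t ORDER BY rnd LIMIT 10;
-- then re-randomize the rows just served so they shuffle back into the pool
UPDATE t SET rnd = RAND() ORDER BY rnd LIMIT 10;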

Vaccine answered 1/12, 2010 at 21:46 Comment(6)
Hi Joe. In this particular case the keys shouldn't have gaps, but over time that may change. And while your answer works, it will generate 10 random rows (provided I write LIMIT 10) that are consecutive, and I wanted more randomness, so to speak. :) Thank you.Alisun
If you need 10 use some sort of union to generate 10 unique rows.Dosage
That's what I said: you need to execute that 10 times. Combining it with UNION is one way to put it in one query. See my addendum from 2 mins ago.Vaccine
@TheSurrican, this solution looks cool but is highly flawed. Try inserting just one very large id and all your random queries will return that one id.Griseofulvin
FLOOR(RAND()*MAX(id)) is biased toward returning larger ids.Steiger
The last suggestion was a good one. We use incrementing IDs so calculating random keys before selecting actually would work for us. It is easy to forget that SQL queries can be quite large because we often work so hard to make them small.Biliary
10

How to select random rows from a table:

From here: Select random rows in MySQL

A quick improvement over "table scan" is to use the index to pick up random ids.

SELECT *
FROM random, (
        SELECT id AS sid
        FROM random
        ORDER BY RAND( )
        LIMIT 10
    ) tmp
WHERE random.id = tmp.sid;
Diagnostic answered 20/7, 2013 at 1:34 Comment(2)
That helps some for MyISAM, but not for InnoDB (assuming id is the clustered PRIMARY KEY).Steiger
The inner query does a full table scan and sorts the results. Actually most, maybe all, of the techniques in that link involve a full scan.Steiger
6

All the best answers have already been posted (mainly those referencing the link http://jan.kneschke.de/projects/mysql/order-by-rand/).

I want to point out another speed-up possibility - caching. Think about why you need random rows. Probably you want to display some random post or random ad on a website. If you are getting 100 req/s, do you really need each visitor to get different random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same second get the same random posts, because the next second another 100 visitors will get a different set of posts.

When using this caching, you can also use one of the slower solutions for getting the random data, as it will be fetched from MySQL only once per second regardless of your req/s.

Ryals answered 7/7, 2015 at 13:52 Comment(0)
6

I improved on the answer @Riedsio gave. This is the most efficient query I could find on a large, uniformly distributed table with gaps (tested by getting 1000 random rows from a table that has > 2.6B rows).

(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)

Let me unpack what's going on.

  1. @max := (SELECT MAX(id) FROM table)
    • I'm calculating and saving the max. For very large tables, there is a slight overhead for calculating MAX(id) each time you need a row
  2. SELECT FLOOR(rand() * @max) + 1 as rand)
    • Gets a random id
  3. SELECT id FROM table INNER JOIN (...) on id > rand LIMIT 1
    • This fills in the gaps. Basically if you randomly select a number in the gaps, it will just pick the next id. Assuming the gaps are uniformly distributed, this shouldn't be a problem.

Doing the union helps you fit everything into 1 query so you can avoid doing multiple queries. It also lets you save the overhead of calculating MAX(id). Depending on your application, this might matter a lot or very little.

Note that this gets only the ids and gets them in random order. If you want to do anything more advanced I recommend you do this:

SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id
Darcidarcia answered 28/3, 2017 at 23:45 Comment(5)
I need 30 random records, so should I change LIMIT 1 to LIMIT 30 everywhere in the query?Rakehell
@Rakehell you should not; changing LIMIT 1 to LIMIT 30 would get you 30 consecutive records from a random point in the table. You should instead have 30 copies of the (SELECT id FROM .... part in the middle.Darcidarcia
I have tried it, but it does not seem more efficient than Riedsio's answer. I tested with 500 hits per second to the page using PHP 7.0.22 and MariaDB on CentOS 7; with Riedsio's answer I got 500+ more successful responses than with yours.Rakehell
@Rakehell Riedsio's answer gives 1 row; this one gives you n rows and cuts down on the I/O overhead for querying. You might be able to get rows faster, but with more load on your system.Darcidarcia
The + 1 part of SELECT FLOOR(rand() * @max) + 1 as rand can lead to no results being returned in a few cases. If the table contains 1 record with id 1, FLOOR gives 0 and rand is 0 + 1 = 1; id > rand then returns nothing (1 > 1). Similarly, if RAND() returns 0.999... and the max id in the table is 100, FLOOR gives 99 and rand is 99 + 1 = 100; id > rand returns nothing (100 > 100).Woad
5

I've looked through all of the answers, and I don't think anyone mentions this possibility at all, and I'm not sure why.

If you want utmost simplicity and speed, at a minor cost, then to me it seems to make sense to store a random number against each row in the DB. Just create an extra column, random_number, and set its default to RAND(). Create an index on this column.
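
A minimal setup sketch, assuming MySQL 8.0.13+ (expression defaults must be parenthesized) and a table named tbl:

ALTER TABLE tbl ADD COLUMN random_number DOUBLE NOT NULL DEFAULT (RAND());
CREATE INDEX idx_random_number ON tbl (random_number);
-- backfill explicitly, in case existing rows all received the same evaluated default
UPDATE tbl SET random_number = RAND();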

Then when you want to retrieve a row, generate a random number in your code (PHP, Perl, whatever) and compare it to the column.

SELECT * FROM tbl WHERE random_number >= :random ORDER BY random_number LIMIT 1

I guess that although it's very neat for a single row, for the ten rows the OP asked for you'd have to call it ten separate times (or come up with a clever tweak that escapes me at the moment).

Papeete answered 25/6, 2018 at 12:0 Comment(7)
This is actually a very nice and efficient approach. The only drawback is the fact that you traded space for speed, which seems like a fair deal in my opinion.Remonstrant
Thanks. I had a scenario where the main table I wanted a random row from had 5 million rows, and quite a lot of joins, and after trying most approaches in this question this was the kludge that I settled on. One extra column was a very worthwhile tradeoff, for me.Papeete
What if you want to get 10 rows with LIMIT 10? It seems the probabilities are not even.Liminal
As I said at the end of my answer @edwardaa, it only really works if you want a single row. Or if you don't mind the overhead of calling it multiple times.Papeete
It does not work if the generated random number is bigger than every random number in the database.Hyehyena
Yep, you'll need to keep track of min and max of that column and make sure you're generating a number in that range.Papeete
I think the problem with that approach is what happens when RAND() returns the same value twice. One of the rows will never get chosen, or you will have to manually make sure your values are not repeated. I would change RAND() to UUID_SHORT() and randomize between the lowest and highest values in the DB, caching the retrieved max and min values between requests.Boxboard
4

I know it is not what you want, but the answer I will give you is what I use in production in a small website.

Depending on how often you access the random value, it may not be worth using MySQL, just because you won't be able to cache the answer. We have a button there to access a random page, and a user could click it several times per minute if he wants. This will cause a massive amount of MySQL usage and, at least for me, MySQL is the biggest problem to optimize.

I would go with another approach, where you can cache the answer. Do one call to your MySQL:

SELECT min(id) as min, max(id) as max FROM your_table

With your min and max id, you can calculate a random number on your server. In Python:

random.randint(min, max)

Then, with your random number, you can get a random id from your table:

SELECT * 
FROM your_table 
WHERE id >= %s 
ORDER BY id ASC 
LIMIT 1

In this method you make two calls to your database, but you can cache them and avoid hitting the database for a long period of time, enhancing performance. Note that this is not random if you have holes in your table. Having more than 1 row is easy, since you can generate the ids in Python and do one request for each row, but since they are cached, it's OK.

If you have too many holes in your table, you can try the same approach, but now going for the total number of records:

SELECT COUNT(*) as total FROM your_table

Then in Python you go:

random.randint(0, total - 1)  # randint is inclusive at both ends; an offset of total would return no rows

And to fetch a random result you use LIMIT like below:

SELECT * 
FROM your_table 
ORDER BY id ASC 
LIMIT %s, 1

Notice it fetches 1 value after skipping X random rows. Even if you have holes in your table, it will be completely random, but it will cost your database more.

Landrum answered 8/7, 2021 at 18:30 Comment(0)
3

I used this http://jan.kneschke.de/projects/mysql/order-by-rand/ posted by Riedsio (I used the case of a stored procedure that returns one or more random values):

   DROP TEMPORARY TABLE IF EXISTS rands;
   CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        INSERT INTO rands
           SELECT r1.id
             FROM random AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT MAX(id)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.id >= r2.id
            ORDER BY r1.id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

In the article he solves the problem of gaps in the ids causing not-so-random results by maintaining a table (using triggers, etc. ... see the article); I solve the problem by adding another column, populated with contiguous numbers starting from 1 (edit: this column is added to the temporary table created by the subquery at runtime and doesn't affect your permanent table):

   DROP TEMPORARY TABLE IF EXISTS rands;
   CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        SET @no_gaps_id := 0;

        INSERT INTO rands
           SELECT r1.id
             FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT COUNT(*)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.no_gaps_id >= r2.id
            ORDER BY r1.no_gaps_id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

In the article I can see he went to great lengths to optimize the code; I have no idea if or how much my changes impact the performance, but it works very well for me.
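
For completeness, a hedged sketch of the whole procedure around the first fragment (the procedure name, parameter, and final SELECT are illustrative):

DELIMITER $$
CREATE PROCEDURE get_rands(IN cnt INT)
BEGIN
    DROP TEMPORARY TABLE IF EXISTS rands;
    CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        INSERT INTO rands
           SELECT r1.id
             FROM random AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT MAX(id)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.id >= r2.id
            ORDER BY r1.id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
    END LOOP loop_me;

    SELECT rand_id FROM rands; -- return the collected random ids
END $$
DELIMITER ;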

Checani answered 28/8, 2012 at 17:13 Comment(1)
"i have no ideea if/how much my changes impact the performance" - quite a lot. For the @no_gaps_id no index can be used, so if you look at EXPLAIN for your query, you have Using filesort and Using where (without index) for the subqueries, in contrast to the original query.Comedo
3

I needed a query to return a large number of random rows from a rather large table. This is what I came up with. First get the maximum record id:

SELECT MAX(id) FROM table_name;

Then substitute that value into:

SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;

Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record ids, although I doubt it would affect the result if there were (haven't tried it though). I also created this stored procedure to be more generic; pass in the table name and the number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (Times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the 2nd select statement, depending on your preference.)

DELIMITER $$

USE [schema name] $$

DROP PROCEDURE IF EXISTS `random_rows` $$

CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN

SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

SET @t = CONCAT(
    'SELECT * FROM ',
    tab_name,
    ' WHERE id>FLOOR(RAND()*@max) LIMIT ',
    num_rows);

PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$

then

CALL [schema name].random_rows([table name], n);
Drab answered 24/9, 2014 at 13:47 Comment(0)
3

Here is a game changer that may be helpful for many:

I have a table with 200k rows with sequential ids. I needed to pick N random rows, so I opted to generate random values based on the biggest id in the table. I created this script to find out the fastest way to get that maximum:

logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();

The results are:

  • Count: 36.8418693542479 ms
  • Max: 0.241041183472 ms
  • Order: 0.216960906982 ms

Based on these results, ORDER BY id DESC is the fastest way to get the max id.
Here is my answer to the question:

SELECT GROUP_CONCAT(n SEPARATOR ',') g FROM (
    SELECT FLOOR(RAND() * (
        SELECT id FROM tbl ORDER BY id DESC LIMIT 1
    )) n FROM tbl LIMIT 10) a

...
SELECT * FROM tbl WHERE id IN ($result);

FYI: To get 10 random rows from a 200k table, it took me 1.78 ms (including all the operations on the PHP side).

Impenitent answered 15/5, 2015 at 11:5 Comment(1)
Suggest you increase the LIMIT slightly -- you can get duplicates.Steiger
3

You can easily use a random offset with a limit:

PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

You can also apply a WHERE clause, like so:

PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

Tested on a 600,000-row (700MB) table; query execution took ~0.016 sec on an HDD.

EDIT: The offset might take a value close to the end of the table, which will result in the SELECT statement returning fewer rows (or maybe only 1 row). To avoid this, we can check the offset again after declaring it, like so:

SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;
Eastbourne answered 30/1, 2020 at 8:30 Comment(0)
2

If you want one random record (no matter if there are gaps between ids):

PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
        FLOOR(RAND() * COUNT(*))
    FROM `table_name`);

EXECUTE stmt USING @count;

Source: https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/#comment-1266

Bigwig answered 12/7, 2017 at 23:37 Comment(1)
for mysql 8: [HY000][1210] Incorrect arguments to EXECUTEDesiree
2

This is super fast and is 100% random even if you have gaps.

  1. Count the number x of rows that you have available SELECT COUNT(*) as rows FROM TABLE
  2. Pick 10 distinct random numbers a_1,a_2,...,a_10 between 0 and x
  3. Query your rows like this: SELECT * FROM TABLE LIMIT 1 offset a_i for i=1,...,10

I found this hack in the book SQL Antipatterns by Bill Karwin.
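
A sketch of the steps in SQL (the literal offset 123456 stands in for one of the a_i values drawn in application code):

SELECT COUNT(*) AS x FROM TABLE;           -- step 1: the count x
SELECT * FROM TABLE LIMIT 1 OFFSET 123456; -- step 3, run once per a_i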

Peres answered 21/2, 2019 at 14:55 Comment(5)
I was thinking about the same solution; please tell me, is it faster than the other methods?Cohn
@G.Adnane it's not quicker or slower than the accepted answer, but the accepted answer assumes an equal distribution of ids, and I can't imagine any scenario where that can be guaranteed. This solution is O(1), whereas the solution SELECT column FROM table ORDER BY RAND() LIMIT 10 is O(n log(n)). So yes, this is the fastest solution, and it works for any distribution of ids.Peres
No, because the link posted in the accepted solution lists other methods; I want to know if this solution is faster than those. Otherwise, we can try to find another one; that's why I am asking. Anyway, +1 for your answer. I was using the same thing.Cohn
There is a case where you want to get x rows but the offset lands near the end of the table, which will return <x rows (or only 1 row). I didn't see your answer before I posted mine, but I made it clearer here: https://mcmap.net/q/65469/-mysql-select-10-random-rows-from-600k-rows-fastEastbourne
@ZOLDIK it seems that you pick the first 10 rows after offset x. I would argue that this is not a random generation of 10 rows. In my answer, you have to execute the query in step three 10 times, i.e. you get only one row per execution and don't have to worry about the offset being at the end of the table.Peres
2

The following should be fast, unbiased and independent of the id column. However, it does not guarantee that the number of rows returned will match the number of rows requested.

SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)

Explanation: assuming you want 10 rows out of 100, each row has a 1/10 probability of getting SELECTed, which is achieved by WHERE RAND() < 0.1. This approach does not guarantee exactly 10 rows, but if the query is run enough times the average number of rows per execution will be around 10, and each row in the table is selected with equal probability.
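
If you need exactly 10 rows, a hedged tweak is to oversample slightly and trim; the 1.5 safety factor is arbitrary, and a shortfall is still possible, just less likely:

SELECT *
FROM t
WHERE RAND() < (SELECT 10 * 1.5 / COUNT(*) FROM t)
ORDER BY RAND()
LIMIT 10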

Crawly answered 30/4, 2019 at 8:38 Comment(0)
1

If you have just one read request

Combine the answer of @Riedsio with a temp table (600K rows is not that much):

DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;

And then take a version of @Riedsio's answer:

SELECT dt.*
FROM
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM tmp_randorder)) AS id)
        AS rnd
 INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
 INNER JOIN datatable AS dt on dt.id = rndo.data_id
 ORDER BY abs(rndo.id - rnd.id)
 LIMIT 1;

If the table is big, you can sieve on the first part:

INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;

If you have many read requests

  1. Version: You could keep the table tmp_randorder persistent - call it datatable_idlist. Recreate that table at certain intervals (daily, hourly), since it will also get holes. If your table gets really big, you could also refill the holes:

    select l.data_id as hole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null;

  2. Version: Give your dataset a random_sortorder column, either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a random value in your application (I'll call it $rand):

    select l.*
    from datatable l 
    order by abs(random_sortorder - $rand) asc
    limit 1;
    

This solution discriminates the 'edge rows' with the highest and the lowest random_sortorder, so reassign their values at intervals (once a day).

Cease answered 7/5, 2014 at 6:2 Comment(0)
1

Another simple solution is ranking the rows and fetching one of them randomly; with this solution you won't need any 'id'-based column in the table.

SELECT d.* FROM (
SELECT  t.*,  @rownum := @rownum + 1 AS rank
FROM mytable AS t,
    (SELECT @rownum := 0) AS r,
    (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE rank >= @cnt LIMIT 10;

You can change the LIMIT value as per your need to access as many rows as you want, but they would mostly be consecutive values.

However, if you don't want consecutive random values, you can fetch a bigger sample and select randomly from it, something like this:

SELECT * FROM (
SELECT d.* FROM (
    SELECT  c.*,  @rownum := @rownum + 1 AS rank
    FROM buildbrain.`commits` AS c,
        (SELECT @rownum := 0) AS r,
        (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM buildbrain.`commits`))) AS rnd
) d 
WHERE rank >= @cnt LIMIT 10000 
) t ORDER BY RAND() LIMIT 10;
Cowans answered 9/11, 2015 at 13:37 Comment(0)
1

One way that I find pretty good, if there's an autogenerated id, is to use the modulo operator '%'. For example, if you need 10,000 random records out of 70,000, you could simplify this by saying you need 1 out of every 7 rows. This can be expressed in this query:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0;

If the result of dividing the total available rows by the target count is not an integer, you will get some extra rows beyond what you asked for, so you should add a LIMIT clause to trim the result set, like this:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0
LIMIT 10000;

This does require a full scan, but it is faster than ORDER BY RAND, and in my opinion simpler to understand than other options mentioned in this thread. Also, if the system that writes to the DB creates sets of rows in batches, you might not get as random a result as you were expecting.

Dowery answered 22/6, 2016 at 13:22 Comment(1)
Now that I think about it, if you need different random rows every time you call it, this is useless. I was only thinking about the need to get random rows from a set to do some research. I still think modulo is a good thing for the other case; you could use modulo as a first-pass filter to lower the cost of an ORDER BY RAND operation.Dowery
0

Here is a simple and yet faster way. I tested it on a live server in comparison with a few of the answers above, and it was faster.

 SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30; 

//Took 0.0014secs against a table of 130 rows

SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30

//Took 0.0042secs against a table of 130 rows

 SELECT name
FROM random AS r1 JOIN
   (SELECT CEIL(RAND() *
                 (SELECT MAX(id)
                    FROM random)) AS id)
    AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30

//Took 0.0040secs against a table of 130 rows

Tenant answered 18/6, 2020 at 3:28 Comment(0)
0
SELECT
  * 
FROM
  table_with_600k_rows
WHERE
  RAND( ) 
ORDER BY
  id DESC 
LIMIT 30;

id is the primary key and we sort by id; running EXPLAIN on table_with_600k_rows shows that the query does not scan the entire table.

Sorrells answered 2/8, 2021 at 8:30 Comment(0)
0

What about retrieving the rows above and below a random id, joining them, and then ordering by RAND()?

SELECT x.id FROM table_x x
INNER JOIN (
  (SELECT id FROM table_x x WHERE x.id >= :id ORDER BY x.id ASC LIMIT :amount) UNION
  (SELECT id FROM table_x x WHERE x.id <= :id ORDER BY x.id DESC LIMIT :amount) 
) u ON u.id = x.id
ORDER BY RAND() LIMIT :amount

This way we eliminate issues with holes in the database, mitigate the problem of a too-high or too-low random value (the ID in this example) resulting in no rows found, minimize the problem of some rows having a bigger chance of being selected (I don't think this eliminates that problem, but it makes it a little fairer compared to some other answers), and eliminate the problem of multiple queries to retrieve more than one row.

The issue is the LIMIT approach, where we randomize within the selected cluster instead of the whole database if we want to retrieve more than one row at a time. We are still randomly selecting the cluster and randomizing its content, so for the same ID we will get a different result most of the time.

Another thing to consider is the cluster size. In this example, if you want one row, the script will select one or two rows, put them in random order, and select one from the top. But for your usage you could decide on a static cluster size (like 200) and add to it the number of rows to retrieve, to make it more random:

  (SELECT id FROM table_x x WHERE x.id >= :id ORDER BY x.id ASC LIMIT :amount + 200) UNION
  (SELECT id FROM table_x x WHERE x.id <= :id ORDER BY x.id DESC LIMIT :amount + 200) 

but now you are ordering 201-402+ rows by RAND() (depending on the amount and the selected ID), which is still good but requires more resources.

Note: the chosen ID must be between the minimal and maximal values in the DB, otherwise you might get zero results. Also, I think you should calculate min and max separately from this query and cache the results, then use them to generate the random ID. This way the query won't have to calculate min/max each time you make the call, saving time.

Boxboard answered 12/5, 2023 at 13:38 Comment(0)
-2

I use this query:

select floor(RAND() * (SELECT MAX(key) FROM table)) from table limit 10

Query time: 0.016s

Iolanthe answered 5/11, 2014 at 10:39 Comment(1)
With PKs like 1, 2, 9, 15, the above query will give you values like 4, 7, 14, 11, which don't correspond to existing rows!Bigwig
-3

This is how I do it:

select * 
from table_with_600k_rows
where rand() < 10/600000
limit 10

I like it because it does not require other tables, it is simple to write, and it is very fast to execute.

Chinaware answered 15/2, 2013 at 14:57 Comment(1)
That's a full table scan and it does not use any indexes. For large tables and a busy environment, that's a big no-no.Stepaniestepbrother
-4

Use the below simple query to get random data from a table.

SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails 
GROUP BY usr_fk_id 
ORDER BY cnt ASC  
LIMIT 10
Toponym answered 24/2, 2015 at 6:12 Comment(2)
You can also add join statements and WHERE filters if you want.Toponym
Which part of the query provides the randomness?Ryals
-8

I guess this is the best possible way:

SELECT id, id * RAND( ) AS random_no, first_name, last_name
FROM user
ORDER BY random_no
Pharmaceutical answered 4/4, 2013 at 16:15 Comment(2)
Hell no, that's one of the worst ways to get random rows from a table. That's a full table scan + filesort + tmp table = bad performance.Stepaniestepbrother
Besides performance, it's also far from perfectly random; you're ordering by the product of the id and a random number, rather than simply ordering by a random number, which means that rows with lower ids are going to be biased towards appearing earlier in your results set.Grumble
