How can i optimize MySQL's ORDER BY RAND() function?
Asked Answered
A

9

92

I'd like to optimize my queries so I look into mysql-slow.log.

Most of my slow queries contains ORDER BY RAND(). I cannot find a real solution to resolve this problem. Theres is a possible solution at MySQLPerformanceBlog but I don't think this is enough. On poorly optimized (or frequently updated, user managed) tables it doesn't work or I need to run two or more queries before I can select my PHP-generated random row.

Is there any solution for this issue?

A dummy example:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
ORDER BY
        RAND()
LIMIT 1
Agrestic answered 7/8, 2009 at 12:55 Comment(1)
Possible duplicate of MySQL select 10 random rows from 600K rows fastHypomania
S
67

Try this:

SELECT  *
FROM    (
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1
        FROM    t_random r
        WHERE   (@cnt := @cnt - 1)
                AND RAND(20090301) < @lim / @cnt
        ) i

This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().

The main idea here is that we don't sort, but instead keep two variables and calculate the running probability of a row to be selected on the current step.

See this article in my blog for more detail:

Update:

If you need to select but a single random record, try this:

SELECT  aco.*
FROM    (
        SELECT  minid + FLOOR((maxid - minid) * RAND()) AS randid
        FROM    (
                SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                FROM    accomodation
                ) q
        ) q2
JOIN    accomodation aco
ON      aco.ac_id =
        COALESCE
        (
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_id > randid
                AND ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        ),
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        )
        )

This assumes your ac_id's are distributed more or less evenly.

Shipman answered 7/8, 2009 at 13:1 Comment(12)
Hello, Quassnoi! First of all, thanks for your fast response! Maybe it's my fault but it's still unclear your solution. I'll update my original post with a concrete example and I'll be happy if you explain your solution on this example.Agrestic
there was a typo at "JOIN accomodation aco ON aco.id =" where aco.id really is aco.ac_id. on the other hand the corrected query didn't worked for me because it throws an error #1241 - Operand should contain 1 column(s) at the fifth SELECT (the fourth sub-select). I tried to find the problem with parenthesis (if i'm not wrong) but i cannot find the problem yet.Agrestic
@fabrik : try now. It would be really helpful if you posted the table scripts so that I could check them before posting.Shipman
Thanks, it works! :) Can you edit the JOIN ... ON aco.id part to JOIN ... ON aco.ac_id so i can accept your solution. Thanks again! A question: i wonder if possible this is a worse random like ORDER BY RAND()? Just because this query repeating some result(s) a lot of times.Agrestic
@fabrik: done. As for the randomness: yes, it's less random than ORDER BY RAND(). In fact, it selects random id between MIN and MAX , then selects the first id greater than that random, with wraparound. If you have a large gap, like 1, 2, 3, 10, 10 will be selected in 70% of cases. However, this is fastest way possible. If you want truly random solution, use the first query (just replace t_random with your query expressed as an inline view)Shipman
The linked blog article keeps referring to RAND(20090301) rather than RAND(); given that 2009/03/01 was the date the article was written, I'm assuming that this is a bug in the Wordpress setup causing an accidental substitution.Hyaloplasm
@Adam: no, that's intentional, so that you can reproduce the results.Shipman
@Quassnoi: Ah! It's a seed value. I see - thanks. Ideally the article would have made this clear though.Hyaloplasm
In the first example, it seems like rows are returned in order of primary key? Or perhaps in the order they were added to the table? Any way around this without killing the performance?Eckstein
To shuffle, simple wrap with an extra ORDER BY RAND(). The cost will be roughly proportional to the number of rows.Rhinology
The first query in this "Answer" still requires a table scan.Rhinology
@RickJames: it does indeed. May I ask why did you quote the word "answer" in your comment? Thanks!Shipman
N
12

It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it's still pretty random.

However, you should be able to do it in one query using this (for selecting a single value):

SELECT [fields] FROM [table] WHERE id >= FLOOR(RAND()*MAX(id)) LIMIT 1

Other solutions:

  • Add a permanent float field called random to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random"
  • Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
  • Cache the results of the query as HTML and keep it for a few hours.
Nietzsche answered 8/8, 2009 at 23:31 Comment(3)
Is it just me or this query doesn't work? I tried it with several variations and they all throw "Invalid use of group function"..Zealotry
You can do it with a subquery SELECT [fields] FROM [table] WHERE id >= FLOOR(RAND()*(SELECT MAX(id) FROM [table])) LIMIT 1 but this doesn't seem to work properly since it never returns the last recordPhthisis
SELECT [fields] FROM [table] WHERE id >= FLOOR(1 + RAND()*(SELECT MAX(id) FROM [table])) LIMIT 1 Seems to be doing the trick for mePhthisis
E
1

Here's how I'd do it:

SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*)
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != 'draft'
        AND c.acat_slug != 'vendeglatohely'
        AND a.ac_images != 'b:0;';

SET @sql := CONCAT('
  SELECT  a.ac_id,
        a.ac_status,
        a.ac_name,
        a.ac_status,
        a.ac_images
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != ''draft''
        AND c.acat_slug != ''vendeglatohely''
        AND a.ac_images != ''b:0;''
  LIMIT ', @r, ', 1');

PREPARE stmt1 FROM @sql;

EXECUTE stmt1;
Ebullition answered 7/8, 2009 at 21:0 Comment(5)
See also #211829Ebullition
my table isn't continuous because it's often edited. for example currently the first id is 121.Agrestic
The technique above does not rely on the id values being continuous. It chooses a random number between 1 and COUNT(*), not 1 and MAX(id) like some other solutions.Ebullition
Using OFFSET (which is what @r is for) does not avoid a scan -- up to a full table scan.Rhinology
@RickJames, that's right. If I were to answer this question today, I'd do the query by primary key. Using an offset with LIMIT does scan a lot of rows. Querying by primary key, although much faster, doesn't guarantee an even chance of choosing each row -- it favors rows that follow gaps.Ebullition
R
1

(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)

Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column

Those 5 cases can be made very efficient for large tables. See my blog for the details.

Rhinology answered 1/2, 2016 at 1:31 Comment(0)
R
0

This will give you single sub query that will use the index to get a random id then the other query will fire getting your joined table.

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
AND accomodation.ac_id IS IN (
        SELECT accomodation.ac_id FROM accomodation ORDER BY RAND() LIMIT 1
)
Ream answered 12/10, 2011 at 0:31 Comment(0)
L
0

The solution for your dummy-example would be:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation,
        JOIN 
            accomodation_category 
            ON accomodation.ac_category = accomodation_category.acat_id
        JOIN 
            ( 
               SELECT CEIL(RAND()*(SELECT MAX(ac_id) FROM accomodation)) AS ac_id
            ) AS Choices 
            USING (ac_id)
WHERE   accomodation.ac_id >= Choices.ac_id 
        AND accomodation.ac_status != 'draft'
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
LIMIT 1

To read more about alternatives to ORDER BY RAND(), you should read this article.

Lattimer answered 20/3, 2012 at 18:23 Comment(0)
D
0

I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate the said solution in all queries, especially for complicated queries involving many subqueries on multiple large tables.

So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size

$size * $factor / [accomodation_table_row_count] works out the probability of picking a random row. The rand() will generate a random number. The row will be selected if rand() is smaller or equals to the probability. This effectively performs a random selection to limit the table size. Since there is a chance it will return less than the defined limit count, we need to increase probability to ensure we are selecting enough rows. Hence we multiply $size by a $factor (I usually set $factor = 2, works in most cases). Finally we do the limit $size

The problem now is working out the accomodation_table_row_count. If we know the table size, we COULD hard code the table size. This would run the fastest, but obviously this is not ideal. If you are using Myisam, getting table count is very efficient. Since I am using innodb, I am just doing a simple count+selection. In your case, it would look like this:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size

The tricky part is working out the right probability. As you can see the following code actually only calculates the rough temp table size (In fact, too rough!): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category)) But you can refine this logic to give a closer table size approximation. Note that it is better to OVER-select than to under-select rows. i.e. if the probability is set too low, you risk not selecting enough rows.

This solution runs slower than Quassnoi's solution since we need to recalculate the table size. However, I find this coding a lot more manageable. This is a trade off between accuracy + performance vs coding complexity. Having said that, on large tables this is still by far faster than Order by Rand().

Note: If the query logic permits, perform the random selection as early as possible before any join operations.

Disgust answered 17/9, 2014 at 5:24 Comment(0)
W
0

My recommendation is to add a column with a UUID (version 4) or other random value, with a unique index (or just the primary key).

Then you can simply generate a random value at query time and select rows greater than the generated value, ordering by the random column.

Make sure if you receive less than the expected number of rows, you repeat the query without the greater than clause (to select rows at the "beginning" of the result set).

uuid = generateUUIDV4()

select * from foo
where uuid > :uuid
order by uuid
limit 42

if count(results) < 42 {
  select * from foo
  order by uuid
  limit :remainingResultsRequired
}
Whacky answered 15/8, 2021 at 23:9 Comment(0)
H
-1
function getRandomRow(){
    $id = rand(0,NUM_OF_ROWS_OR_CLOSE_TO_IT);
    $res = getRowById($id);
    if(!empty($res))
    return $res;
    return getRandomRow();
}

//rowid is a key on table
function getRowById($rowid=false){

   return db select from table where rowid = $rowid; 
}
Heterosporous answered 6/11, 2017 at 1:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.