MySQL: Alternatives to ORDER BY RAND()
Asked Answered
T

9

63

I've read about a few alternatives to MySQL's ORDER BY RAND() function, but most of the alternatives apply only to where on a single random result is needed.

Does anyone have any idea how to optimize a query that returns multiple random results, such as this:

   SELECT u.id, 
          p.photo 
     FROM users u, profiles p 
    WHERE p.memberid = u.id 
      AND p.photo != '' 
      AND (u.ownership=1 OR u.stamp=1) 
 ORDER BY RAND() 
    LIMIT 18 
Thais answered 1/12, 2009 at 0:27 Comment(8)
I don't understand what you're looking for. Why isn't ORDER BY RAND() suitable? Are you mainly concerned with efficiency?Amphichroic
Yes that's right. I haven't reached even close to the scale presented in your graph and I was already taking a hit.Thais
@outis: Because it doesn't scale - see: dasprids.de/blog/2008/06/07/…Ricebird
I wrote an article about a solution about a year go: devzone.zend.com/article/…Cesya
possible duplicate of What is the best way to pick a random row from a table in MySQL?Amphichroic
Same as: #211829Duppy
blog.statvoo.com/post/113967722206/…Hesitation
Possible duplicate of How can i optimize MySQL's ORDER BY RAND() function?Handknit
R
29

UPDATE 2016

This solution works best using an indexed column.

Here is a simple example of and optimized query bench marked with 100,000 rows.

OPTIMIZED: 300ms

SELECT 
    g.*
FROM
    table g
        JOIN
    (SELECT 
        id
    FROM
        table
    WHERE
        RAND() < (SELECT 
                ((4 / COUNT(*)) * 10)
            FROM
                table)
    ORDER BY RAND()
    LIMIT 4) AS z ON z.id= g.id

note about limit ammount: limit 4 and 4/count(*). The 4s need to be the same number. Changing how many you return doesn't effect the speed that much. Benchmark at limit 4 and limit 1000 are the same. Limit 10,000 took it up to 600ms

note about join: Randomizing just the id is faster than randomizing a whole row. Since it has to copy the entire row into memory then randomize it. The join can be any table that is linked to the subquery Its to prevent tablescans.

note where clause: The where count limits down the ammount of results that are being randomized. It takes a percentage of the results and sorts them rather than the whole table.

note sub query: The if doing joins and extra where clause conditions you need to put them both in the subquery and the subsubquery. To have an accurate count and pull back correct data.

UNOPTIMIZED: 1200ms

SELECT 
    g.*
FROM
    table g
ORDER BY RAND()
LIMIT 4

PROS

4x faster than order by rand(). This solution can work with any table with a indexed column.

CONS

It is a bit complex with complex queries. Need to maintain 2 code bases in the subqueries

Renarenado answered 15/3, 2016 at 14:20 Comment(2)
Very nice. I am going to be sure to use this.Sateia
Pulling a range of random ids could be even more useful if you take those ids and throw them into a caching layer for 10 seconds, then let the app select randomly from the ids in the caching layer.Donatello
R
21

Here's an alternative, but it is still based on using RAND():

  SELECT u.id, 
         p.photo,
         ROUND(RAND() * x.m_id) 'rand_ind'
    FROM users u, 
         profiles p,
         (SELECT MAX(t.id) 'm_id'
            FROM USERS t) x
   WHERE p.memberid = u.id 
     AND p.photo != '' 
     AND (u.ownership=1 OR u.stamp=1) 
ORDER BY rand_ind
   LIMIT 18

This is slightly more complex, but gave a better distribution of random_ind values:

  SELECT u.id, 
         p.photo,
         FLOOR(1 + RAND() * x.m_id) 'rand_ind'
    FROM users u, 
         profiles p,
         (SELECT MAX(t.id) - 1 'm_id'
            FROM USERS t) x
   WHERE p.memberid = u.id 
     AND p.photo != '' 
     AND (u.ownership=1 OR u.stamp=1) 
ORDER BY rand_ind
   LIMIT 18
Ricebird answered 1/12, 2009 at 2:51 Comment(9)
Outstanding OMG. Thank you so much. While my db is not huge, I already noticed a slight performance increase, and feel much more comfortable moving forward.Thais
This answer helped me a lot, as most other "solutions" I've found rules out complex WHERE clauses. Thanks!Redevelop
How multiplying RAND() to a constant value can give better distribution?Terylene
@zerkms: I can't speak for MySQL's optimizerRicebird
@OMG Ponies: Yep, but you advised that :-) So my question is: why ORDER BY RAND() is worse than ORDER BY RAND() * const?Terylene
I just tried selecting 10 random records on an InnoDB table of slightly over half a million records, and I didn't see any significant performance gains over just using order by rand().Florafloral
Still needs to create a RAND() value for each row, copy the whole data to a temp table and sort that.Kenwrick
These forms do not provide any optimization over ORDER BY RAND(). I just ran tests on a one million row table, to compare performance. Averaging the results of 5 runs (discarding the first run), a straight ORDER BY RAND() was actually 11.0% faster. (avg. 2.70 sec vs. 3.04 sec.).Sweepstake
This doesn't speed up the selection, it makes it slower. Also, the fetch becomes slower because you have to fetch the random number along with the data you wanted. It takes up more RAM because it keeps the randomized numbers after the order byRenarenado
N
9

It is not the fastest, but faster then common ORDER BY RAND() way:

ORDER BY RAND() is not so slow, when you use it to find only indexed column. You can take all your ids in one query like this:

SELECT id
FROM testTable
ORDER BY RAND();

to get a sequence of random ids, and JOIN the result to another query with other SELECT or WHERE parameters:

SELECT t.*
FROM testTable t
JOIN
    (SELECT id
    FROM `testTable`
    ORDER BY RAND()) AS z ON z.id= t.id   
WHERE t.isVisible = 1
LIMIT 100; 

in your case it would be:

SELECT u.id, p.photo 
FROM users u, profiles p 
JOIN
    (SELECT id
    FROM users
    ORDER BY RAND()) AS z ON z.id = u.id   
WHERE p.memberid = u.id 
  AND p.photo != '' 
  AND (u.ownership=1 OR u.stamp=1) 
LIMIT 18 

It's very blunt method and it can be not proper with very big tables, but still it's faster than common RAND(). I got 20 times faster execution time searching 3000 random rows in almost 400000.

Nape answered 24/11, 2016 at 7:18 Comment(0)
T
3

Order by rand() is very slow on large tables,

I found the following workaround in a php script:

Select min(id) as min, max(id) as max from table;

Then do random in php

$rand = rand($min, $max);

Then

'Select * from table where id>'.$rand.' limit 1';

Seems to be quite fast....

Tratner answered 3/11, 2016 at 9:47 Comment(2)
Smart solution for large tables. However, WHERE id > '.$rand.' might return nothing if $rand happens to be max(id) so WHERE id >= '.$rand.' would be betterDaddy
Gaps in the indexes can lead to biased results. If there are 6 records with ids 1,2,3,10,11,12, then the record with id 10 is way more likely to be picked.Goethe
H
1

I ran into this today and was trying to use 'DISTINCT' along with JOINs, but was getting duplicates I assume because the RAND was making each JOINed row distinct. I muddled around a bit and found a solution that works, like this:

SELECT DISTINCT t.id, 
                t.photo 
       FROM (SELECT  u.id, 
                     p.photo,
                     RAND() as rand
                FROM users u, profiles p 
                 WHERE p.memberid = u.id 
                  AND p.photo != '' 
                  AND (u.ownership=1 OR u.stamp=1)
                ORDER BY rand) t
       LIMIT 18
Herminiahermione answered 16/1, 2013 at 0:32 Comment(2)
This seems the exact same thing MySql does when you use ORDER BY RAND().Morphogenesis
i tested it and if you have a rand value in your result set (as is done in OMG Ponies' solutions), DISTINCT becomes negated. So this was how I got around that.Herminiahermione
E
1

Create a column or join to a select with random numbers (generated in for example php) and order by this column.

Exotoxin answered 15/2, 2015 at 18:35 Comment(1)
This is similar to XKCD's getRandomNumber. This will yield the same "random" results over and over, which is usually not what they're looking for.Cavitation
P
0

The solution I am using is also posted in the link below: How can i optimize MySQL's ORDER BY RAND() function?

I am assuming your users table is going to be larger than your profiles table, if not then it's 1 to 1 cardinality.

If so, I would first do a random selection on user table before joining with profile table.

First do selection:

SELECT *
FROM users
WHERE users.ownership = 1 OR users.stamp = 1

Then from this pool, pick out random rows through calculated probability. If your table has M rows and you want to pick out N random rows, the probability of random selection should be N/M. Hence:

SELECT *
FROM
(
    SELECT *
    FROM users
    WHERE users.ownership = 1 OR users.stamp = 1
) as U
WHERE 
    rand() <= $limitCount / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)

Where N is $limitCount and M is the subquery that calculates the table row count. However, since we are working on probability, it is possible to have LESS than $limitCount of rows returned. Therefore we should multiply N by a factor to increase the random pool size.

i.e:

SELECT*
FROM
(
    SELECT *
    FROM users
    WHERE users.ownership = 1 OR users.stamp = 1
) as U
WHERE 
    rand() <= $limitCount * $factor / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)

I usually set $factor = 2. You can set the factor to a lower value to further reduce the random pool size (e.g. 1.5).

At this point, we would have already limited a M size table down to roughly 2N size. From here we can do a JOIN then LIMIT.

SELECT * 
FROM
(
       SELECT *
        FROM
        (
            SELECT *
            FROM users
            WHERE users.ownership = 1 OR users.stamp = 1
        ) as U
        WHERE 
            rand() <= $limitCount * $factor / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)
) as randUser
JOIN profiles
ON randUser.id = profiles.memberid AND profiles.photo != ''
LIMIT $limitCount

On a large table, this query will outperform a normal ORDER by RAND() query.

Hope this helps!

Pulchritude answered 17/9, 2014 at 10:4 Comment(0)
S
0
SELECT
    a.id,
    mod_question AS modQuestion,
    mod_answers AS modAnswers 
FROM
    b_ask_material AS a
    INNER JOIN ( SELECT id FROM b_ask_material WHERE industry = 2 ORDER BY RAND( ) LIMIT 100 ) AS b ON a.id = b.id
Stearn answered 1/12, 2021 at 6:43 Comment(1)
Please add some explanation to your answer such that others can learn from itLethe
H
0

I had the same issue today, I fixed it by using limit and offset You can do by iterating 18 times over a random set of offsets

  • to avoid duplicates, you can create your random set of offset like this in python sample(range(1, rows_count), random_rows_count)
  • then for each offset get the corresponding row using OFFSET and LIMIT 1 and add it to a list
  • rows_count can be cached to avoid performance issue to count total number of rows at each request

EDIT it's actually what this post https://mcmap.net/q/319615/-mysql-alternatives-to-order-by-rand proposes

Homopterous answered 29/6, 2022 at 18:59 Comment(2)
This does not provide a new answer to the question. This should have been at most a comment on this answer. Once you have sufficient reputation you will be able to comment on any post. - From ReviewGastropod
@Gastropod as you pointed I don't have enough reputation to comment but actually it provides additional information as no one else proposed to use offsets randomly picked.Homopterous

© 2022 - 2024 — McMap. All rights reserved.