Quickly select random ID from mysql table with millions of non-sequential records
Asked Answered
I

3

13

I've looked around and there doesnt seem to be any easy way to do this. It almost looks like it's easier just to grab a subset of records and do all the randomizing in code (perl). The methods I've seen online seem like theyre geared more to at most hundreds of thousands, but certainly not millions.

The table I'm working with has 6 million records (and growing), the IDs are auto incremented, but not always stored in the table (non-gapless).

I've tried to do the LIMIT 1 query that's been recommended, but the query takes forever to run -- is there a quick way to do this, given that there are gaps in the record? I can't just take the max and randomize over the range.

Update:

One idea I had maybe was to grab the max, randomize a limit based on the max, and then grab a range of 10 records from random_limit_1 to random_limit_2 and then taking the first record found in that range.

Or if I know the max, is there a way i can just pick say the 5th record of the table, without having to know which ID it is. Then just grabbing the id of that record.

Update:

This query is somewhat faster-ish. Still not fast enough =/

SELECT t.id FROM table t JOIN (SELECT(FLOOR(max(id) * rand())) as maxid FROM table) as tt on t.id >= tt.maxid LIMIT 1
Iredale answered 9/12, 2011 at 17:53 Comment(3)
What do you mean by non-gapless? That there are gaps?Egis
pretty sure nodebunny means a typical autoincrement index that has had some rows deleted in the past...Malmsey
The Update query in your post runs in about 0.005 seconds total on my system, how fast were you hoping this would run?Malmsey
M
9

Yeah, idea seems good:

select min(ID), max(ID) from table into @min, @max;
set @range = @max - @min;
set @mr = @min + ((@range / 1000) * (rand() * 1000));
select ID from table
  where ID >= @mr and ID <= @mr + 1000
  order by rand()
  limit 1
--   into @result
;

May change 1000 to 10000 or whatever as needed to scale...

EDIT: you could also try this:

select ID from table
  where (ID % 1000) = floor(rand() * 1000)
  order by rand()
  limit 1
;

Splits it along different lines...

EDIT 2:

See: What is the best way to pick a random row from a table in MySQL?

This is probably the fastest way:

select @row := floor(count(*) * rand()) from some_tbl;
select some_ID from some_tbl limit @row, 1;

unfortunately, variables can't be used in limit clause so you'd have to use a dynamic query, either writing the query string in code, or using PREPARE and EXECUTE. Also, limit n, 1 still requires scanning n items into the table, so it's only about twice as fast as the second method listed above on average. (Though it is probably more uniform and guarantees a matching row will always be found)

Malmsey answered 9/12, 2011 at 18:17 Comment(2)
I ran this query but it returned an empty set.Iredale
If there are gaps wider than 1000 records, that could happen... What's the distribution of your ID's like?Malmsey
K
14
SELECT * FROM TABLE ORDER BY RAND() LIMIT 1;

Ok, this is slow. If you'll search for ORDER BY RAND() MYSQL, you will find alot of results saying that this is very slow and this is the case. I did a little research and I found this alternative MySQL rand() is slow on large datasets I hope this is better

Kinakinabalu answered 9/12, 2011 at 17:55 Comment(3)
Yeah I tried that but it takes FOREVERS for the query to run even on Limit 1.Iredale
the query on that link SELECT t.id FROM table t JOIN (SELECT(FLOOR(max(id) * rand())) as maxid FROM table) as tt on t.id >= tt.maxid LIMIT 1 is slightly faster -- still too slow =/Iredale
I don't think that the linked join query would give very uniform random results unless an order by ID were added?Malmsey
M
9

Yeah, idea seems good:

select min(ID), max(ID) from table into @min, @max;
set @range = @max - @min;
set @mr = @min + ((@range / 1000) * (rand() * 1000));
select ID from table
  where ID >= @mr and ID <= @mr + 1000
  order by rand()
  limit 1
--   into @result
;

May change 1000 to 10000 or whatever as needed to scale...

EDIT: you could also try this:

select ID from table
  where (ID % 1000) = floor(rand() * 1000)
  order by rand()
  limit 1
;

Splits it along different lines...

EDIT 2:

See: What is the best way to pick a random row from a table in MySQL?

This is probably the fastest way:

select @row := floor(count(*) * rand()) from some_tbl;
select some_ID from some_tbl limit @row, 1;

unfortunately, variables can't be used in limit clause so you'd have to use a dynamic query, either writing the query string in code, or using PREPARE and EXECUTE. Also, limit n, 1 still requires scanning n items into the table, so it's only about twice as fast as the second method listed above on average. (Though it is probably more uniform and guarantees a matching row will always be found)

Malmsey answered 9/12, 2011 at 18:17 Comment(2)
I ran this query but it returned an empty set.Iredale
If there are gaps wider than 1000 records, that could happen... What's the distribution of your ID's like?Malmsey
S
1
SELECT ID
    FROM YourTable
    ORDER BY RAND() LIMIT 1;
Serotonin answered 9/12, 2011 at 17:56 Comment(1)
try running this query on 6 million records. it takes forever to run.Iredale

© 2022 - 2024 — McMap. All rights reserved.