How to get records randomly from the Oracle database?
Asked Answered
A

10

104

I need to select rows randomly from an Oracle DB.

Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.

Aleta answered 26/3, 2012 at 7:50 Comment(0)
O
146
SELECT *
FROM   (
    SELECT *
    FROM   table
    ORDER BY DBMS_RANDOM.RANDOM)
WHERE  rownum < 21;
Ovular answered 26/3, 2012 at 7:57 Comment(7)
Beat me to it. This will however only select the first 20 rows from the table and order them randomly.Observance
Ah - I thought you'd have to INITIALIZE() in that case.Bun
You have to be aware that this is a very heavy operation on large tables, because it will first assign a random number to EACH row, then sort on this value and then take some records from it.Dismiss
@NishantSharma, the rows are randomised, then limited - your comment isn't correct.Medicable
This approach is VERY slowUplift
@EvanKroske what alternative approach would you suggest?Polymeric
@JonBetts, I think that sample is much faster and more resource-efficient: https://mcmap.net/q/204621/-how-to-get-records-randomly-from-the-oracle-databaseUplift
T
68

SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):

SELECT *
FROM   table SAMPLE(20);

Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.

Tynishatynwald answered 29/3, 2012 at 6:52 Comment(3)
sample is fast but doesn't appear to be very random. records towards the top/beginning of the table tend to be favored.Mizzenmast
that'll happen if you stop the query before it gets through the whole table.Tynishatynwald
Sorry i made a mistake, your post is fine and the results are equally distributed. It's when you add "where rownum <= 20" in combination with sample(20) that the data starts to become less random.Mizzenmast
N
14
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;

This is more efficient as it doesn't need to sort the Table.

Nguyetni answered 2/4, 2014 at 20:35 Comment(1)
Stopping the sample after 20 rows will result in non-random results (rows found earlier in the table will be returned far more often than later ones). Also, this is not guaranteed to return 20 rows.Tynishatynwald
B
10
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
Beef answered 26/3, 2012 at 8:1 Comment(0)
L
8

In case of huge tables standard way with sorting by dbms_random.value is not effective because you need to scan whole table and dbms_random.value is pretty slow function and requires context switches. For such cases, there are 3 additional methods:


1: Use sample clause:

for example:

select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only

ie get 1% of all blocks, then sort them randomly and return just 1 row.


2: if you have an index/primary key on the column with normal distribution, you can get min and max values, get random value in this range and get first row with a value greater or equal than that randomly generated value.

Example:

--big table with 1 mln rows with primary key on ID with normal distribution:
Create table s1(id primary key,padding) as 
   select level, rpad('x',100,'x')
   from dual 
   connect by level<=1e6;

select *
from s1 
where id>=(select 
              dbms_random.value(
                 (select min(id) from s1),
                 (select max(id) from s1) 
              )
           from dual)
order by id
fetch first 1 rows only;

3: get random table block, generate rowid and get row from the table by this rowid:

select * 
from s1
where rowid = (
   select
      DBMS_ROWID.ROWID_CREATE (
         1, 
         objd,
         file#,
         block#,
         1) 
   from    
      (
      select/*+ rule */ file#,block#,objd
      from v$bh b
      where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
      order by dbms_random.value
      fetch first 1 rows only
      )
);
Lucey answered 14/7, 2020 at 9:48 Comment(1)
the "sample" example is excellent!Blenny
M
6

In summary, two ways were introduced

1) using order by DBMS_RANDOM.VALUE clause
2) using sample([%]) function

The first way has advantage in 'CORRECTNESS' which means you will never fail get result if it actually exists, while in the second way you may get no result even though it has cases satisfying the query condition since information is reduced during sampling.

The second way has advantage in 'EFFICIENT' which mean you will get result faster and give light load to your database. I was given an warning from DBA that my query using the first way gives loads to the database

You can choose one of two ways according to your interest!

Mudslinger answered 4/10, 2019 at 4:50 Comment(0)
O
5

To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.

Something like:

Select *
  from (select *
          from table
         order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
 where rownum < 21;

Best used for small tables to avoid selecting large chunks of data only to discard most of it.

Observance answered 26/3, 2012 at 8:7 Comment(0)
X
0

Here's how to pick a random sample out of each group:

SELECT GROUPING_COLUMN, 
       MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE) 
         AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;

I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.

Xylina answered 5/6, 2020 at 21:7 Comment(0)
C
0

-- Q. How to find Random 50% records from table ?

when we want percent wise randomly data

SELECT * FROM ( SELECT * FROM table_name ORDER BY DBMS_RANDOM.RANDOM) WHERE rownum <= (select count(*) from table_name) * 50/100;

Circumference answered 6/1, 2023 at 3:27 Comment(1)
if you want only 20 record randomly then use below query : SELECT * FROM ( SELECT * FROM table_name ORDER BY DBMS_RANDOM.RANDOM) WHERE rownum <=20;Circumference
S
-1

select * from (SELECT * FROM emp ORDER BY DBMS_RANDOM.RANDOM )where
rownum <= 1 ;

Slating answered 22/11, 2023 at 8:29 Comment(1)
This is just an unformatted copy of the accepted answer with a different table name and a different row number. Please don't repeat answers.Retiform

© 2022 - 2024 — McMap. All rights reserved.