How do I select a fixed number of rows for each group?
Asked Answered
V

3

8

Here is some example data in a mysql table

a   b   distance
15  44  250
94  31  250
30  41  250
6   1   250
95  18  250
72  84  500
14  23  500
55  24  500
95  8   500
59  25  500
40  73  500
65  85  500
32  50  500
31  39  500
22  25  500
37  11  750
98  39  750
15  57  750
9   22  750
14  44  750
69  22  750
62  50  750
89  35  750
67  65  750
74  37  750
52  36  750
66  53  750
82  74  1000
79  22  1000
98  41  1000

How do I query this table such that I get 2 rows per distance selected at random?

A successful query will produce something like

   a    b   distance
    30  41  250
    95  18  250
    59  25  500
    65  85  500
    15  57  750
    89  35  750
    79  22  1000
    98  41  1000
Verenaverene answered 21/5, 2010 at 22:40 Comment(0)
V
7

Use:

SELECT x.a,
       x.b,
       x.distance
  FROM (SELECT t.a,
               t.b,
               t.distance
               CASE 
                 WHEN @distance != t.distance THEN @rownum := 1 
                 ELSE @rownum := @rownum + 1 
               END AS rank,
               @distance := t.distance
          FROM TABLE t
          JOIN (SELECT @rownum := 0, @distance := '') r
      ORDER BY t.distance --important for resetting the rownum variable) x
 WHERE x.rank <= 2
ORDER BY x.distance, x.a
Venule answered 21/5, 2010 at 22:51 Comment(5)
It's often an education reading your answers. How are the rows selected random?Dynamiter
@mdma: The rows are randomly ranked because they're only ordered by the distance value - any a & b value pair can be ranked as 1, etc. The ORDER BY t.distance is just to keep the distance values grouped, so the CASE statement correctly resets the rownum value back to start at 1 when a new distance value is encountered.Venule
Thanks for clarification. Doesn't that mean that the rows selected are arbitrary, rather than random. (I.e. they could be the same each time the query is run?)Dynamiter
@mdma: Ordering by the distance column only does not ensure that the same a & b values will always be returned at the same position. And being random doesn't ensure that different values will always be returned.Venule
@OMGPonies: shouldn't the @distance = t.distance be a @distance := t.distance instead?Dereism
K
0

One way would be to use union. Like so:

(SELECT a, b, distance FROM table WHERE distance = 250 LIMIT 2 ORDER BY RAND())
UNION
(SELECT a, b, distance FROM table WHERE distance = 500 LIMIT 2 ORDER BY RAND())
...
ORDER BY distance

I can think of a way of getting one of each with one query using distinct =/, but like I said that would only bring you one.

Kalahari answered 21/5, 2010 at 22:53 Comment(1)
Using UNION ALL would be better, seeing that there wouldn't be duplicates to be removed. Also, ORDER BY RAND() doesn't scale well: #1823806Venule
D
0

I wonder if this will work?

SELECT 
   a,b,distance 
FROM YourTable t2
   WHERE ROW(a,b,distance) IN 
   (
      SELECT a,b,distance FROM YourTable t1 
      WHERE t1.distance=t2.distance ORDER BY RAND() LIMIT 2
   )

EDIT: unfortunately not. LIMIT is not supported in a subquery.

Dynamiter answered 21/5, 2010 at 23:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.