How do I select a fixed number of rows for each group?

Asked 21/5, 2010 at 22:40 Answered 21/5, 2010 at 23:8

Here is some example data in a mysql table

a   b   distance
15  44  250
94  31  250
30  41  250
6   1   250
95  18  250
72  84  500
14  23  500
55  24  500
95  8   500
59  25  500
40  73  500
65  85  500
32  50  500
31  39  500
22  25  500
37  11  750
98  39  750
15  57  750
9   22  750
14  44  750
69  22  750
62  50  750
89  35  750
67  65  750
74  37  750
52  36  750
66  53  750
82  74  1000
79  22  1000
98  41  1000

How do I query this table such that I get 2 rows per distance selected at random?

A successful query will produce something like

   a    b   distance
    30  41  250
    95  18  250
    59  25  500
    65  85  500
    15  57  750
    89  35  750
    79  22  1000
    98  41  1000

Verenaverene answered 21/5, 2010 at 22:40 Comment(0)

Use:

SELECT x.a,
       x.b,
       x.distance
  FROM (SELECT t.a,
               t.b,
               t.distance
               CASE 
                 WHEN @distance != t.distance THEN @rownum := 1 
                 ELSE @rownum := @rownum + 1 
               END AS rank,
               @distance := t.distance
          FROM TABLE t
          JOIN (SELECT @rownum := 0, @distance := '') r
      ORDER BY t.distance --important for resetting the rownum variable) x
 WHERE x.rank <= 2
ORDER BY x.distance, x.a

Venule answered 21/5, 2010 at 22:51 Comment(5)

It's often an education reading your answers. How are the rows selected random? – Dynamiter 21/5, 2010 at 23:13

@mdma: The rows are randomly ranked because they're only ordered by the distance value - any a & b value pair can be ranked as 1, etc. The ORDER BY t.distance is just to keep the distance values grouped, so the CASE statement correctly resets the rownum value back to start at 1 when a new distance value is encountered. – Venule 21/5, 2010 at 23:20

Thanks for clarification. Doesn't that mean that the rows selected are arbitrary, rather than random. (I.e. they could be the same each time the query is run?) – Dynamiter 21/5, 2010 at 23:33

@mdma: Ordering by the distance column only does not ensure that the same a & b values will always be returned at the same position. And being random doesn't ensure that different values will always be returned. – Venule 21/5, 2010 at 23:51

@OMGPonies: shouldn't the @distance = t.distance be a @distance := t.distance instead? – Dereism 4/12, 2012 at 16:19

One way would be to use union. Like so:

(SELECT a, b, distance FROM table WHERE distance = 250 LIMIT 2 ORDER BY RAND())
UNION
(SELECT a, b, distance FROM table WHERE distance = 500 LIMIT 2 ORDER BY RAND())
...
ORDER BY distance

I can think of a way of getting one of each with one query using distinct =/, but like I said that would only bring you one.

Kalahari answered 21/5, 2010 at 22:53 Comment(1)

Using UNION ALL would be better, seeing that there wouldn't be duplicates to be removed. Also, ORDER BY RAND() doesn't scale well: #1823806 – Venule 21/5, 2010 at 23:0

I wonder if this will work?

SELECT 
   a,b,distance 
FROM YourTable t2
   WHERE ROW(a,b,distance) IN 
   (
      SELECT a,b,distance FROM YourTable t1 
      WHERE t1.distance=t2.distance ORDER BY RAND() LIMIT 2
   )

EDIT: unfortunately not. LIMIT is not supported in a subquery.

Dynamiter answered 21/5, 2010 at 23:8 Comment(0)

Recommended topics

Hot tags