How to select the first row for each group in MySQL?
Asked Answered
B

16

107

In C# it would be like this:

table
   .GroupBy(row => row.SomeColumn)
   .Select(group => group
       .OrderBy(row => row.AnotherColumn)
       .First()
   )

Linq-To-Sql translates it to the following T-SQL code:

SELECT [t3].[AnotherColumn], [t3].[SomeColumn]
FROM (
    SELECT [t0].[SomeColumn]
    FROM [Table] AS [t0]
    GROUP BY [t0].[SomeColumn]
    ) AS [t1]
OUTER APPLY (
    SELECT TOP (1) [t2].[AnotherColumn], [t2].[SomeColumn]
    FROM [Table] AS [t2]
    WHERE (([t1].[SomeColumn] IS NULL) AND ([t2].[SomeColumn] IS NULL))
      OR (([t1].[SomeColumn] IS NOT NULL) AND ([t2].[SomeColumn] IS NOT NULL)
        AND ([t1].[SomeColumn] = [t2].[SomeColumn]))
    ORDER BY [t2].[AnotherColumn]
    ) AS [t3]
ORDER BY [t3].[AnotherColumn]

But it is incompatible with MySQL.

Bobsledding answered 29/4, 2010 at 17:36 Comment(2)
can't you monitor the DB server to see what queries C# executes (I'm some-what guessing that your syntax above is LINQ)Burchfield
@Iexu Yes I can, and I did it with MS SQL Server. But I don't have any Linq-to-MySQL, only Linq-To-SqlBobsledding
R
5

rtribaldos mentioned that in younger database versions, window-functions could be used.
Here is a code which worked for me and was as fast as Martin Zwarík's substring_index-solution (in Mariadb 10.5.16):

SELECT group_col, order_col FROM (
  SELECT group_col, order_col
  , ROW_NUMBER() OVER(PARTITION BY group_col ORDER BY order_col) rnr 
  FROM some_table
  WHERE <some_condition>
) i
WHERE rnr=1;
Rosenkranz answered 16/9, 2022 at 12:36 Comment(0)
M
121

I based my answer on the title of your post only, as I don't know C# and didn't understand the given query. But in MySQL I suggest you try subselects. First get a set of primary keys of interesting columns then select data from those rows:

SELECT somecolumn, anothercolumn 
  FROM sometable 
 WHERE id IN (
               SELECT min(id) 
                 FROM sometable 
                GROUP BY somecolumn
             );
Millard answered 29/4, 2010 at 17:45 Comment(4)
I think it will work for me, but this solution requires that I create a PK id for my table.Bobsledding
While the C#/T-SQL solution doesn't requires it.Bobsledding
Well, it's a good practice to always have a primary key, and theorically if you don't have a primary key, the set of whole row should be your primary key (although MySQL would accept a table with no primary key with repeated rows).Millard
IN tends to be very slow if you are evaluating a large recordset. You'll often get better performance from EXISTS if you can use it. In many cases (for example this one) you could use an INNER JOIN which is faster still. SELECT c1, c2 FROM t1 INNER JOIN (SELECT min(c2) c2 FROM t1) a1 ON t1.c2=a1.c2Acred
B
30

Here's another way you could try, that doesn't need that ID field.

select some_column, min(another_column)
  from i_have_a_table
 group by some_column

Still I agree with lfagundes that you should add some primary key ..

Also beware that by doing this, you cannot (easily) get at the other values is the same row as the resulting some_colum, another_column pair! You'd need lfagundes apprach and a PK to do that!

Burchfield answered 29/4, 2010 at 18:27 Comment(2)
this make more sense at all!Celestinecelestite
That's the perfect solution for me.Uchish
B
24

When I write

SELECT AnotherColumn
FROM Table
GROUP BY SomeColumn
;

It works. IIRC in other RDBMS such statement is impossible, because a column that doesn't belongs to the grouping key is being referenced without any sort of aggregation.

This "quirk" behaves very closely to what I want. So I used it to get the result I wanted:

SELECT * FROM 
(
 SELECT * FROM `table`
 ORDER BY AnotherColumn
) t1
GROUP BY SomeColumn
;
Bobsledding answered 29/4, 2010 at 18:35 Comment(4)
In a similar case, the selection part works for me but when I am trying to do an update on result obtained with this query in mysql it doesn't work. I have tried many solutions so far for "update" with no success. Would appreciate any help/suggestion there.Orthogenetic
Discussion about why the first statement works: #1225644 .Apparently starting MySQL 5.7.5 this will be disabled by default, dev.mysql.com/doc/refman/5.7/en/…Tweedy
It is not considering order in Mysql like this, and the record which is getting taken in group by is a random or first oneMarylouisemaryly
This seemed tp stopped working on MySQL version 8.x.xLoomis
B
21

Best performance and easy to use:

SELECT id, code,
SUBSTRING_INDEX(
     GROUP_CONCAT(price ORDER BY id DESC SEPARATOR '|'), '|', 1
) first_found_price
FROM stocks
GROUP BY code
ORDER BY id DESC
Boltrope answered 3/3, 2020 at 14:48 Comment(1)
It is an interesting solution. thanks.Selffulfillment
T
8
SELECT
    t1.*

FROM
    table_name AS t1

    LEFT JOIN table_name AS t2 ON (
        t2.group_by_column = t1.group_by_column
        -- group_by_column is the column you would use in the GROUP BY statement
        AND
        t2.order_by_column < t1.order_by_column
        -- order_by_column is column you would use in the ORDER BY statement
        -- usually is the autoincremented key column
    )

WHERE
    t2.group_by_column IS NULL;

With MySQL v8+ you could use window functions

Trimetallic answered 1/11, 2018 at 13:56 Comment(2)
This is the only answer that I could get to work for my use-case in 5.7+ with ONLY_FULL_GROUP_BY enabled. We have a PK in place and, for whatever reason, MySQL 5.7 kept thinking it was not functionally-dependent on the column we needed to GROUP BY. The other answers seem very specific to their particular issue or require SQL variables... This is a straight query and generic enough for many purposes. The only thing I had to change was the inequality for the ORDER BY columns, but that's to be expected depending on needs.Lankester
Awesome! Simple and effective, it worked for me. I also added a GROUP BY t1.group_by_column to always return only 1 row. It helped me in cases where t1.order_by_column was equal to t2.order_by_column. Thank you.Hubey
M
7

I suggest to use this official way from MySql:

SELECT article, dealer, price
FROM   shop s1
WHERE  price=(SELECT MAX(s2.price)
              FROM shop s2
              WHERE s1.article = s2.article
              GROUP BY s2.article)
ORDER BY article;

With this way, we can get the highest price on each article

Muzzy answered 17/7, 2020 at 3:36 Comment(1)
This is a very clean way to do this. Many thanks.Edwardedwardian
B
6

You should use some aggregate function to get the value of AnotherColumn that you want. That is, if you want the lowest value of AnotherColumn for each value of SomeColumn (either numerically or lexicographically), you can use:

SELECT SomeColumn, MIN(AnotherColumn)
FROM YourTable
GROUP BY SomeColumn

Some hopefully helpful links:

http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html

http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

Brushwork answered 29/4, 2010 at 18:27 Comment(4)
When I do that, the SomeColumn value isn't necessarily the value in the row where AnotherColumn = Min(AnotherColumn)Bobsledding
@Jader Dias: As I stated in my answer, that's why you would need a PK!Burchfield
Min(AnotherColumn) in the grouping context is the lowest AnotherColumn for the group of rows with the same value of SomeColumn, not for all values of AnotherColumn for the whole table.Brushwork
The aggregate function to use is not MIN but FIRST, which MySQL lacks.Housemaster
D
6

From MySQL 5.7 documentation

MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.

This means that @Jader Dias's solution wouldn't work everywhere.

Here is a solution that would work when ONLY_FULL_GROUP_BY is enabled:

SET @row := NULL;
SELECT
    SomeColumn,
    AnotherColumn
FROM (
    SELECT
        CASE @id <=> SomeColumn AND @row IS NOT NULL 
            WHEN TRUE THEN @row := @row+1 
            ELSE @row := 0 
        END AS rownum,
        @id := SomeColumn AS SomeColumn,
        AnotherColumn
    FROM
        SomeTable
    ORDER BY
        SomeColumn, -AnotherColumn DESC
) _values
WHERE rownum = 0
ORDER BY SomeColumn;
Diffuser answered 5/1, 2018 at 13:21 Comment(1)
Verified that this is a working solution. This is currently the only working solution I've seen for MySQL 5.7.5 with the default setting for ONLY_FULL_GROUP_BY.Denton
B
5

I have not seen the following solution among the answers, so I thought I'd put it out there.

The problem is to select rows which are the first rows when ordered by AnotherColumn in all groups grouped by SomeColumn.

The following solution will do this in MySQL. id has to be a unique column which must not hold values containing - (which I use as a separator).

select t1.*
from mytable t1
inner join (
  select SUBSTRING_INDEX(
    GROUP_CONCAT(t3.id ORDER BY t3.AnotherColumn DESC SEPARATOR '-'),
    '-', 
    1
  ) as id
  from mytable t3
  group by t3.SomeColumn
) t2 on t2.id = t1.id


-- Where 
SUBSTRING_INDEX(GROUP_CONCAT(id order by AnotherColumn desc separator '-'), '-', 1)
-- can be seen as:
FIRST(id order by AnotherColumn desc)

-- For completeness sake:
SUBSTRING_INDEX(GROUP_CONCAT(id order by AnotherColumn desc separator '-'), '-', -1)
-- would then be seen as:
LAST(id order by AnotherColumn desc)

There is a feature request for FIRST() and LAST() in the MySQL bug tracker, but it was closed many years back.

Breeden answered 19/4, 2018 at 8:32 Comment(0)
R
5

rtribaldos mentioned that in younger database versions, window-functions could be used.
Here is a code which worked for me and was as fast as Martin Zwarík's substring_index-solution (in Mariadb 10.5.16):

SELECT group_col, order_col FROM (
  SELECT group_col, order_col
  , ROW_NUMBER() OVER(PARTITION BY group_col ORDER BY order_col) rnr 
  FROM some_table
  WHERE <some_condition>
) i
WHERE rnr=1;
Rosenkranz answered 16/9, 2022 at 12:36 Comment(0)
F
2

How about this:

SELECT SUBSTRING_INDEX(
      MIN(CONCAT(OrderColumn, '|', IFNULL(TargetColumn, ''))
    ), '|', -1) as TargetColumn
FROM table
GROUP BY GroupColumn
Frazzle answered 12/10, 2017 at 8:56 Comment(0)
S
1

Yet another way to do it

Select max from group that works in views

SELECT * FROM action a 
WHERE NOT EXISTS (
   SELECT 1 FROM action a2 
   WHERE a2.user_id = a.user_id 
   AND a2.action_date > a.action_date 
   AND a2.action_type = a.action_type
)
AND a.action_type = "CF"
Stogner answered 3/2, 2016 at 18:51 Comment(0)
A
1

Yet another way to do it (without the primary key) would be using the JSON functions:

select somecolumn, json_unquote( json_extract(json_arrayagg(othercolumn), "$[0]") )
  from sometable group by somecolumn

or pre 5.7.22

select somecolumn, 
  json_unquote( 
    json_extract( 
      concat('["', group_concat(othercolumn separator '","') ,'"]') 
    ,"$[0]" ) 
  ) 
  from sometable group by somecolumn

Ordering (or filtering) can be done before grouping:

select somecolumn, json_unquote( json_extract(json_arrayagg(othercolumn), "$[0]") ) 
  from (select * from sometable order by othercolumn) as t group by somecolumn

... or after grouping (of course):

select somecolumn, json_unquote( json_extract(json_arrayagg(othercolumn), "$[0]") ) as other 
  from sometable group by somecolumn order by other

Admittedly, it's rather convoluted and performance is probably not great (didn't test it on large data, works well on my limited data sets).

Aquiver answered 18/7, 2018 at 9:44 Comment(0)
T
1

I recently discovered a cool trick to accomplish this. Basically just make two different subqueries from a table and join them together. One of the subqueries does the aggregation based on a grouping, and the other subquery just grabs the first DISTINCT row for each grouped item.

When you join these subqueries together, you will get the first distinct item from each group, but will also get the aggregated columns across the whole group for each item. This is essentially the same result as having ONLY_FULL_GROUP_BY turned off.

SELECT non_aggregated_data.foo_column AS foo_column,
       non_aggregated_data.bar_column AS bar_column,
       aggregated_data.value_1_sum    AS value_1_sum,
       aggregated_data.value_2_sum    AS value_2_sum
FROM (SELECT column_to_self_join_on,
             sum(value_1) AS value_1_sum,
             sum(value_2) AS value_2_sum
      FROM example_table
      GROUP BY column_to_self_join_on) AS aggregated_data
         LEFT JOIN (SELECT DISTINCT(column_to_self_join_on),
                                   foo_column,
                                   bar_column
                    FROM example_table) AS non_aggregated_data
                   ON non_aggregated_data.column_to_self_join_on = aggregated_data.column_to_self_join_on
Transudate answered 30/6, 2021 at 0:43 Comment(1)
I also like this model for dealing with very large databases because the query becomes easy to debug. You can separate the parts and run them separately and then test the JOIN. When using the EXPLAIN feature is a beauty so.Thurnau
A
0

Select the first row for each group (as ordered by a column) in Mysql .

We have:

a table: mytable
a column we are ordering by: the_column_to_order_by
a column that we wish to group by: the_group_by_column

Here's my solution. The inner query gets you a unique set of rows, selected as a dual key. The outer query joins the same table by joining on both of those keys (with AND).

SELECT * FROM 
    ( 
        SELECT the_group_by_column, MAX(the_column_to_order_by) the_column_to_order_by 
        FROM mytable 
        GROUP BY the_group_by_column 
        ORDER BY MAX(the_column_to_order_by) DESC 
    ) as mytable1 
JOIN mytable mytable2 ON mytable2.the_group_by_column = 
mytablealiamytable2.the_group_by_column 
  AND mytable2.the_column_to_order_by = mytable1.the_column_to_order_by;

FYI: I haven't thought about efficiency at all for this and can't speak to that one way or the other.

Ancillary answered 16/4, 2020 at 10:15 Comment(0)
F
-3

Why not use MySQL LIMIT keyword?

SELECT [t2].[AnotherColumn], [t2].[SomeColumn]
FROM [Table] AS [t2]
WHERE (([t1].[SomeColumn] IS NULL) AND ([t2].[SomeColumn] IS NULL))
  OR (([t1].[SomeColumn] IS NOT NULL) AND ([t2].[SomeColumn] IS NOT NULL)
    AND ([t1].[SomeColumn] = [t2].[SomeColumn]))
ORDER BY [t2].[AnotherColumn]
LIMIT 1
Firepower answered 8/9, 2017 at 8:52 Comment(1)
This returns the first row of the whole query, not the first row of each group. There should be a way to do this for each group, given how common this question is, but the SQL groups were too busy arguing over the meaning of NULL to bother with practical problems like this.Ab

© 2022 - 2024 — McMap. All rights reserved.