Converting SELECT DISTINCT ON queries from Postgresql to MySQL

R

5

51

I've been using PostgreSQL and now migrating to MySQL.

In my queries, I'm using PostgreSQL's SELECT DISTINCT ON (col1, col2, col3), I was wondering if there is any counterpart of this statement in MySQL.

Relativity answered 16/7, 2013 at 9:56 Comment(2)

You can try Select DISTINCT concat(col1,col2,col3) from table if you need the combination of the three to be distinct – Vidovik 16/7, 2013 at 10:4

I guess in MySQL you need to use a "partial" group by on those three columns (which would not be allowed in any other DBMS) and live with the fact that you get unpredictable values for the non-distinct columns. – Cadet 16/7, 2013 at 10:21

C

68

There's not an exact equivalent to convert a Postgresql query that makes use of SELECT DISTINCT ON to MySQL.

Postgresql SELECT DISTINCT ON

In Postgresql, the following query will eliminate all rows where the expressions (col1, col2, col3) match, and it will only keep the "first col4, col5 row" for each set of matched rows:

SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename

So if your table is like this:

col1 | col2 | col3 | col4 | col5
--------------------------------
1    | 2    | 3    | 777  | 888
1    | 2    | 3    | 888  | 999
3    | 3    | 3    | 555  | 555

our query will keep just one row for (1,2,3) and one row for (3,3,3). The resulting rows will then be:

col4 | col5
-----------
777  | 888
555  | 555

please notice that the "first row" of each set is unpredictable, our fist row might be (888, 999) as well unless we specify an ORDER BY:

SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename
ORDER BY col1, col2, col3, col4

(the DISTINCT on expressions must match the leftmost ORDER BY expressions, but the ORDER BY can contain additional expressions).

MySQL extension to GROUP BY

MySQL extends the use of GROUP BY so that we can select nonaggregated columns not named in the GROUP BY clause. Whenever we select nonaggregated columns the server is free to choose any value from each group from that column, so the resulting values will be indetermined.

So this Postgresql query:

SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename

can be considered equivalent to this MySQL query:

SELECT col4, col5
FROM tablename
GROUP BY col1, col2, col3

both Postgresql and MySQL will return the "First row" for each (col1, col2, col3), and in both cases the row returned is unpredictable because we didn't specify and order by clause.

A lot of people would be very tempted to convert this Postgresql query with an ORDER BY:

SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename
ORDER BY col1, col2, col3, col4

with this one:

SELECT col4, col5
FROM (
  SELECT col1, col2, col3, col4, col5
  FROM tablename
  ORDER BY col1, col2, col3, col4
) s
GROUP BY col1, col2, col3

the idea here is to apply an ORDER BY to a subquery so that when MySQL groups by col1, col2, col3 it will keep the first encountered value for col4 and col5. The idea is good, but it's wrong! MySQL is free to choose any value for col4 and col5, and we don't know which are the first values encountered, it depends on the optimizer. So I would correct it to this:

SELECT t1.col4, t1.col5
FROM tablename t1 INNER JOIN (SELECT col1, col2, col3, MIN(col4) as m_col4
                              FROM tablename
                              GROUP BY col1, col2, col3) s
     ON t1.col1=s.col1
        AND t1.col2=s.col2
        AND t1.col3=s.col3
        AND t1.col4=s.m_col4
GROUP BY
  t1.col1, t1.col2, t1.col3, t1.col4

but this is starting to get more complicated.

Conclusion

As a general rule, there's not an exact way to convert a Postgresql query to a MySQL query, but there are a lot of workarounds, the resulting query might be as simple as the original one or it might become very complicated, but it depends on the query itself.

Creation answered 23/11, 2014 at 16:28 Comment(2)

Also because more modern MySQL/MariaDB servers are going to follow the SQL standard more most likely the ORDER BY in the subquery will also can/will be ignored without changing the subqeury resultset with LIMIT meaning the results can change upgrading from a older MySQL version to a more modern one. – Tropous 17/12, 2018 at 16:5

"As a general rule, there's not an exact way to convert a Postgresql query to a MySQL query, but there are a lot of workarounds" Indeed i find the MySQL's SUBSTRING_INDEX(GROUP_CONCAT(.... ORDER BY ....), ..) method the most easy or best native method to implemented DISTINCT ON (col1, col2, col3), * syntax ... Also because of the other comment where MariaDB/MySQL could ignore ORDER BY in subqueries. – Tropous 17/12, 2018 at 16:21

R

2

Use a subquery to determine the order, and an outer query to group them.

Like @a_horse_with_no_name points out, this works because MySQL allows partial group by, unlike other DBMSs.

For example:

CREATE TABLE customer_order
    (`customer` varchar(5), `item` varchar(6), `date` datetime)
;

INSERT INTO customer_order
    (`customer`, `item`, `date`)
VALUES
    ('alice', 'widget', '2000-01-05 00:00:00'),
    ('bob', 'widget', '2000-01-02 00:00:00'),
    ('alice', 'widget', '2000-01-01 00:00:00'),
    ('alice', 'wodget', '2000-01-06 00:00:00')
;

Query for each customer's first order:

select *
from
  (select customer, item, date
  from customer_order
  order by date) c
group by customer

Result:

| CUSTOMER |   ITEM |                           DATE |
|----------|--------|--------------------------------|
|    alice | widget | January, 01 2000 00:00:00+0000 |
|      bob | widget | January, 02 2000 00:00:00+0000 |

http://sqlfiddle.com/#!2/6cbbe/1

Reset answered 17/9, 2014 at 17:2 Comment(2)

Also because more modern MySQL/MariaDB servers are going to follow the SQL standard more most likely the ORDER BY in the subquery will also can/will be ignored without changing the subqeury resultset with LIMIT meaning the results can change upgrading from a older MySQL version to a more modern one. – Tropous 17/12, 2018 at 16:7

If the above query is not working, try this: SELECT customer, min(date) AS max_date FROM customer_order GROUP BY customer; – Whipcord 28/4, 2020 at 7:14

H

1

if with DISTINCT ON you want to join last (or some specific) record from one-to-many relation, you can do:

SELECT * FROM customer c LEFT JOIN purchase p 
ON (SELECT purchaseID FROM purchase WHERE customerID = c.customerID ORDER BY Purchase_Date DESC LIMIT 1) = purchaseID

Hyacinthhyacintha answered 15/12, 2023 at 15:28 Comment(0)

H

-6

you should migrate to PDO or MSYQLI instead of MYSQL as its already deprecated.

about your question you can do

   SELECT DISTINCT col1, col2, col3

or

    SELECT col1, col2, col3
    ........

    GROUP BY col1 --//--- or whatever column you want to be distinct

Hopefully answered 16/7, 2013 at 10:11 Comment(0)

I

-6

You can't select distinct values from multiple columns. while selecting use query like this

select distinct col1, col2 from table

Ingres answered 4/1, 2014 at 5:55 Comment(0)

Recommended topics

Hot tags