MySQL JOIN the most recent row only?
Asked Answered
G

10

155

I have a table customer that stores a customer_id, email and reference. There is an additional table customer_data that stores a historical record of the changes made to the customer, i.e. when there's a change made a new row is inserted.

In order to display the customer information in a table, the two tables need to be joined, however only the most recent row from customer_data should be joined to the customer table.

It gets a little more complicated in that the query is paginated, so has a limit and an offset.

How can I do this with MySQL? I think I'm wanting to put a DISTINCT in there somewhere...

The query at the minute is like this-

SELECT *, CONCAT(title,' ',forename,' ',surname) AS name
FROM customer c
INNER JOIN customer_data d on c.customer_id=d.customer_id
WHERE name LIKE '%Smith%' LIMIT 10, 20

Additionaly, am I right in thinking I can use CONCAT with LIKE in this way?

(I appreciate that INNER JOIN might be the wrong type of JOIN to use. I actually have no clue what the difference is between the different JOINs. I'm going to look into that now!)

Grievous answered 1/9, 2010 at 14:19 Comment(5)
How does the customer history table look like? How is the most recent row determined? Is there a timestamp field?Gymnasiast
Most recent is simply the last row inserted - so its primary key is the highest number.Grievous
Why not a trigger? take a look at this answer: #26661814Wordsmith
Most/all of the answers were taking too long with millions of rows. There are some solutions with a better performance.Delate
Does this answer your question? Fetch the row which has the Max value for a columnSynclastic
G
193

You may want to try the following:

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
JOIN      (
              SELECT    MAX(id) max_id, customer_id 
              FROM      customer_data 
              GROUP BY  customer_id
          ) c_max ON (c_max.customer_id = c.customer_id)
JOIN      customer_data cd ON (cd.id = c_max.max_id)
WHERE     CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%' 
LIMIT     10, 20;

Note that a JOIN is just a synonym for INNER JOIN.

Test case:

CREATE TABLE customer (customer_id int);
CREATE TABLE customer_data (
   id int, 
   customer_id int, 
   title varchar(10),
   forename varchar(10),
   surname varchar(10)
);

INSERT INTO customer VALUES (1);
INSERT INTO customer VALUES (2);
INSERT INTO customer VALUES (3);

INSERT INTO customer_data VALUES (1, 1, 'Mr', 'Bobby', 'Smith');
INSERT INTO customer_data VALUES (2, 1, 'Mr', 'Bob', 'Smith');
INSERT INTO customer_data VALUES (3, 2, 'Mr', 'Jane', 'Green');
INSERT INTO customer_data VALUES (4, 2, 'Miss', 'Jane', 'Green');
INSERT INTO customer_data VALUES (5, 3, 'Dr', 'Jack', 'Black');

Result (query without the LIMIT and WHERE):

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
JOIN      (
              SELECT    MAX(id) max_id, customer_id 
              FROM      customer_data 
              GROUP BY  customer_id
          ) c_max ON (c_max.customer_id = c.customer_id)
JOIN      customer_data cd ON (cd.id = c_max.max_id);

+-----------------+
| name            |
+-----------------+
| Mr Bob Smith    |
| Miss Jane Green |
| Dr Jack Black   |
+-----------------+
3 rows in set (0.00 sec)
Gymnasiast answered 1/9, 2010 at 14:37 Comment(3)
In a long run this approach might create performance problems as it would need to create a temporary table. So another solution (if possible) is to add a new boolean field (is_last) in customer_data which you would have to update every time a new entry is added. The last entry will have is_last=1, all other for this customer - is_last=0.Amphimacer
People should (please) also read the following answer (from Danny Coulombe), because this answer (sorry Daniel) is terribly slow with longer queries / more data. Made my page "wait" for 12 seconds to load; So please also check https://mcmap.net/q/151275/-mysql-join-the-most-recent-row-only . I didn't notice it till after a lot of other changes so it took me very long to find out.Feints
This solution is cute but totally slow. Execution plan shows that joined "Group by" makes full table scan with creating temp table. I've encountered prod issue with such approach where query for 25 ids eat 15GB memory of RDS Server and stuck the app completely.Jabez
S
205

If you are working with heavy queries, you better move the request for the latest row in the where clause. It is a lot faster and looks cleaner.

SELECT c.*,
FROM client AS c
LEFT JOIN client_calling_history AS cch ON cch.client_id = c.client_id
WHERE
   cch.cchid = (
      SELECT MAX(cchid)
      FROM client_calling_history
      WHERE client_id = c.client_id AND cal_event_id = c.cal_event_id
   )
Serosa answered 13/3, 2016 at 2:8 Comment(13)
Wow I'm almost in disbelief at how much of a performance difference this is. Not sure why that was so drastic yet, but so far it was so much quicker it that it feels like I messed up somewhere else...Edom
I really wish I could +1 this more than once so it gets seen more. I've tested this quite a bit and somehow it makes my queries virtually instantaneous (WorkBench literally says 0.000 seconds, even with sql_no_cache set), whereas doing the search in the join took multiple seconds to complete. Still baffled, but I mean you can't argue with results like that.Edom
I am not sure either why it is faster, but I heard that MySQL always starts from the bottom. So probably that the "SELECT MAX" query simply continue where it was and only run a few lines because of the ID index instead of being repeated every time when placed in a JOIN.Serosa
You are directly joining 2 tables first and then filtering with WHERE. I think it is a massive performance issue if you have a million client and tens of millions of calling history. Because SQL will try to join 2 tables first and then filter down to the single client. I would rather filter the clients and related calling histories from the tables first in a sub-query and then join the tables.Coursing
@Coursing you may be right, but if you try and look at the results, this implementation is much faster. Moreover, this query is the result of an optimization of a 1.5M clients VS 8M calling history jointure that has been proven to be very efficient over more than a year of intensive usage.Serosa
I suppose the "ca.client_id" and "ca.cal_event_id" must be "c" for both.Unclear
If the client_calling_history table doesn't have a record associated with the client table, then this won't return a row for the client.Demodulation
@NickCoons, isn't it why we use LEFT JOIN? But maybe I am misunderstanding your point?Serosa
I agree with @NickCoons. NULL values won't be returned because they're excluded by the where clause. How would you go about to include the NULL values and still keep the excellent performance of this query?Marianelamariani
I'm also interested in returning the NULL values for this queryAllhallowmas
You might actually get even better results by moving the subselect for the JOIN from the where clause into the ON clause. "Explain" on my queries based on this one says MySQL manages to optimize away the subselect when retrieving only one record. @Marianelamariani this should solve the problem with left outer join also.Splanchnic
I think you can get the NULL records like this <code> SELECT c.*, FROM client AS c LEFT JOIN client_calling_history AS cch ON cch.client_id = c.client_id WHERE ( SELECT MAX(cchid) FROM client_calling_history WHERE client_id = c.client_id AND cal_event_id = c.cal_event_id ) IS NULL OR cch.cchid = ( SELECT MAX(cchid) FROM client_calling_history WHERE client_id = c.client_id AND cal_event_id = c.cal_event_id ) </code>Rubstone
This option is far more performant in comparing to joining of group by. Note - you have to create index by field(s) by which you do "where" in subquery, otherwise it becomes even heavier than option with group by.Jabez
G
193

You may want to try the following:

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
JOIN      (
              SELECT    MAX(id) max_id, customer_id 
              FROM      customer_data 
              GROUP BY  customer_id
          ) c_max ON (c_max.customer_id = c.customer_id)
JOIN      customer_data cd ON (cd.id = c_max.max_id)
WHERE     CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%' 
LIMIT     10, 20;

Note that a JOIN is just a synonym for INNER JOIN.

Test case:

CREATE TABLE customer (customer_id int);
CREATE TABLE customer_data (
   id int, 
   customer_id int, 
   title varchar(10),
   forename varchar(10),
   surname varchar(10)
);

INSERT INTO customer VALUES (1);
INSERT INTO customer VALUES (2);
INSERT INTO customer VALUES (3);

INSERT INTO customer_data VALUES (1, 1, 'Mr', 'Bobby', 'Smith');
INSERT INTO customer_data VALUES (2, 1, 'Mr', 'Bob', 'Smith');
INSERT INTO customer_data VALUES (3, 2, 'Mr', 'Jane', 'Green');
INSERT INTO customer_data VALUES (4, 2, 'Miss', 'Jane', 'Green');
INSERT INTO customer_data VALUES (5, 3, 'Dr', 'Jack', 'Black');

Result (query without the LIMIT and WHERE):

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
JOIN      (
              SELECT    MAX(id) max_id, customer_id 
              FROM      customer_data 
              GROUP BY  customer_id
          ) c_max ON (c_max.customer_id = c.customer_id)
JOIN      customer_data cd ON (cd.id = c_max.max_id);

+-----------------+
| name            |
+-----------------+
| Mr Bob Smith    |
| Miss Jane Green |
| Dr Jack Black   |
+-----------------+
3 rows in set (0.00 sec)
Gymnasiast answered 1/9, 2010 at 14:37 Comment(3)
In a long run this approach might create performance problems as it would need to create a temporary table. So another solution (if possible) is to add a new boolean field (is_last) in customer_data which you would have to update every time a new entry is added. The last entry will have is_last=1, all other for this customer - is_last=0.Amphimacer
People should (please) also read the following answer (from Danny Coulombe), because this answer (sorry Daniel) is terribly slow with longer queries / more data. Made my page "wait" for 12 seconds to load; So please also check https://mcmap.net/q/151275/-mysql-join-the-most-recent-row-only . I didn't notice it till after a lot of other changes so it took me very long to find out.Feints
This solution is cute but totally slow. Execution plan shows that joined "Group by" makes full table scan with creating temp table. I've encountered prod issue with such approach where query for 25 ids eat 15GB memory of RDS Server and stuck the app completely.Jabez
A
12

Presuming the autoincrement column in customer_data is named Id, you can do:

SELECT CONCAT(title,' ',forename,' ',surname) AS name *
FROM customer c
    INNER JOIN customer_data d 
        ON c.customer_id=d.customer_id
WHERE name LIKE '%Smith%'
    AND d.ID = (
                Select Max(D2.Id)
                From customer_data As D2
                Where D2.customer_id = D.customer_id
                )
LIMIT 10, 20
Atticism answered 1/9, 2010 at 14:38 Comment(0)
M
9

For anyone who must work with an older version of MySQL (pre-5.0 ish) you are unable to do sub-queries for this type of query. Here is the solution I was able to do and it seemed to work great.

SELECT MAX(d.id), d2.*, CONCAT(title,' ',forename,' ',surname) AS name
FROM customer AS c 
LEFT JOIN customer_data as d ON c.customer_id=d.customer_id 
LEFT JOIN customer_data as d2 ON d.id=d2.id
WHERE CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%'
GROUP BY c.customer_id LIMIT 10, 20;

Essentially this is finding the max id of your data table joining it to the customer then joining the data table to the max id found. The reason for this is because selecting the max of a group doesn't guarantee that the rest of the data matches with the id unless you join it back onto itself.

I haven't tested this on newer versions of MySQL but it works on 4.0.30.

Monah answered 9/1, 2014 at 0:11 Comment(4)
This is exquisite in its simplicity. Why is this the first time I have ever seen this approach? Note that EXPLAIN indicates that this uses a temporary table and filesort. Adding ORDER BY NULL at the end weeds out the filesort.Guerdon
To my regret, my own, not-as-beautiful solution is 3.5 times as fast for my data. I used a subquery to select the main table plus the most recent IDs of the joined tables, and then an outer query that selects the subquery and reads actual data from the joined tables. I am joining 5 tables onto the main table, and testing with a where condition that selects 1000 records. Indexes are optimal.Guerdon
I was using your solution with SELECT *, MAX(firstData.id), MAX(secondData.id) [...]. Logically, by changing to SELECT main.*, firstData2.*, secondData2.*, MAX(firstData.id), MAX(secondData.id), [...] I was able to make it significantly faster. This allows the first joins to read only from the index, rather than also having to read all data from the primary index. Now the pretty solution takes only 1.9 times as long as the subquery-based solution.Guerdon
It does not work anymore in MySQL 5.7. Now d2.* will return data for the first row in the group, not the last. SELECT MAX(R1.id), R2.* FROM invoices I LEFT JOIN responses R1 ON I.id=R1.invoice_id LEFT JOIN responses R2 ON R1.id=R2.id GROUP BY I.id LIMIT 0,10Ovenware
I
7

I know this question is old, but it's got a lot of attention over the years and I think it's missing a concept which may help someone in a similar case. I'm adding it here for completeness sake.

If you cannot modify your original database schema, then a lot of good answers have been provided and solve the problem just fine.

If you can, however, modify your schema, I would advise to add a field in your customer table that holds the id of the latest customer_data record for this customer:

CREATE TABLE customer (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  current_data_id INT UNSIGNED NULL DEFAULT NULL
);

CREATE TABLE customer_data (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   customer_id INT UNSIGNED NOT NULL, 
   title VARCHAR(10) NOT NULL,
   forename VARCHAR(10) NOT NULL,
   surname VARCHAR(10) NOT NULL
);

Querying customers

Querying is as easy and fast as it can be:

SELECT c.*, d.title, d.forename, d.surname
FROM customer c
INNER JOIN customer_data d on d.id = c.current_data_id
WHERE ...;

The drawback is the extra complexity when creating or updating a customer.

Updating a customer

Whenever you want to update a customer, you insert a new record in the customer_data table, and update the customer record.

INSERT INTO customer_data (customer_id, title, forename, surname) VALUES(2, 'Mr', 'John', 'Smith');
UPDATE customer SET current_data_id = LAST_INSERT_ID() WHERE id = 2;

Creating a customer

Creating a customer is just a matter of inserting the customer entry, then running the same statements:

INSERT INTO customer () VALUES ();

SET @customer_id = LAST_INSERT_ID();
INSERT INTO customer_data (customer_id, title, forename, surname) VALUES(@customer_id, 'Mr', 'John', 'Smith');
UPDATE customer SET current_data_id = LAST_INSERT_ID() WHERE id = @customer_id;

Wrapping up

The extra complexity for creating/updating a customer might be fearsome, but it can easily be automated with triggers.

Finally, if you're using an ORM, this can be really easy to manage. The ORM can take care of inserting the values, updating the ids, and joining the two tables automatically for you.

Here is how your mutable Customer model would look like:

class Customer
{
    private int id;
    private CustomerData currentData;

    public Customer(String title, String forename, String surname)
    {
        this.update(title, forename, surname);
    }

    public void update(String title, String forename, String surname)
    {
        this.currentData = new CustomerData(this, title, forename, surname);
    }

    public String getTitle()
    {
        return this.currentData.getTitle();
    }

    public String getForename()
    {
        return this.currentData.getForename();
    }

    public String getSurname()
    {
        return this.currentData.getSurname();
    }
}

And your immutable CustomerData model, that contains only getters:

class CustomerData
{
    private int id;
    private Customer customer;
    private String title;
    private String forename;
    private String surname;

    public CustomerData(Customer customer, String title, String forename, String surname)
    {
        this.customer = customer;
        this.title    = title;
        this.forename = forename;
        this.surname  = surname;
    }

    public String getTitle()
    {
        return this.title;
    }

    public String getForename()
    {
        return this.forename;
    }

    public String getSurname()
    {
        return this.surname;
    }
}
Isadoraisadore answered 11/8, 2018 at 14:23 Comment(1)
I combined this approach with @payne8's solution (above) to get my desired result without any subqueries.Dhaulagiri
P
2
SELECT CONCAT(title,' ',forename,' ',surname) AS name * FROM customer c 
INNER JOIN customer_data d on c.id=d.customer_id WHERE name LIKE '%Smith%' 

i think you need to change c.customer_id to c.id

else update table structure

Puzzler answered 1/9, 2010 at 14:23 Comment(0)
T
1

You can also do this

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
LEFT JOIN  (
              SELECT * FROM  customer_data ORDER BY id DESC
          ) customer_data ON (customer_data.customer_id = c.customer_id)
GROUP BY  c.customer_id          
WHERE     CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%' 
LIMIT     10, 20;
Tiaratibbetts answered 8/12, 2017 at 7:18 Comment(0)
P
1

This question is old now, but it still seems to get views and recent releases of MySQL (8.0 and later) have a much better alternative using windowing functions:

SELECT *
FROM 
(
    SELECT *, CONCAT(title,' ',forename,' ',surname) AS name,
        row_number() over (PARTITION BY d.customer_id ORDER BY d.id) rn
    FROM customer c
    INNER JOIN customer_data d on c.customer_id=d.customer_id
    WHERE name LIKE '%Smith%' 
)
WHERE rn = 1
LIMIT 10, 20
Petit answered 13/4, 2023 at 15:32 Comment(2)
I also added the mysql-5.7 tag to the question, since others who find this question via Google are unlikely to read past the first few high-voted answers to find this newer/better technique. The tag makes it clearer those were for older MySQL releases.Petit
also in MariaDB 10.1 and up.Culpepper
B
0

It's a good idea that logging actual data into "customer_data" table. With this data you can select all data from "customer_data" table as you wish.

Blakeney answered 1/9, 2010 at 14:33 Comment(1)
this answer is irrelevent to the quetsion.Etheridge
R
-1

Easy solution to left join the 1 most/least recent row is using select over ON phrase

SELECT *
FROM A
LEFT JOIN B
ON A.id = (SELECT MAX(id) FROM B WHERE id = A.id)

Where A.id is the auto-incremental primary key.

Rinderpest answered 28/12, 2021 at 10:57 Comment(2)
MAX(id)? Why do you want to join the max primary id of table B to the primary id of table A? There is no use of the foreign key in your join query at all.Panegyrize
The requested question is seeking for "the most recent row only" and considering id as the primary key, getting table B using MAX(id) has two advantages, one: it returns just one row And Two: it returns most recent one! that's I've used in most cases and works just fine.Rinderpest

© 2022 - 2024 — McMap. All rights reserved.