SQL query: Delete all records from the table except latest N?
Asked Answered
R

17

103

Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?

Something like this, only it doesn't work :)

delete from table order by id ASC limit ((select count(*) from table ) - N)

Thanks.

Renie answered 23/2, 2009 at 18:58 Comment(0)
T
158

You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.

This works (tested in MySQL 5.0.67):

DELETE FROM `table`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT 42 -- keep this many records
  ) foo
);

The intermediate subquery is required. Without it we'd run into two errors:

  1. SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
  2. SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.

Fortunately, using an intermediate subquery allows us to bypass both of these limitations.


Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.

Tobias answered 23/2, 2009 at 19:13 Comment(7)
Okay that works -- but to me, it's inelegant and unsatisfying to have to resort to arcane tricks like that. +1 nevertheless for the answer.Thetos
I mark it as an accepted answer, because it does what I asked for. But I personally will do it probably in two queries just to keep it simple :) I thought maybe there was some quick and easy way.Renie
Thanks Alex, your answer helped me. I see that the intermediate subquery is required but I don't understand why. Do you have an explanation for that?Eurasian
a question: what is the "foo" for?Putsch
Perroloco, I tried without foo and got this error: ERROR 1248 (42000): Every derived table must have its own alias So theirs our answer, every derived table must have its own alias!Hagi
Cool, that worked for me, but is frickin' strange that you have two sub queries and that error for such.Collection
It took more than 10 minutes to run for me :(Miler
L
123

I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.

The solutions that actually worked were the Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN method.

Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.

What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:

DELETE FROM `test_sandbox`
  WHERE id <= (
    SELECT id
    FROM (
      SELECT id
      FROM `test_sandbox`
      ORDER BY id DESC
      LIMIT 1 OFFSET 42 -- keep this many records
    ) foo
  );

It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.

Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.

It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.

Test case

I tested the three working methods and the new method above in two test cases.

Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).

+-----------+------------------------+----------------------+
|           | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN    |         3.2542 seconds |       0.1629 seconds |
| NOT IN v2 |         4.5863 seconds |       0.1650 seconds |
| <=,OFFSET |         0.0204 seconds |       0.1076 seconds |
+-----------+------------------------+----------------------+

What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.

Libbylibeccio answered 28/11, 2011 at 22:41 Comment(7)
I'm reading this thread again 4.5 years later. Nice addition!Tobias
Wow, this looks great but doesn't work in Microsoft SQL 2008. I get this message: "Incorrect syntax near 'Limit'. It's nice that it works in MySQL, but I'll need to find an alternative solution.Herzegovina
@KenPalmer You should be able to still find a specific row offset using ROW_NUMBER(): #604224Libbylibeccio
@KenPalmer use SELECT TOP instead of LIMIT when switching between SQL and mySQLAbreact
Cheers for that. It reduced the query on my (very large) data set from 12 minutes to 3.64 seconds!Damick
NicoleC for president! :DDisoblige
I wonder how this could be done with a group byGeometry
T
11

Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.

DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);

ERROR 1093 (HY000): You can't specify target table 'mytable' for update 
in FROM clause

Nor can MySQL support LIMIT in a subquery. These are limitations of MySQL.

DELETE FROM mytable WHERE id NOT IN 
  (SELECT id FROM mytable ORDER BY id DESC LIMIT 1);

ERROR 1235 (42000): This version of MySQL doesn't yet support 
'LIMIT & IN/ALL/ANY/SOME subquery'

The best answer I can come up with is to do this in two stages:

SELECT id FROM mytable ORDER BY id DESC LIMIT n; 

Collect the id's and make them into a comma-separated string:

DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );

(Normally interpolating a comma-separate list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source, they are known to be integer values from the database itself.)

note: Though this doesn't get the job done in a single query, sometimes a more simple, get-it-done solution is the most effective.

Thetos answered 23/2, 2009 at 19:8 Comment(5)
But you can do inner joins between a delete and select. What I did below should work.Morena
You need to use an intermediary subquery to get LIMIT working in the subquery.Tobias
@achinda99: I'm not seeing an answer from you on this thread...?Thetos
I got pulled for a meeting. My bad. I don't have a test environment right now to test the sql I wrote, but I've done both what Alex Barret did and I've gotten it to work with an inner join.Morena
It's a stupid limitation of MySQL. With PostgreSQL, DELETE FROM mytable WHERE id NOT IN (SELECT id FROM mytable ORDER BY id DESC LIMIT 3); works fine.Tolliver
A
9
DELETE  i1.*
FROM    items i1
LEFT JOIN
        (
        SELECT  id
        FROM    items ii
        ORDER BY
                id DESC
        LIMIT 20
        ) i2
ON      i1.id = i2.id
WHERE   i2.id IS NULL
Allogamy answered 23/2, 2009 at 19:3 Comment(0)
M
8

If your id is incremental then use something like

delete from table where id < (select max(id) from table)-N
Meant answered 23/2, 2009 at 19:3 Comment(2)
One big problem in this nice trick: serials are not always contiguous (for instance when there were rollbacks).Tolliver
One big issue: You can't specify target table 'table' for update in FROM clauseBridget
C
5

To delete all the records except te last N you may use the query reported below.

It's a single query but with many statements so it's actually not a single query the way it was intended in the original question.

Also you need a variable and a built-in (in the query) prepared statement due to a bug in MySQL.

Hope it may be useful anyway...

nnn are the rows to keep and theTable is the table you're working on.

I'm assuming you have an autoincrementing record named id

SELECT @ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT @ROWS_TO_DELETE := IF(@ROWS_TO_DELETE<0,0,@ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING @ROWS_TO_DELETE;

The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 record, keeping the last 1,000. It runs in 0.08 seconds.

The script from the accepted answer...

DELETE FROM `table`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT 42 -- keep this many records
  ) foo
);

Takes 0.55 seconds. About 7 times more.

Test environment: mySQL 5.5.25 on a late 2011 i7 MacBookPro with SSD

Cutpurse answered 2/10, 2013 at 8:12 Comment(0)
S
3
DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)
Scrutinize answered 23/2, 2009 at 19:2 Comment(2)
This will only leave single latest rowMeant
this is the best solution I reckon !Confucianism
A
1
DELETE FROM table WHERE id NOT IN (
    SELECT id FROM table ORDER BY id, desc LIMIT 0, 10
)
Arawakan answered 23/2, 2009 at 19:5 Comment(0)
K
1

try below query:

DELETE FROM tablename WHERE id < (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)

the inner sub query will return the top 10 value and the outer query will delete all the records except the top 10.

Kitchen answered 14/5, 2015 at 6:44 Comment(1)
Some explanation as to how this works would be beneficial for those coming across this answer. Code dumping is usually not recommended.Puling
M
0

This should work as well:

DELETE FROM [table] 
INNER JOIN (
    SELECT [id] 
    FROM (
        SELECT [id] 
        FROM [table] 
        ORDER BY [id] DESC
        LIMIT N
    ) AS Temp
) AS Temp2 ON [table].[id] = [Temp2].[id]
Morena answered 23/2, 2009 at 20:18 Comment(0)
V
0

What about :

SELECT * FROM table del 
         LEFT JOIN table keep
         ON del.id < keep.id
         GROUP BY del.* HAVING count(*) > N;

It returns rows with more than N rows before. Could be useful ?

Vowelize answered 3/8, 2011 at 11:28 Comment(0)
C
0

Using id for this task is not an option in many cases. For example - table with twitter statuses. Here is a variant with specified timestamp field.

delete from table 
where access_time >= 
(
    select access_time from  
    (
        select access_time from table 
            order by access_time limit 150000,1
    ) foo    
)
Cowan answered 14/2, 2012 at 10:28 Comment(0)
H
0

Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword 'Limit' isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL 2008, and is based on this SO post. https://mcmap.net/q/74116/-how-to-select-the-nth-row-in-a-sql-database-table

-- Keep the last 10 most recent passwords for this user.
DECLARE @UserID int; SET @UserID = 1004
DECLARE @ThresholdID int -- Position of 10th password.
SELECT  @ThresholdID = UserPasswordHistoryID FROM
        (
            SELECT ROW_NUMBER()
            OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
            FROM UserPasswordHistory
            WHERE UserID = @UserID
        ) sub
WHERE   (RowNum = 10) -- Keep this many records.

DELETE  UserPasswordHistory
WHERE   (UserID = @UserID)
        AND (UserPasswordHistoryID < @ThresholdID)

Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!

Herzegovina answered 2/4, 2015 at 13:0 Comment(0)
E
0

If you need to delete the records based on some other column as well, then here is a solution:

DELETE
FROM articles
WHERE id IN
    (SELECT id
     FROM
       (SELECT id
        FROM articles
        WHERE user_id = :userId
        ORDER BY created_at DESC LIMIT 500, 10000000) abc)
  AND user_id = :userId
Equivocal answered 23/6, 2016 at 12:40 Comment(0)
R
0

Stumbled across this and thought I'd update. This is a modification of something that was posted before. I would have commented, but unfortunately don't have 50 reputation...

LOCK Tables TestTable WRITE, TestTable as TestTableRead READ;
DELETE FROM TestTable
WHERE ID <= (
  SELECT ID
  FROM TestTable as TestTableRead -- (the 'as' declaration is required for some reason)
  ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records);
UNLOCK TABLES;

The use of 'Where' and 'Offset' circumvents the sub-query. You also cannot read and write from the same table in the same query, as you may modify entries as they're being used. The Locks allow to circumvent this. This is also safe for parallel access to the database by other processes. For performance and further explanation see the linked answer.

Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB

For further details on locks, see here

Rosmunda answered 17/1, 2023 at 18:8 Comment(0)
H
-1

Answering this after a long time...Came across the same situation and instead of using the answers mentioned, I came with below -

DELETE FROM table_name order by ID limit 10

This will delete the 1st 10 records and keep the latest records.

Hsining answered 19/2, 2013 at 5:45 Comment(4)
The question asked "all exept the last N records" and "in a single query". But it seem you still need a first query to count all the records in the table a then limit to total - NCutpurse
@Cutpurse We do not require a query to count all records as the above query deletes all except last 10 records.Hsining
No, that query deletes the 10 oldest records. The OP wants to delete everything except the n most recent records. Yours is the basic solution that would be paired with a count query, while OP is asking if there's a way to combine everything into a single query.Eddaeddana
@Eddaeddana I agree. Shall I edit/delete this answer now to let users not down vote me or leave it as it is?Hsining
A
-2

Why not

DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789

Just delete all but the first row (order is DESC!), using a very very large nummber as second LIMIT-argument. See here

Assimilable answered 23/2, 2009 at 19:40 Comment(1)
DELETE does not support [offset], or OFFSET: dev.mysql.com/doc/refman/5.0/en/delete.htmlLibbylibeccio

© 2022 - 2024 — McMap. All rights reserved.