Insert row if not exists without deadlock

I have a simple table

CREATE TABLE test (
  col INT,
  data TEXT,
  KEY (col)
);

and a simple transaction

START TRANSACTION;

SELECT * FROM test WHERE col = 4 FOR UPDATE;

-- If no results, generate data and insert
INSERT INTO test SET col = 4, data = 'data';

COMMIT;

I am trying to ensure that two copies of this transaction running concurrently result in no duplicate rows and no deadlocks. I also don't want to incur the cost of generating data for col = 4 more than once.

I have tried:

  1. SELECT .. (without FOR UPDATE or LOCK IN SHARE MODE):

    Both transactions see that there are no rows with col = 4 (without acquiring a lock) and both generate data and insert two copies of the row with col = 4.

  2. SELECT .. LOCK IN SHARE MODE

    Both transactions acquire a shared lock on col = 4, generate data and attempt to insert a row with col = 4. Both transactions wait for the other to release their shared lock so it can INSERT, resulting in ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction.

  3. SELECT .. FOR UPDATE

    I would expect that one transaction's SELECT will succeed and acquire an exclusive lock on col = 4 and the other transaction's SELECT will block waiting for the first.

    Instead, both SELECT .. FOR UPDATE queries succeed and the transactions proceed to deadlock just like with SELECT .. LOCK IN SHARE MODE. The exclusive lock on col = 4 just doesn't seem to work.

How can I write this transaction without causing duplicate rows and without deadlock?

Ladonnalady answered 2/3, 2017 at 5:38 Comment(5)
Don't do this. Do set a UNIQUE index and use INSERT ... ON DUPLICATE KEY or INSERT IGNORE instead.Sent
@Sent If I do INSERT .. ON DUPLICATE KEY UPDATE then both transactions will incur the cost of generating data, unnecessarily. If I do INSERT IGNORE .. then all errors MySQL encounters while running the INSERT will be ignored (not just duplicate keys), which is pretty sloppy.Ladonnalady
What is the value of tx_isolation?Elephant
@RickJames REPEATABLE READ (SERIALIZABLE would just make the SELECT .. version behave the same as SELECT .. LOCK IN SHARE MODE)Ladonnalady
Good question. Never thought of it.Maze

Adjust your schema slightly:

CREATE TABLE test (
  col INT NOT NULL PRIMARY KEY,
  data TEXT
);

With col being a primary key, it cannot be duplicated.

Then use the ON DUPLICATE KEY feature:

INSERT INTO test (col, data) VALUES (4, ...)
  ON DUPLICATE KEY UPDATE data=VALUES(data)
Sent answered 2/3, 2017 at 5:43 Comment(2)
If I do INSERT .. ON DUPLICATE KEY UPDATE then both transactions will incur the cost of generating data, unnecessarily. I only want one of them to proceed to generate data.Ladonnalady
Your example given in the question doesn't generate anything; it has a simple string. If you want to avoid duplicate generation, find out which IDs you need to generate data for and use some kind of queue to schedule those operations. There are a multitude of job-queue systems out there to pick from. Holding a lock open for extended periods of time is begging for deadlocks.Sent

Maybe this...

START TRANSACTION;
INSERT IGNORE INTO test (col, data) VALUES (4, NULL);  -- or ''
-- if Rows_affected() == 1 (the INSERT happened), generate data and fill in `data`
    UPDATE test SET data = 'data' WHERE col = 4;
COMMIT;

Caution: If the PRIMARY KEY is an AUTO_INCREMENT, this may 'burn' an id.
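The control flow of this answer (claim the row with a placeholder, then let only the session whose insert landed pay for generating the data) can be sketched in Python. This is only a sketch: sqlite3 stands in for MySQL here, with INSERT OR IGNORE playing the role of INSERT IGNORE and cursor.rowcount playing the role of Rows_affected(), so it illustrates the branching, not InnoDB's locking. The names ensure_row and the "generated-…" data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col INTEGER PRIMARY KEY, data TEXT)")

calls = []  # records which calls actually generated data

def ensure_row(conn, col):
    # Claim the row with a placeholder first; only the caller whose
    # insert actually lands pays the cost of generating the data.
    cur = conn.execute(
        "INSERT OR IGNORE INTO test (col, data) VALUES (?, NULL)", (col,))
    if cur.rowcount == 1:              # the INSERT happened: we own the row
        calls.append(col)
        data = "generated-%d" % col    # stand-in for the expensive step
        conn.execute("UPDATE test SET data = ? WHERE col = ?", (data, col))
    conn.commit()

ensure_row(conn, 4)
ensure_row(conn, 4)  # ignored: rowcount is 0, so no second generation
print(conn.execute("SELECT col, data FROM test").fetchall())
# [(4, 'generated-4')]
```

The second call sees rowcount == 0 and skips the expensive branch entirely, which is the property the question asks for.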

Elephant answered 4/3, 2017 at 0:37 Comment(3)
INSERT IGNORE ignores all errors and warnings, not just duplicate key errors, so it throws away my ability to detect and report other kinds of errors and warnings that might occur during that query.Ladonnalady
Hi, I know this is an old question, but I also found your answer at dba.stackexchange.com/questions/299728/… — so do you mean INSERT IGNORE will not create a deadlock when two transactions try to update records in a different order? But why is that?Mors
INSERT IGNORE won't "do the wrong thing" when it comes to transactional integrity. It just avoids complaining if the row is already in the table.Elephant

Note that InnoDB takes two different kinds of locks here: SELECT ... FOR UPDATE on a non-existent row takes an exclusive lock on the gap where the row would go, while INSERT needs an insert intention lock on that gap. Exclusive gap locks do not conflict with each other, which is why both SELECT ... FOR UPDATE statements succeed (the second one does not block, just as you observed in the question). When each transaction then runs its INSERT, the insert intention lock it needs does conflict with the gap lock held by the other transaction. Each transaction ends up waiting for the other to release its gap lock, InnoDB detects the cycle, and one of them is rolled back with the deadlock error.

The only way to do this correctly is to have a unique index on col, try to INSERT the row with col = 4 (you can insert dummy data if you don't want to generate it before the INSERT), roll back on a duplicate-key error, and, if the INSERT succeeded, UPDATE the row with the correct data. Note, though, that if you don't want to incur the cost of generating data unnecessarily, generating it probably takes a long time, and all that time you'll hold open the transaction that inserted the row with col = 4, which will leave every other process trying to insert the same row hanging. I'm not sure that would be significantly better than generating the data first and then inserting it.
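That flow (plain INSERT of a placeholder under a unique index, roll back on a duplicate-key error, otherwise fill in the real data) can be sketched in Python. sqlite3 is used as a stand-in for MySQL, with sqlite3.IntegrityError playing the role of the duplicate-key error; the function name insert_then_fill is made up for the example, and this shows only the control flow, not the locking behavior discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col INTEGER PRIMARY KEY, data TEXT)")

def insert_then_fill(conn, col):
    """Try to claim the row; return True only if this caller generated data."""
    try:
        # Plain INSERT of a placeholder; a duplicate key raises here.
        conn.execute("INSERT INTO test (col, data) VALUES (?, NULL)", (col,))
    except sqlite3.IntegrityError:
        conn.rollback()   # another session already owns the row
        return False
    # We own the row: generate the expensive data and fill it in.
    conn.execute("UPDATE test SET data = 'generated' WHERE col = ?", (col,))
    conn.commit()
    return True

first = insert_then_fill(conn, 4)    # True: this caller generates the data
second = insert_then_fill(conn, 4)   # False: duplicate key, generation skipped
```

Only the first caller reaches the UPDATE, so the expensive generation runs exactly once.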

Frock answered 17/3, 2017 at 19:5 Comment(1)
Have some formatting in your answer hereMaze

If your goal is to have only one session insert the missing row, and any other session do nothing without even attempting an insert of DATA, then you need to either lock the entire table (which reduces your concurrency) or insert an incomplete row and follow it with an update.

A. create a primary key on column COL

Code:

begin
  insert into test values (4,null);
  update test set data = ... where col = 4;
  commit;
exception
  when dup_val_on_index then
    null;
end;

The first session that attempts the insert on col 4 will succeed and proceed to the update, where you can do the expensive calculation of DATA. Any other session trying to do this will raise a PK violation (ORA-00001, i.e. DUP_VAL_ON_INDEX) and go to the exception handler, which traps it and does nothing (NULL). It will never reach the update statement, so it won't do whatever expensive thing you do to calculate DATA.

Now, this will cause the other sessions to wait while the first session calculates DATA and does the update. If you don't want that wait, you can use NOWAIT to make the lagging sessions throw an exception immediately if the row is locked. If the row doesn't exist, that will also throw an exception, but a different one. It's not great to use exception handling for normal code branches, but hey, it should work.

declare
  var_junk number;
begin
  begin
    select col into var_junk from test where col = 4 for update nowait;
  exception
    when no_data_found then
      insert into test values (4, null);
      update test set data = ... where col = 4;
      commit;
    when others then
      null;
  end;
end;
  
Afghanistan answered 29/12, 2022 at 17:52 Comment(0)

Use FOR UPDATE, then either:

  • change the isolation level to READ COMMITTED, which avoids gap locks,
  • or change the isolation level to SERIALIZABLE.
Mendes answered 21/3, 2023 at 5:8 Comment(0)

This issue for me stemmed from two statements being executed at the same time, so I just needed to ensure they start at different times. My application only has two input streams, so I implemented a varied delay before each execution (i.e. a while loop with RAND()). Admittedly, RAND() can generate the same number for both streams, but the probability is small enough that it hasn't occurred for me. I humbly present my extremely basic code for your consideration:

DECLARE @delay INT
DECLARE @cur INT = 0
SET @delay = 100 * RAND() + 1
WHILE (@cur < @delay) 
BEGIN
    SET @cur = @cur + 1
END
Turbine answered 28/6 at 0:4 Comment(0)

Here is the query that is currently in use and working perfectly even at peak load:

INSERT INTO users (users_email, users_name)
SELECT * FROM (
  SELECT "[email protected]" AS abc, "user" AS abd
) AS tmp
WHERE NOT EXISTS (
  SELECT users_id FROM users WHERE users_email = "[email protected]"
)
LIMIT 1;
Cheremkhovo answered 21/3, 2023 at 17:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.