How do I lock on an InnoDB row that doesn't exist yet?
Asked Answered
P

5

39

How can I guarantee that I can search if a username exists in my database, then insert that username into the database as a new row without any intercept between the SELECT and INSERT statements?

Almost as if I am locking on a row that doesn't exist. I want to lock on the non-existent row with the username "Foo", so that I can now check if it exists in the database AND insert it into the database if it doesn't already exist without any interruption.

I know that using LOCK IN SHARE MODE and FOR UPDATE exist but as far as I know, that only works on rows that already exist. I am not sure what to do in this situation.

Parch answered 12/6, 2013 at 14:55 Comment(0)
D
27

If there is an index on username (which should be the case, if not, add one, and preferably a UNIQUE one), then issuing a SELECT * FROM user_table WHERE username = 'foo' FOR UPDATE; will prevent any concurrent transaction from creating this user (as well as the "previous" and the "next" possible value in case of a non-unique index).

If no suitable index is found (to meet the WHERE condition), then an efficient record-locking is impossible and the whole table becomes locked*.

This lock will be held until the end of the transaction that issued the SELECT ... FOR UPDATE.

Some very interesting information on this topic can be found in these manual pages.

* I say efficient, because in fact a record lock is actually a lock on index records. When no suitable index is found, only the default clustered index can be used, and it will be locked in full.

Dicast answered 12/6, 2013 at 15:23 Comment(10)
What if I am adding a new row for something that doesn't have an index? Does it lock the whole table without an index?Parch
Yes, I forgot to mention that. The whole table becomes read-only if no suitable index is found for a record-lock.Dicast
What if there isn't a UNIQUE index on the username column but instead a PRIMARY on the user_id column. Would this still make it sure that the username 'foo' is locked if I search for it with FOR UPDATE? Is there any pitfalls to this that I might be complete unaware of? E.g. if user_id and username both have indexes at the same time (PRIMARY and UNIQUE) then locking is impossible and will error out or something? I'm still quite new to this so apologies if that is a somewhat stupid question. Thanks for the reply!Parch
It all depends on the filtering conditions used in the SELECT ... FOR UPDATE statement. If an index can be used (think EXPLAIN) then this one will be used for locking. If not, the entire table becomes locked in effect. MySQL is quite conservative with the FOR UPDATE statement. This operation is 100% safe, whatever the indexes on the table. Just mind that you may lock the entire table when you think you just lock one row.Dicast
If the index is on user_id but I am searching the username column, will it still lock that username using the user_id or is this a scenario where the whole table would be locked?Parch
Yes, this should lock the entire table. See the execution plan: no index would be used by such a query. You should fiddle with your data and check for yourself. Try various combinations, you can monitor the locks by issuing a SHOW ENGINE INNODB STATUS from a second console. Check for the section "TRANSACTIONS". innotop is another very interesting tool.Dicast
So the general idea is if you're going to be checking for the existence of something before adding it, like a username, it's best to add a UNIQUE index to prevent the entire table being locked and blocking concurrent access? Am I on the right track?Parch
This didn't appear to work for me. I tested by starting two transactions, performing the "SELECT ... FOR UPDATE" in transaction 1, then inserting a record in transaction 2, and found that transaction 2 was not blocked. The only solution I've found so far is to preface each DELETE statement with an appropriate INSERT statement, to make sure that a row exists (and thus can be locked) before I perform the DELETE. FWIW, my DB is in READ_COMMITTED mode.Fragrant
In transaction 2, are you INSERT'ing or DELETE'ing a record? Only an INSERT would be blocked. A DELETE on a non-existing record will obviously not be blocked, as it does nothing.Dicast
Not sure why this becomes a selected answer, but the for the case here it won't work with SELECT ... FOR UPDATE. Get dead lockWellgrounded
B
37

While the answer above is true in that a SELECT ... FOR UPDATE will prevent concurrent sessions / transactions from inserting the same record, that is not the full truth. I am currently fighting with the same problem and have come to the conclusion that the SELECT ... FOR UPDATE is nearly useless in that situation for the following reason:

A concurrent transaction / session can also do a SELECT ... FOR UPDATE on the very same record / index value, and MySQL will happily accept that immediately (non-blocking) and without throwing errors. Of course, as soon as the other session has done that, your session as well can't insert the record any more. Nor your nor the other session / transaction get any information about the situation and think they can safely insert the record until they actually try to do so. Trying to insert then either leads to a deadlock or to a duplicate key error, depending on circumstances.

In other words, SELECT ... FOR UPDATE prevents other sessions from inserting the respective record(s), BUT even if you do a SELECT ... FOR UPDATE and the respective record is not found, chances are that you can't actually insert that record. IMHO, that renders the "first query, then insert" method useless.

The cause of the problem is that MySQL does not offer any method to really lock non-existent records. Two concurrent sessions / transactions can lock non-existent records "FOR UPDATE" at the same time, a thing which really should not be possible and which makes development significantly more difficult.

The only way to work around this seems to be using semaphore tables or locking the whole table when inserting. Please refer to the MySQL documentation for further reference on locking whole tables or using semaphore tables.

Just my 2 cents ...

Burchell answered 2/7, 2015 at 12:9 Comment(10)
Another option, though not necessarily ideal in all circumstances is to skip the SELECT ... FOR UPDATE and just do an INSERT and then handle the resulting duplicate key error (which in my experience is much more consistent when the insert is the first operation performed). I'm sure there is a performance penalty, however, in a lot of cases it is negligible compared to the other operations performed and can save you the trouble of having to create a mutex table.Shawnna
@Shawnna In my applications, getting an error from the database always means that there is a problem with the database or with wrong (SQL) code, but it never means a problem with the data itself. This is an architectural decision I have made a long time ago for serious reasons. For example, when doing it your way, I would have to check the error number which comes back from the database (to determine what sort of error it actually is) at every respective place, I would have to track MySQL's further development to make sure that error numbers don't change, and porting would be difficult.Burchell
"INSERT ... ON DUPLICATE KEY UPDATE" consistently generates an X lock. This can be used to prevent two processes to enter the transaction. See https://mcmap.net/q/409458/-innodb-locking-for-insert-update-concurrent-transactionsArmagnac
Yes, but what we are talking about here are the situations where you want to avoid this under all circumstances. As a very simple example, imagine a user database where the email address must be unique. Do you really want the data of an already registered user with a certain email address to be overwritten by somebody who is just trying to register using the same email address? I strongly suppose that the answer is "no" ...Burchell
I am facing the same issue, I think both transaction can try to get an advisory lock (dev.mysql.com/doc/refman/5.7/en/locking-functions.html) after select for update returns no rows and before doing an insertion. As suggested in the answer locking whole tables or using semaphore tables works too.Falsecard
Yes, advisory locks as described in the link you gave would work as well. I have no clue if this was already possible in 2013 (when the OP asked) or 2015 (when I answered). But as a matter of fact, GET_LOCK() was very restricted in versions before 5.7.; personally, I wouldn't have used it with such versions. And by the way, it's a shame that neither MySQL nor MariaDB, which claims to be MySQL's advanced, much-improved successor and drop-in-replacement, still did not solve that problem; the first bug reports naming it are from 2011 (AFAIK).Burchell
Actually I am facing this issue with mysql version 5.7.26, not sure if the issue is fixed in newer versions. What restrictions are you referring to with locking functions in 5.7.x versions ?Falsecard
I am referring to the restrictions and deviating bevaviour which are described in detail in the link you gave in your previous comment. That link lists the differences between 5.7+ versions and pre-5.7 versions in depth.Burchell
In MySql changing of transaction isolation level from default REPEATABLE_READ (that is unnecessary for most cases) to READ_COMMITED helps with the pattern of SELECT ... FOR UPDATE then INSERT or UPDATE. In READ_COMMITED isolation level SELECT ... FOR UPDATE doesn't lock the gap, so both transactions will be allowed to INSERT and if this is really the same PK one of them will throw duplicate key error. But if they are 2 different entries that happen to fall under the same next-key-lock gap, both INSERTs will succeed. And deadlock will never happen.Layette
Thanks for your hint. However, the whole problem is about absolutely and definitely making sure in advance that a certain record can be inserted without errors, before trying to insert. That is, we don't want to encounter errors under any circumstances; especially, we never want to encounter a duplicate key error. Relaxing the transaction level unfortunately does not solve that problem.Burchell
D
27

If there is an index on username (which should be the case, if not, add one, and preferably a UNIQUE one), then issuing a SELECT * FROM user_table WHERE username = 'foo' FOR UPDATE; will prevent any concurrent transaction from creating this user (as well as the "previous" and the "next" possible value in case of a non-unique index).

If no suitable index is found (to meet the WHERE condition), then an efficient record-locking is impossible and the whole table becomes locked*.

This lock will be held until the end of the transaction that issued the SELECT ... FOR UPDATE.

Some very interesting information on this topic can be found in these manual pages.

* I say efficient, because in fact a record lock is actually a lock on index records. When no suitable index is found, only the default clustered index can be used, and it will be locked in full.

Dicast answered 12/6, 2013 at 15:23 Comment(10)
What if I am adding a new row for something that doesn't have an index? Does it lock the whole table without an index?Parch
Yes, I forgot to mention that. The whole table becomes read-only if no suitable index is found for a record-lock.Dicast
What if there isn't a UNIQUE index on the username column but instead a PRIMARY on the user_id column. Would this still make it sure that the username 'foo' is locked if I search for it with FOR UPDATE? Is there any pitfalls to this that I might be complete unaware of? E.g. if user_id and username both have indexes at the same time (PRIMARY and UNIQUE) then locking is impossible and will error out or something? I'm still quite new to this so apologies if that is a somewhat stupid question. Thanks for the reply!Parch
It all depends on the filtering conditions used in the SELECT ... FOR UPDATE statement. If an index can be used (think EXPLAIN) then this one will be used for locking. If not, the entire table becomes locked in effect. MySQL is quite conservative with the FOR UPDATE statement. This operation is 100% safe, whatever the indexes on the table. Just mind that you may lock the entire table when you think you just lock one row.Dicast
If the index is on user_id but I am searching the username column, will it still lock that username using the user_id or is this a scenario where the whole table would be locked?Parch
Yes, this should lock the entire table. See the execution plan: no index would be used by such a query. You should fiddle with your data and check for yourself. Try various combinations, you can monitor the locks by issuing a SHOW ENGINE INNODB STATUS from a second console. Check for the section "TRANSACTIONS". innotop is another very interesting tool.Dicast
So the general idea is if you're going to be checking for the existence of something before adding it, like a username, it's best to add a UNIQUE index to prevent the entire table being locked and blocking concurrent access? Am I on the right track?Parch
This didn't appear to work for me. I tested by starting two transactions, performing the "SELECT ... FOR UPDATE" in transaction 1, then inserting a record in transaction 2, and found that transaction 2 was not blocked. The only solution I've found so far is to preface each DELETE statement with an appropriate INSERT statement, to make sure that a row exists (and thus can be locked) before I perform the DELETE. FWIW, my DB is in READ_COMMITTED mode.Fragrant
In transaction 2, are you INSERT'ing or DELETE'ing a record? Only an INSERT would be blocked. A DELETE on a non-existing record will obviously not be blocked, as it does nothing.Dicast
Not sure why this becomes a selected answer, but the for the case here it won't work with SELECT ... FOR UPDATE. Get dead lockWellgrounded
N
13

Locking on nonexistent record does not work in MySQL. There are several bug reports about it:

One workaround is to use a mutex table, where an existing record will be locked before the new record is inserted. For example, there are two tables: sellers and products. A seller has many products, but should not have any duplicate products. In this case, sellers table can be used as mutex table. Before a new product is inserted, a lock will be created on the seller’s record. With this additional query, it is guaranteed that only one thread can perform the action at any given time. No duplicate. No deadlock.

Numerator answered 12/6, 2016 at 3:56 Comment(0)
C
2

You are "normalizing"? That is, the table is a list of pairs of ids and names? And you are inserting a new "name" (and presumably want the id for use in other tables)?

Then have UNIQUE(name) and do

INSERT IGNORE INTO tbl (name) VALUES ($name);

That does not explain how to the the id just created, but you did not ask about that.

Be aware that the "new" id is allocated before discovering whether it is needed. So this could lead to rapidly increasing AUTO_INCREMENT values.

See also

 INSERT ... ON DUPLICATE KEY UPDATE ...

and tricks to use with VALUES() and LAST_INSERT_ID(id). But, again, you have not stated the real purpose in the Question, so I don't want to unnecessarily branch into further details.

Note: The above don't care what the value of autocommit or whether the statement is inside an explicit transaction.

For normalizing a batch of 'names' all at once, the 2 SQLs given here are quite efficient: http://mysql.rjweb.org/doc.php/staging_table#normalization And the technique avoids 'burning' ids and avoids any runtime errors.

Clipfed answered 11/11, 2019 at 20:25 Comment(0)
D
0

Not answering the question directly, but wouldn't the end goal be achievable using Serializable isolation level? Assuming the end goal is to avoid duplicate names. From Hermitage:

MySQL "serializable" prevents Anti-Dependency Cycles (G2):

set session transaction isolation level serializable; begin; -- T1
set session transaction isolation level serializable; begin; -- T2
select * from test where value % 3 = 0; -- T1
select * from test where value % 3 = 0; -- T2
insert into test (id, value) values(3, 30); -- T1, BLOCKS
insert into test (id, value) values(4, 42); -- T2, prints "ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction"
commit;   -- T1
rollback; -- T2
Dispersant answered 26/8, 2018 at 3:28 Comment(1)
Well, the goal is a little bit more than just preventing duplicates. We want to prevent the duplicates and associated errors before they happen. That is, we are looking for a clean method which enables us to safely insert a row without runtime errors happening (like deadlocks, duplicate keys and so on).Burchell

© 2022 - 2024 — McMap. All rights reserved.