MySQL Select... for update with index has concurrency issue
Asked Answered
J

2

3

This is a follow up on my previous question (you can skip it as I explain in this post the issue):
MySQL InnoDB SELECT...LIMIT 1 FOR UPDATE Vs UPDATE ... LIMIT 1

Environment:

  • JSF 2.1 on Glassfish
  • JPA 2.0 EclipseLink and JTA
  • MySQL 5.5 InnoDB engine

I have a table:

CREATE TABLE v_ext (
  v_id INT NOT NULL AUTO_INCREMENT,
  product_id INT NOT NULL,
  code VARCHAR(20),
  username VARCHAR(30),
  PRIMARY KEY (v_id)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

It is populated with 20,000 records like this one (product_id is 54 for all records, code is randomly generated and unique, username is set to NULL):

v_id     product_id    code                  username
-----------------------------------------------------
1        54            '20 alphanumerical'   NULL
...
20,000   54            '20 alphanumerical'   NULL

When a user purchase product 54, he gets a code from that table. If the user purchases multiple times, he gets a code each times (no unique constraint on username). Because I am preparing for a high activity I want to make sure that:

  • No concurrency/deadlock can occur
  • Performance is not impacted by the locking mechanism which will be needed

From the SO question (see link above) I found that doing such a query is faster:

START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL LIMIT 1 FOR UPDATE;
// Use result for next query
UPDATE v_ext SET username=xxx WHERE v_id=...;
COMMIT;

However I found a deadlock issue ONLY when using an index on username column. I thought of adding an index would help in speeding up a little bit but it creates a deadlock after about 19,970 records (actually quite consistently at this number of rows). Is there a reason for this? I don't understand. Thank you.

Jackscrew answered 27/12, 2012 at 9:5 Comment(2)
What's your transaction isolation level? Have you tried reducing it to read committed?Calycine
How can I get the TX isolation level. I am using JTA and I can't do em.getTransaction()Jackscrew
B
3

From a purely theoretical point of view, it looks like you are not locking the right rows (different condition in the first statement than in the update statement; besides you only lock one row because of LIMIT 1, whereas you possibly update more rows later on).

Try this:

START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL AND v_id=yyy FOR UPDATE;
UPDATE v_ext SET username=xxx WHERE v_id=yyy;
COMMIT;

[edit]

As for the reason for your deadlock, this is the probable answer (from the manual):

If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked (...)

Without an index, the SELECT ... FOR UPDATE statement is likely to lock the entire table, whereas with an index, it only locks some rows. Because you didn't lock the right rows in the first statement, an additional lock is acquired during the second statement.

Obviously, a deadlock cannot happen if the whole table is locked (i.e. without an index). A deadlock can certainly occur in the second setup.

Blazon answered 27/12, 2012 at 10:15 Comment(7)
I don't have id. I only have the primary key that could be used. I need to update one and only one record, and the condition is "any one row with username null".Jackscrew
@Jackscrew Then you should update only the record identified by v_id from the first query.Calycine
That's what I am doing and this creates a deadlock when username is indexed.Jackscrew
@Jack Ok, I see my typo in my post. I updated the update query id --> v_id. Sorry for the confusion.Jackscrew
@Jackscrew I also changed id to v_id. My remark still applies: you should use the same condition in both statements. As it stands in your current code, you may sometimes update rows which are not locked by the first statement.Blazon
Makes perfectly sense :) Thank you for not giving up ( +1)Jackscrew
@Blazon I'm trying to understand this; how can he possibly use the same condition in both queries, when he does not know the v_id at the time of the first query?Mountford
T
1

First of all, the definition of the table is wrong. You have no tid column in the table, so i am suspecting the primary key is v_id.
Second of all, if you select for update, you lock the row. Any other select coming until the first transaction is done will wait for the row to be cleared, because it will hit the exact same record. So you will have waits for this row.
However, i pretty much doubt this can be a real serious problem in your case, because first of all, you have the username there, and second of all you have the product id there. It is extremly unlikely that you will have alot of hits on that exact same record you hit initially, and even if you do, the transaction should be running very fast.
You have to understand that by using transactions, you usually give up pretty much on concurrency for consistent data. There is no way to support consistency of data and concurrency at the same time.

Tati answered 27/12, 2012 at 14:56 Comment(2)
Thanks for noting the tid. Though it was not my issue (the table names and fields in my code are mostly different, I just tried to simplify the names in this post). Also, product_id is the same for the 20,000 records. Only code (which is unique across 20,000 rows) is different but I can't query on that since it is the value I need to get. But I need to get it with 0% chance that 2+ queries return that same code in case of concurrency. I don't need 'full concurrency'. I need to be able to serialize (by code or DB locking way) with low impact on performance so to obtain near concurrency.Jackscrew
then locking the row should not pose a problem. Unless the update takes a very long time. And locking the record is basicly your only way, so i would go for that solution.Tati

© 2022 - 2024 — McMap. All rights reserved.