Atomic read and update in MySQL with concurrent workers
Say I have multiple workers that can concurrently read and write against a MySQL table (e.g. jobs). The task for each worker is:

  1. Find the oldest QUEUED job
  2. Set its status to RUNNING
  3. Return the corresponding ID.

Note that there may not be any qualifying (i.e. QUEUED) jobs when a worker runs step #1.

I have the following pseudo-code so far. I believe I need to cancel (ROLLBACK) the transaction if step #1 returns no jobs. How would I do that in the code below?

BEGIN TRANSACTION;

# Update the status of jobs fetched by this query:
SELECT id from jobs WHERE status = "QUEUED" 
ORDER BY created_at ASC LIMIT 1;

# Do the actual update, otherwise abort (i.e. ROLLBACK?)
UPDATE jobs
SET status="RUNNING"
# HERE: Not sure how to make this conditional on the previous ID
# WHERE id = <ID from the previous SELECT>

COMMIT;
Paradox answered 15/6, 2020 at 18:24 Comment(6)
Under what circumstances would there be an 'abort' !?!?!?Carabao
It looks like a bad idea from the start. Why don't you delegate this job to your end application? It's the application that should define if something is queued or running, not the databaseMahaliamahan
Looks like you rather need a single UPDATE statement with ORDER BY and LIMIT.Pantelegraph
@Carabao I'm interested in canceling the UPDATE if there are no jobs with the status QUEUED.Paradox
Thanks @ThomasG Not sure I follow. Are you questioning the goal of the question itself? If it helps for context, I'm asking this question as a personal exercise to see how to implement a queue on a DB and thus keeping track (and changing status) of what jobs are waiting to be processed on the DB. I don't see any problem with the goal of the question itself, but if you think this is off-topic or equivalent, please let me know.Paradox
Thanks @PaulSpiegel I'd be curious to see how you would write that (in a single UPDATE statement), although TBH I'm also interested in how to condition the UPDATE statement to the result of a separate SELECT statement in a transaction and thus atomic operation.Paradox
I am implementing something very similar to your case this week. A number of workers, each grabbing the "next" row in a set of rows to work on.

The pseudocode is something like this:

BEGIN;

SELECT ID INTO @id FROM mytable WHERE status = 'QUEUED' LIMIT 1 FOR UPDATE;

UPDATE mytable SET status = 'RUNNING' WHERE id = @id;

COMMIT;

Using FOR UPDATE is important to avoid race conditions, i.e. more than one worker trying to grab the same row.

See https://dev.mysql.com/doc/refman/8.0/en/select-into.html for information about SELECT ... INTO.
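In application code, the same BEGIN / locking SELECT / UPDATE / COMMIT flow looks roughly like the sketch below. It uses Python with SQLite purely for illustration: SQLite has no row-level FOR UPDATE, so `BEGIN IMMEDIATE` serializes writers instead, and the connection is opened with `isolation_level=None` so the script controls transactions itself. The table layout and helper name are assumptions, not part of this answer.

```python
import sqlite3

def claim_next_job(conn):
    """Atomically claim the oldest QUEUED job; return its id, or None.

    Follows the pattern above: begin, select one candidate row,
    update it, commit -- rolling back when nothing is queued.
    """
    conn.execute("BEGIN IMMEDIATE")  # SQLite stand-in for FOR UPDATE
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'QUEUED' "
            "ORDER BY created_at LIMIT 1"
        ).fetchone()
        if row is None:
            conn.rollback()  # no queued jobs: abort the transaction
            return None
        conn.execute(
            "UPDATE jobs SET status = 'RUNNING' WHERE id = ?", (row[0],)
        )
        conn.commit()
        return row[0]
    except Exception:
        conn.rollback()
        raise
```

With a real MySQL driver you would keep a plain transaction and append FOR UPDATE to the SELECT instead of using `BEGIN IMMEDIATE`.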

Liquorice answered 16/6, 2020 at 1:21 Comment(8)
Thanks - I didn't know one needs FOR UPDATE if it's all part of a transaction. What could happen without FOR UPDATE? Also, if we use FOR UPDATE, what's the point of wrapping it all in a transaction?Paradox
SELECT does no locking without FOR UPDATE or FOR SHARE (the latter is called LOCK IN SHARE MODE before MySQL 8.0). This would allow multiple sessions to read the same id value and try to lock it. By using a locking read you ensure that only one session locks a given row. The others will lock a different row. You still need a transaction because the locks will be released immediately if you use autocommit. Read dev.mysql.com/doc/refman/8.0/en/…Liquorice
Thanks. Very helpful Bill.Paradox
What if a worker crashes while doing the job? It will leave the job stuck in status RUNNING forever. Any idea how to avoid this?Electric
@MarcoMarsala You can't avoid it, but you can detect and retry an orphaned job. Run a check periodically for jobs in RUNNING state that haven't been updated in too long. For example, if you expect no job should take more than 10 minutes, then a job in RUNNING for more than 20 minutes is certainly a victim of a crash.Liquorice
@BillKarwin - Isn't there a tricky way to use LAST_INSERT_ID() to turn your transaction into a single IODKU?Respondence
@RickJames I don't have any idea that comes to mind for the purpose in this question. Are you thinking of simulating a sequence? That's in the docs on the LAST_INSERT_ID() function, but I don't understand how it would be used in this case.Liquorice
@BillKarwin - The idea is to (1) assign the task to yourself and (2) retrieve the id in a single IODKU. (Maybe I am thinking of a related task, and not the specific action the OP is asking about.)Respondence
It's still not quite clear what you are after. But assuming your task is: find the next QUEUED job, set its status to RUNNING, and select the corresponding ID.

In a single-threaded environment, you can just use your code. Fetch the selected ID into a variable in your application code and pass it to the UPDATE query in the WHERE clause. You don't even need a transaction, since there is only one writing statement. You can mimic this in an SQL script.

Assuming this is your current state:

| id  | created_at          | status    |
| --- | ------------------- | --------- |
| 1   | 2020-06-15 12:00:00 | COMPLETED |
| 2   | 2020-06-15 12:00:10 | QUEUED    |
| 3   | 2020-06-15 12:00:20 | QUEUED    |
| 4   | 2020-06-15 12:00:30 | QUEUED    |

You want to start the next queued job (which has id=2).

SET @id_for_update = (
  SELECT id
  FROM jobs
  WHERE status = 'QUEUED'
  ORDER BY id
  LIMIT 1
);

UPDATE jobs
SET status="RUNNING"
WHERE id = @id_for_update;

SELECT @id_for_update;

You will get

@id_for_update
2

from the last select. And the table will have this state:

| id  | created_at          | status    |
| --- | ------------------- | --------- |
| 1   | 2020-06-15 12:00:00 | COMPLETED |
| 2   | 2020-06-15 12:00:10 | RUNNING   |
| 3   | 2020-06-15 12:00:20 | QUEUED    |
| 4   | 2020-06-15 12:00:30 | QUEUED    |

View on DB Fiddle

If you have multiple processes, which start jobs, you would need to lock the row with FOR UPDATE. But that can be avoided using LAST_INSERT_ID():

Starting from the state above, with job 2 already running:

UPDATE jobs
SET status = 'RUNNING',
    id = LAST_INSERT_ID(id)
WHERE status = 'QUEUED'
ORDER BY id
LIMIT 1;

SELECT LAST_INSERT_ID(), ROW_COUNT();

You will get:

| LAST_INSERT_ID() | ROW_COUNT() |
| ---------------- | ----------- |
| 3                | 1           |

And the new state is:

| id  | created_at          | status    |
| --- | ------------------- | --------- |
| 1   | 2020-06-15 12:00:00 | COMPLETED |
| 2   | 2020-06-15 12:00:10 | RUNNING   |
| 3   | 2020-06-15 12:00:20 | RUNNING   |
| 4   | 2020-06-15 12:00:30 | QUEUED    |

View on DB Fiddle

If the UPDATE statement affected no row (there were no queued rows) ROW_COUNT() will be 0.

There might be some risks, which I am not aware of - But this is also not really how I would approach this. I would rather store more information in the jobs table. Simple example:

CREATE TABLE jobs (
  id INT auto_increment primary key,
  created_at timestamp not null default now(),
  updated_at timestamp not null default now() on update now(),
  status varchar(50) not null default 'QUEUED',
  process_id varchar(50) null default null
);

and

UPDATE jobs
SET status = 'RUNNING',
    process_id = 'some_unique_pid'    
WHERE status = 'QUEUED'
ORDER BY id
LIMIT 1;

Now a running job belongs to a specific process and you can just select it with

SELECT * FROM jobs WHERE process_id = 'some_unique_pid';

You might even like to have more information - eg. queued_at, started_at, finished_at.
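The process_id pattern above can be exercised from application code. Here is a sketch in Python with SQLite for illustration only: SQLite doesn't support `ORDER BY ... LIMIT` directly in UPDATE by default, so a subquery picks the row instead, and the helper name and worker ids are assumptions, not part of this answer.

```python
import sqlite3

def claim_with_pid(conn, pid):
    """Claim the next QUEUED job by stamping it with this worker's id,
    then look it up by that id; returns the job id, or None."""
    cur = conn.execute(
        "UPDATE jobs "
        "SET status = 'RUNNING', process_id = ? "
        "WHERE status = 'QUEUED' AND id = ("
        "    SELECT id FROM jobs WHERE status = 'QUEUED' "
        "    ORDER BY id LIMIT 1)",
        (pid,),
    )
    if cur.rowcount == 0:
        return None  # no queued jobs, nothing was claimed
    return conn.execute(
        "SELECT id FROM jobs WHERE process_id = ?", (pid,)
    ).fetchone()[0]
```

As in the MySQL version, the claim is a single UPDATE statement, so it is atomic without an explicit transaction; the follow-up SELECT only reads back the row this worker already owns.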

Pantelegraph answered 15/6, 2020 at 21:14 Comment(5)
It's late so I'm almost certainly missing something, but is any of this actually necessary?Carabao
Thanks Paul - Apologies if my question wasn't clear. I updated the OP. I do have multiple workers that would attempt to concurrently read and write against this table though. Would your answer still work in such a scenario? I was under the impression that one should 1) use a transaction/commit to prevent race conditions in this case and 2) cancel it with a rollback if e.g. we don't find any qualifying jobs.Paradox
1) I wrote in my answer that you will need to lock the row when running concurrent processes. A transaction alone doesn't lock. But you need it to keep a (row-level) lock until you update the status. But there are also ways to avoid explicit locking - see the other two examples. 2) If there is no QUEUED job, @id_for_update will be NULL and the following UPDATE statement will affect no row (see fiddle). So there is nothing to abort/rollback.Pantelegraph
@Carabao What do you mean by "necessary". At least the first example is kind of "basic stuff": Get some info with a SELECT query. Use it in the next query.Pantelegraph
A SELECT is normally for displaying information to an end user. Otherwise the concept of 'first query, next query' is surely rather meaninglessCarabao
Adding SKIP LOCKED to the SELECT query, and wrapping the work in a SQL transaction that is committed only when the job is done, avoids jobs getting stuck in status RUNNING if a worker crashes (because the uncommitted transaction will roll back). It is now supported in the newest versions of most common DBMSs.

See:

Select only unlocked rows mysql

https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html#innodb-locking-reads-nowait-skip-locked

Electric answered 14/12, 2022 at 10:53 Comment(3)
Do not use an InnoDB transaction for any but very short tasks. The lock_wait_timeout default of 50 seconds is much too long.Respondence
With SKIP LOCKED that shouldn't be an issue, should it?Electric
Since I don't understand what cases can safely use SKIP LOCKED, I avoid it.Respondence
(This is not an answer to the question, but a list of caveats that you need to be aware of when using any of the real Answers. Some of these have already been mentioned.)

  • Replication -- You must do all the locking on the Primary. If you are using a cluster with multiple writable nodes, be aware of the inter-node delays.
  • Backlog -- When something breaks, you could get a huge list of tasks in the queue. This may lead to some ugly messes.
  • Number of 'workers' -- Don't have more than a "few" workers. If you try to have, say, 100 concurrent workers, they will stumble over each other and cause nasty problems.
  • Reaper -- Since a worker may crash, the task assigned to it may never get cleared. Have a TIMESTAMP on the rows so a separate (cron/EVENT/whatever) job can discover what tasks are long overdue and clear them.
  • If the tasks are fast enough, then the overhead of the queue could be a burden. That is, "Don't queue it, just do it."
  • You are right to grab the task in one transaction, then later release the task in a separate transaction. Using InnoDB's locking is folly for anything but trivially fast actions.
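
The "Reaper" caveat above can be sketched as a periodic cleanup query. Python with SQLite for illustration; the table layout (an updated_at timestamp), the helper name, and the 20-minute threshold are assumptions taken from the discussion in the other answers, not from this one:

```python
import sqlite3

def requeue_stale_jobs(conn, timeout_minutes=20):
    """Re-queue RUNNING jobs whose last update is older than the
    timeout, on the assumption their worker crashed. Returns the
    number of jobs rescued; run this from cron or a scheduled event."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'QUEUED' "
        "WHERE status = 'RUNNING' "
        "  AND updated_at < datetime('now', ?)",
        ("-%d minutes" % timeout_minutes,),
    )
    return cur.rowcount
```

In MySQL the same idea would compare updated_at against `NOW() - INTERVAL 20 MINUTE` inside a scheduled EVENT or an external cron job.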
Respondence answered 16/12, 2022 at 19:59 Comment(0)
