I am trying to implement a task assignment system. Users can request tasks from a pool. Even though the transaction isolation level is set to SERIALIZABLE, the same task is sometimes given to multiple users, which should not be possible.
Simplified Schema:
CREATE TABLE tasks(
_id CHAR(24) PRIMARY KEY,
totalInstances BIGINT NOT NULL
);
CREATE TABLE assigned(
_id CHAR(24) PRIMARY KEY,
_task CHAR(24) NOT NULL
);
The tasks table is filled with many rows; let's say each has totalInstances = 1, meaning each task should be assigned at most once.
Query to add a row in assigned:
WITH task_instances AS (
SELECT t._id, t.totalInstances - COUNT(assigned._id) openInstances
FROM tasks t
LEFT JOIN assigned ON t._id = assigned._task
GROUP BY t._id, t.totalInstances
),
selected_task AS (
SELECT _id
FROM task_instances
WHERE openInstances > 0
LIMIT 1
)
INSERT INTO assigned(_id, _task)
SELECT $1, _id
FROM selected_task;
with $1 being a random id passed to each query.
Symptoms
We have about 100 active users regularly requesting tasks. This works as expected, except maybe once in 1000 requests: then two assigned rows are created for the same _task id by parallel requests. I would expect a serializable execution to roll back the second one, since the first one should already have decreased openInstances to 0.
Setup
We use Postgres 10.3, and the query is run from Scala code via Slick 3.2.3 with withTransactionIsolation(Serializable). No other queries delete from or insert into the assigned table.
Postgres logs show that the requests are run in different sessions, and that SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE; is executed before each task assignment query.
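For context, the Slick call looks roughly like the following sketch. This is not our exact code; assignTask, db, and userId are illustrative names, and the query string is the one shown above.

import scala.concurrent.Future
import slick.jdbc.PostgresProfile.api._
import slick.jdbc.TransactionIsolation

// Illustrative sketch: run the assignment query in a serializable transaction.
def assignTask(db: Database, userId: String): Future[Int] = {
  val insert = sqlu"""
    WITH task_instances AS (
      SELECT t._id, t.totalInstances - COUNT(assigned._id) openInstances
      FROM tasks t
      LEFT JOIN assigned ON t._id = assigned._task
      GROUP BY t._id, t.totalInstances
    ),
    selected_task AS (
      SELECT _id FROM task_instances WHERE openInstances > 0 LIMIT 1
    )
    INSERT INTO assigned(_id, _task)
    SELECT $userId, _id FROM selected_task
  """
  // $userId is bound as a parameter by the sqlu interpolator (the $1 above).
  db.run(insert.transactionally.withTransactionIsolation(TransactionIsolation.Serializable))
}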
I tried rewriting the query in different styles, including using VIEWs for the WITH subqueries and surrounding the query with BEGIN and COMMIT, but to no effect.
Any help is appreciated.
Edit
I should add that the expected serialization errors/rollbacks do sometimes come up, upon which our application retries the query. I see this correct behavior 10 times in the logs from the last few hours, but in 2 cases the same task was still wrongly assigned twice, as described above.
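The retry itself is nothing special; roughly the sketch below (names are illustrative, not our exact code), keyed on SQLSTATE 40001, which Postgres uses for serialization failures:

import org.postgresql.util.PSQLException
import scala.concurrent.{ExecutionContext, Future}

// Illustrative sketch: re-run the assignment when Postgres reports a
// serialization failure (SQLSTATE 40001), up to a limited number of retries.
def withSerializationRetry[T](retriesLeft: Int)(attempt: () => Future[T])
                             (implicit ec: ExecutionContext): Future[T] =
  attempt().recoverWith {
    case e: PSQLException if e.getSQLState == "40001" && retriesLeft > 0 =>
      withSerializationRetry(retriesLeft - 1)(attempt)
  }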