PostgreSQL SERIALIZABLE transaction not working as expected

I am trying to implement a task assignment system. Users can request tasks from a pool. Even though the isolation level is set to SERIALIZABLE, the transaction sometimes assigns the same task to multiple users when it shouldn't.

Simplified Schema:

CREATE TABLE tasks(
  _id CHAR(24) PRIMARY KEY,
  totalInstances BIGINT NOT NULL
);

CREATE TABLE assigned(
  _id CHAR(24) PRIMARY KEY,
  _task CHAR(24) NOT NULL
);

The tasks table is filled with many rows, let’s say each has totalInstances = 1, meaning each task should be assigned at most once.
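
For a concrete reproduction, the pool can be seeded like this (the ids here are made up; any 24-character strings fit the schema above):

INSERT INTO tasks(_id, totalInstances) VALUES
  ('000000000000000000000001', 1),
  ('000000000000000000000002', 1),
  ('000000000000000000000003', 1);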

Query to add a row in assigned:

WITH task_instances AS (
  SELECT t._id, t.totalInstances - COUNT(assigned._id) openInstances
  FROM tasks t
  LEFT JOIN assigned ON t._id = assigned._task
  GROUP BY t._id, t.totalInstances
),

selected_task AS (
  SELECT _id
  FROM task_instances
  WHERE openInstances > 0
  LIMIT 1
)

INSERT INTO assigned(_id, _task)
SELECT $1, _id
FROM selected_task;

with $1 being a random id passed to each query.
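
For reference, a hypothetical way to exercise the statement with a bound parameter directly from psql (the prepared-statement name and the sample id are ours):

PREPARE assign_task(CHAR(24)) AS
WITH task_instances AS (
  SELECT t._id, t.totalInstances - COUNT(assigned._id) openInstances
  FROM tasks t
  LEFT JOIN assigned ON t._id = assigned._task
  GROUP BY t._id, t.totalInstances
),
selected_task AS (
  SELECT _id FROM task_instances WHERE openInstances > 0 LIMIT 1
)
INSERT INTO assigned(_id, _task)
SELECT $1, _id FROM selected_task;

EXECUTE assign_task('000000000000000000000042');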

Symptoms

We have about 100 active users regularly requesting tasks. This works as expected, except for roughly one in 1000 requests: then two assigned rows are created for the same _task id by parallel requests. I would expect a serializable execution to roll back the second one, since openInstances should have been decreased to 0 by the first one.
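
To make the symptom concrete, a diagnostic query along these lines (our own check, not part of the application) lists tasks that received more assignments than totalInstances allows:

SELECT a._task, COUNT(*) AS assignments, t.totalInstances
FROM assigned a
JOIN tasks t ON t._id = a._task
GROUP BY a._task, t.totalInstances
HAVING COUNT(*) > t.totalInstances;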

Setup

We use Postgres 10.3 and the query is run from Scala code via Slick 3.2.3 with withTransactionIsolation(Serializable). No other queries delete from or insert into the assigned table.

Postgres logs show that the requests are run in different sessions, and that SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE; is executed before each task assignment query.
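
One way to verify that the level is actually in effect is to ask the server from inside such a transaction (SHOW transaction_isolation is standard PostgreSQL):

BEGIN;
SHOW transaction_isolation;  -- should return: serializable
-- ... run the assignment query here ...
ROLLBACK;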

I tried rewriting the query in different styles, including using VIEWs for the WITH subqueries, and surrounding the query with BEGIN and COMMIT, but to no effect.

Any help is appreciated.

Edit

I should add that sometimes the expected serialization errors/rollbacks do come up, upon which our application retries the query. I see this correct behavior 10 times in the logs from the last few hours, but 2 times the same task was still wrongly assigned twice, as described above.

Malikamalin answered 25/4, 2018 at 10:50 Comment(0)

The SERIALIZABLE isolation level does not mean that transactions are literally executed serially. It only guarantees committed reads, repeatable reads, and the absence of phantom reads. And the behaviour you have described does not look like a violation of that.

To avoid duplicating the records, you can simply do

select ... from task_instances for update

Due to this "for update" clause, the selected row will be locked for the lifetime of the transaction. So only one transaction will be able to update it, and the second one will have to wait until the first one has committed. As a result, the second transaction will read the value written by the first - which is exactly the guarantee you need here.

What is also important: if you use "select for update" in such a scenario, you do not even need the SERIALIZABLE isolation level; READ COMMITTED would be enough.
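
A minimal sketch of that approach, with two adjustments of our own: FOR UPDATE cannot be combined with GROUP BY, so the count moves into a subquery and the lock is taken on the tasks row itself; and the open-instance check is repeated after the lock is granted, because under READ COMMITTED a statement that blocked on a lock keeps the snapshot it started with:

BEGIN;  -- READ COMMITTED (the default) is enough with explicit locking

-- 1. Pick and lock a candidate task. A concurrent transaction that picks
--    the same row blocks here until this transaction commits.
SELECT t._id
FROM tasks t
WHERE t.totalInstances >
      (SELECT COUNT(*) FROM assigned a WHERE a._task = t._id)
LIMIT 1
FOR UPDATE;

-- 2. Re-check and insert. This statement takes a fresh snapshot once the
--    lock is granted, so it sees assignments committed while we waited.
--    $2 is the _id returned by step 1; if nothing is inserted, the task
--    was taken in the meantime and the application picks another one.
INSERT INTO assigned(_id, _task)
SELECT $1, $2
WHERE (SELECT COUNT(*) FROM assigned WHERE _task = $2)
      < (SELECT totalInstances FROM tasks WHERE _id = $2);

COMMIT;

On PostgreSQL 9.5 and later, SKIP LOCKED can be added to step 1 so that waiting transactions move on to the next open task instead of queuing on the same row.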

Pegg answered 25/4, 2018 at 12:43 Comment(8)
I think you're right. A serialization error will be raised on an update of the same row, not when an insert or select is performed.Maldon
@JustMe, what serialization error do you mean? Normally, another DB client is simply blocked until the transaction using "for update" commits.Pegg
Yes, you're right. I mean "could not serialize access due to concurrent update", as in the SERIALIZABLE transactions topic at blog.2ndquadrant.com/…Maldon
@user3714601, this locks only the selected rows, correct? I think this does not help, as the query doesn't update any rows, and a concurrent insert would still be possible.Sverdlovsk
@jstriebel, yes, it locks the selected rows. And it definitely would help if all the transactions concerned start with such a select. One row will effectively be "owned" by one transaction at a time. Am I clear?Pegg
I think the point here is to avoid locks. Also, serializability means more than the absence of phantom reads: it means that there must be an equivalent serial execution order (which there isn't here). In short, I think the query should work as intended. I'll have a look.Connection
@LaurenzAlbe, why do you think locking a row could be bad here? Especially given that read committed isolation would be enough in this case.Pegg
I'm not saying locks are bad. I understood the question to be: why do I see this misbehavior in my setup?Connection

I tried your example like this:

Session 1:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

WITH task_instances AS (
  SELECT t._id, t.totalInstances - COUNT(assigned._id) openInstances
  FROM tasks t
  LEFT JOIN assigned ON t._id = assigned._task
  GROUP BY t._id, t.totalInstances
),
selected_task AS (
  SELECT _id
  FROM task_instances
  WHERE openInstances > 0
  LIMIT 1
)
INSERT INTO assigned(_id, _task)
SELECT 1, _id
FROM selected_task;

Session 2:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

WITH task_instances AS (
  SELECT t._id, t.totalInstances - COUNT(assigned._id) openInstances
  FROM tasks t
  LEFT JOIN assigned ON t._id = assigned._task
  GROUP BY t._id, t.totalInstances
),
selected_task AS (
  SELECT _id
  FROM task_instances
  WHERE openInstances > 0
  LIMIT 1
)
INSERT INTO assigned(_id, _task)
SELECT 2, _id
FROM selected_task;

COMMIT;

Session 1:

COMMIT;

And this is what I get:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.

So it works as expected.

The only explanation that I have is that there is something wrong in your setup, and you are not using SERIALIZABLE after all.

Do you ever see serialization errors in your application? If not, that would confirm my suspicion.

Connection answered 26/4, 2018 at 6:43 Comment(3)
We do see such serialization errors, upon which our application retries the query. I see this correct behavior 10 times in the logs of the last hours, but 2 times it still wrongly assigned the same task twice…Malikamalin
Then my only explanations are a) a bug in your application logic (SERIALIZABLE is not always used) b) other transactions are inserting these rows c) there is a bug in PostgreSQL. The third option is of course possible, but I'd investigate the other two first.Connection
a) We investigated the parallel assignments, the logs were showing SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE; directly before both transactions, in the corresponding sessions. b) Nothing else inserts those rows, we double checked this. c) We first had those issues with 10.2, and upgraded to 10.3, which fixes misbehavior of concurrent-update rechecks with CTE references appearing in subplans (changelog). The upgrade did not fix our issue, we are not sure if this change is related.Sverdlovsk
