Avoid exclusive access locks on referenced tables when DROPping in PostgreSQL

Asked 21/8, 2015 at 16:26 Answered 21/1, 2021 at 16:5

Solved postgresql database-deadlocks drop-table

Why does dropping a table in PostgreSQL require ACCESS EXCLUSIVE locks on any referenced tables? How can I reduce this to an ACCESS SHARED lock or no lock at all? i.e. is there a way to drop a relation without locking the referenced table?

I can't find any mention of which locks are required in the documentation, but unless I explicitly get locks in the correct order when dropping multiple tables during concurrent operations, I can see deadlocks waiting on an AccessExclusiveLock in the logs, and acquiring this restrictive lock on commonly-referenced tables is causing momentary delays to other processes when tables are deleted.

To clarify,

CREATE TABLE base (
    id SERIAL,
    PRIMARY KEY (id)
);
CREATE TABLE main (
    id SERIAL,
    base_id INT,
    PRIMARY KEY (id),
    CONSTRAINT fk_main_base (base_id)
        REFERENCES base (id)
        ON DELETE CASCADE ON UPDATE CASCADE
);
DROP TABLE main; -- why does this need to lock base?

Gymnosperm answered 21/8, 2015 at 16:26 Comment(11)

What would you like to happen instead? You are attempting to remove the furniture from a busy restaurant, people are still halfway their meal, what should they pay for? half the meal? the full meal? Only the part that they consumed? – Topminnow 21/8, 2015 at 16:43

@Topminnow I don't understand your analogy; I'm not talking about locks on the table I'm deleting, but on a table which it references. The referenced table shouldn't care that there's a deletion going on - its data won't be changed by the action. It shouldn't need to do anything at all. – Gymnosperm 21/8, 2015 at 16:49

@Topminnow see my example – Gymnosperm 21/8, 2015 at 16:51

I see. It is still not clear to me why you want to drop and (maybe) recreate tables. BTW: acid on catalogs is hard; could be that postgres is overly defensive here. – Topminnow 21/8, 2015 at 21:47

@Topminnow the tables are not being recreated. These tables are used to ensure customer isolation (e.g. project_1_data, etc.). We are migrating to using schemas for this (where we may simply duplicate the shared information), but since we're still in the process of migrating from MySQL, we haven't been able to make that change yet. Tables are deleted when we archive dead projects, but more importantly during integration tests, where we create and destroy projects rapidly to ensure isolation of their testing environment. – Gymnosperm 22/8, 2015 at 8:51

I've been able to reproduce your casus. And after some rethinking, I must conclude that the observed behaviour is correct, intended, and necessary. If the DDL is not yet committed, another session should not be able to violate the constraints. Once the transaction is committed, an other session can do whatever is allowed by the new datamodel. (I do think that allowing multiple versions of the schema could avoid the need for a exclusive lock, but that would be very hard to do right) – Topminnow 22/8, 2015 at 14:33

@Topminnow that would explain why it would need to block writes on the base table (to prevent other threads potentially removing a record and causing the table which is being deleted to be scanned, though I'd expect that could be solved easily enough by immediately invalidating the constraint?), but I still don't understand why it should need a full ACCESS EXCLUSIVE lock. Unless you're saying it doesn't distinguish the type of DDL change, and is just using the logic "this table is changing somehow - better lock the stuff it's connected to"? – Gymnosperm 22/8, 2015 at 19:16

Although, I can't think of any DDL changes which would require an ACCESS EXCLUSIVE lock on a referenced table. What harm could another thread reading data from base do in any scenario, since we're never going to change the table? If another thread wanted to change base's DDL, it would necessarily require an ACCESS EXCLUSIVE lock, and if the current thread had even an ACCESS SHARED lock on base it would be enough to block that. Could you elaborate on why you think the lock is necessary? – Gymnosperm 22/8, 2015 at 19:20

The base table's primary key (Update,Delete) can cascade into main.base_id. But it cannot cascade if main is missing or changed. I'll post my test rig as an answer. (removing cascade does nt seem to change behaviour; removing the FK does (obviously) – Topminnow 22/8, 2015 at 19:29

@Topminnow but isn't ACCESS SHARED enough to guarantee UPDATE and DELETE can't happen? – Gymnosperm 22/8, 2015 at 19:35

No, that is not enough. The query plan for the other session could change, too. So it needs to be invalidated. Or prevented... – Topminnow 23/8, 2015 at 10:45

For anyone googling and trying to understand why their drop table (or drop foreign key or add foreign key) got stuck for a long time:

PostgreSQL (I looked at versions 9.4 to 13) foreign key constraints are actually implemented using triggers on both ends of the foreign key.

If you have a company table (id as primary key) and a bank_account table (id as primary key, company_id as foreign key pointing to company.id), then there are actually 2 triggers on the bank_account table and also 2 triggers on the company table.

table_name	timing	trigger_name	function_name
bank_account	AFTER UPDATE	RI_ConstraintTrigger_c_1515961	RI_FKey_check_upd
bank_account	AFTER INSERT	RI_ConstraintTrigger_c_1515960	RI_FKey_check_ins
company	AFTER UPDATE	RI_ConstraintTrigger_a_1515959	RI_FKey_noaction_upd
company	AFTER DELETE	RI_ConstraintTrigger_a_1515958	RI_FKey_noaction_del

Initial creation of those triggers (when creating the foreing key) requires SHARE ROW EXCLUSIVE lock on those tables (it used to be ACCESS EXCLUSIVE lock in version 9.4 and earlier). This lock does not conflict with "data reading locks", but will conflict with all other locks, for example a simple INSERT/UPDATE/DELETE into company table.

Deletion of those triggers (when droping the foreign key, or the whole table) requires ACCESS EXCLUSIVE lock on those tables. This lock conflicts with every other lock!

So imagine a scenario, where you have a transaction A running that first did a simple SELECT from company table (causing it to hold an ACCESS SHARE lock for company table until the transaction is commited or rolled back) and is now doing some other work for 3 minutes. You try to drop the bank_account table in transaction B. This requires ACCESS EXCLUSIVE lock, which will need to wait until the ACCESS SHARE lock is released first. In addition of that all other transactions, which want to access the company table (just SELECT, or maybe INSERT/UPDATE/DELETE), will be queued to wait on the ACCESS EXCLUSIVE lock, which is waiting on the ACCESS SHARE lock.

Long running transactions and DDL changes require delicate handling.

Plato answered 21/1, 2021 at 16:5 Comment(3)

Please don't add the same answer to multiple questions. Answer the best one and flag the rest as duplicates. See Is it acceptable to add a duplicate answer to several questions? – Poucher 21/1, 2021 at 16:16

It won't let me flag the other one as a duplicate because this one here doesn't have an accepted answer yet. But I deleted my answer from the other question. – Plato 21/1, 2021 at 16:35

now it has an accepted answer ;) – this is great information; thanks for answering a 6 year old question! – Gymnosperm 21/1, 2021 at 22:40

        -- SESSION#1
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

BEGIN;
CREATE TABLE base (
    id SERIAL
    , dummy INTEGER
    , PRIMARY KEY (id)
);
CREATE TABLE main (
    id SERIAL
    , base_id INTEGER
    , PRIMARY KEY (id)
    , CONSTRAINT fk_main_base FOREIGN KEY (base_id) REFERENCES base (id)
        -- comment the next line out ( plus maybe tghe previous one)
        ON DELETE CASCADE ON UPDATE CASCADE
);
        -- make some data ...
INSERT INTO base (dummy)
SELECT generate_series(1,10)
        ;

        -- make some FK references
INSERT INTO main(base_id)
SELECT id FROM base
WHERE random() < 0.5
        ;
COMMIT;

BEGIN;
DROP TABLE main; -- why does this need to lock base?

SELECT pg_backend_pid();

        -- allow other session to check the locks
        -- and attempt an update to "base"
SELECT pg_sleep(20);

        -- On rollback the other session will fail.
        -- On commit the other session will succeed.
        -- In both cases the other session must wait for us to complete.
-- ROLLBACK;
COMMIT;

        -- SESSION#2
        -- (Start this after session#1 from a different terminal)
SET search_path = tmp, pg_catalog;

PREPARE peeklock(text) AS
SELECT dat.datname
        , rel.relname as relrelname
        , cat.relname as catrelname
        , lck.locktype
        -- , lck.database, lck.relation
        , lck.page, lck.tuple
        -- , lck.virtualxid, lck.transactionid 
        -- , lck.classid
        , lck.objid, lck.objsubid
        -- , lck.virtualtransaction 
        , lck.pid, lck.mode, lck.granted, lck.fastpath

FROM pg_locks lck
LEFT JOIN pg_database dat ON dat.oid = lck.database
LEFT JOIN pg_class rel ON rel.oid = lck.relation
LEFT JOIN pg_class cat ON cat.oid = lck.classid
WHERE EXISTS(
        SELECT * FROM pg_locks l
        JOIN pg_class c ON c.oid = l.relation AND c.relname = $1
        WHERE l.pid =lck.pid
        )
        ;

EXECUTE peeklock( 'base' );
BEGIN;
        -- attempt to perfom some DDL
ALTER TABLE base ALTER COLUMN id TYPE BIGINT;

        -- attempt to perfom some DML
UPDATE base SET id = id+100;

COMMIT;

EXECUTE peeklock( 'base' );

\d base
SELECT * FROM base;

Topminnow answered 22/8, 2015 at 19:40 Comment(2)

"On rollback the other session will fail": I think this is a possibility which was obvious to you but I somehow ignored: the DROP TABLE transaction could roll-back. So that explains why the constraint can't be immediately dropped and forgotten about. I'm still not sure about the lock type though; ALTER TABLE and UPDATE are good examples of where a lock is needed, but I believe both can be avoided by an ACCESS SHARED lock. Is there an example you can think of where ACCESS SHARED wouldn't be enough, or is it just postgres being overly defensive as you initially suggested? – Gymnosperm 22/8, 2015 at 19:58

Rule of thumb: dont mix DDL and DML. (IOW: kick out the other sessions when you are doing DDL. for testing purposes this will not cause problems) And yes: this kind of course locking is overly conservative; with a reason. – Topminnow 23/8, 2015 at 0:1

-1

I suppose DDL locks everything it touches exclusively for the sake of simplicity — you're not supposed to run DDL involving not-temporary tables during normal operation anyway.

To avoid deadlock you may use advisory lock:

start transaction;
select pg_advisory_xact_lock(0);
drop table main;
commit;

This would ensure that only one client is concurrently running DDL involving referenced tables so it wouldn't matter in which order would other locks be acquired.

You can avoid locking table for long time by dropping foreign key first:

start transaction;
select pg_advisory_xact_lock(0);
alter table main drop constraint fk_main_base;
commit;
start transaction;
drop table main;
commit;

This would still need to lock base exclusively, but for much shorter time.

Phratry answered 22/8, 2015 at 20:46 Comment(1)

How much shorter? Is there some limit to X seconds? Or just 10% shorter? I ask because knowing how long makes the difference between being able to do it online with a large base vs not. – Somatic 13/9, 2017 at 15:57

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags