Replication lag - exceeding max_slot_wal_keep_size, WAL segments not removed
Asked Answered
C

1

5

Summary

We are using max_slot_wal_keep_size from Postgresql 13 to prevent master from being killed by a lagging replication. It seems, that in our case, WAL storage wasn't freed up after exceeding this parameter which resulted in a replication failure. WAL which, as I believe, should have been freed up did not seem to be needed by any other transaction at a time. I wonder how this should work and why WAL segments were not removed?

Please find the details below.

Configuration

  • master & one replica - streaming replication using a slot
  • ~700GB available for pg_wal
  • max_slot_wal_keep_size = 600GB
  • min_wal_size = 20GB
  • max_wal_size = 40GB
  • default checkpoint_timeout = 5 minutes (no problem with checkpoints)
  • archiving is on and is catching up well

What happened

Under heavy load (large COPY/INSERT transactions, loading hundreds of GB of data), the replication started falling behind. Available space on pg_wal was being reduced in the same rate as safe_slot pg_replication_slot.safe_wal_size - as expected. At some point safe_wal_size went negative and streaming stopped working. It wasn't a problem, because replica started recovery from WAL archive. I expected that once the slot is lost, WALs will be removed up to max_wal_size. This did not happen though. It seems that Postgres tried to maintain something close to max_slot_wal_keep_size (600GB) available, in case replica starts catching up again. Over the time, there was no single transaction which would require this much WAL to be kept. archiving wasn't behind either.

  • Q1: Is it the case that PG will try to maintain max_slot_keep_size of WALs available?
  • Q2: If not, why PG did not remove excessive WAL when they were not needed neither by archiver, nor by any transactions running on the system?

Amount of free space on pg_wal was more or less 70GB for most of the time, however at some point, during heavy autovacuuming, it dipped to 0 :( This is when PG crashed and (auto-recovered soon after). After getting back up, there was 11GB left on pg_wal and no transaction running, no loading. This lasted for hours. During this time replica finally caught up from the archive and restored the replication with no delay. None of the WALs were removed. I manually run checkpoint but it did not clear any WALs. I finally restarted Postgresql and during the restarting pg_wal were finally cleared.

  • Q3: Again - why PG did not clear WAL? WALs, even more clearly, were not needed by any process.

Many thanks!

Cooperative answered 9/7, 2021 at 9:10 Comment(0)
K
7

This was a PostgreSQL bug, and it's fixed. Thanks for reporting!

It should be available in 13.4 according to release notes (look for "Advance oldest required WAL segment")

Kwang answered 16/7, 2021 at 16:15 Comment(1)
Are you able to tell in which version this fix is available?Chesney

© 2022 - 2024 — McMap. All rights reserved.