How to limit WAL size when using Postgres Logical Replication Slot?
Asked Answered
P

3

5

I am creating replication slot and streaming changes from AWS Postgres RDS to java process through JDBC driver.

My replication slot creation code looks like this.

final ReplicationSlotInfo replicationSlotInfo = pgConnection.getReplicationAPI()
                    .createReplicationSlot()
                    .logical()
                    .withSlotName(replicationSlotName)
                    .withOutputPlugin("wal2json")
                    .make();

and I get replication stream using following code.

pgConnection.getReplicationAPI()
                .replicationStream()
                .logical()
                .withSlotName(replicationSlotName)
                .withSlotOption("include-xids", true)
                .withSlotOption("include-timestamp", true)
                .withSlotOption("pretty-print", false)
                .withSlotOption("add-tables", "public.users")
                .withStatusInterval(10, TimeUnit.SECONDS)
                .start()

When replicator java process is not running, the WAL size gets increased. Here is the query I use to find replication lag.

SELECT
    slot_name,
    pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) AS replicationSlotLag,
    active
FROM
    pg_replication_slots;

Output:

slot_name   replicationslotlag  active
data_stream_slot    100 GB  f

This replication lag gets increased beyond RDS Disk, which shuts RDS down.

I thought wal_keep_segments will take care of this, which was set to 32. But it did not work. Is there any other property which I have to set to avoid this situation, even when Java Replication process is not running.

Pringle answered 4/3, 2020 at 13:28 Comment(1)
In Postgres 13 you can now limit the size of WAL retained for replication slotsHorticulture
M
5

There is a proposal to allow a logical replication slots WAL retention to be limited. I think that that is just what you need, but it is not clear when/if it will become available.

In the meantime, all you can do is monitor the situation, then then drop the slot if it starts to fall behind too far. Of course this does mean you will have a problem re-establishing synchronization later, but there is no way around that (other than fixing whatever it is that is causing the replication process to go away and/or fall behind).

Since you say the java process is not running, dropping the slot is easy to do. If it were running, but just not keeping up, then you would have to do the sad little dance where you kill the wal sender, then try to drop the slot before it gets restarted (and I don't know how you do that on RDS)

wal_keep_segments is only applicable to physical replication, not logical. And it is for use instead of slots, not in addition to them. If you have both, then WAL is retained until both criteria are met. Indeed that is the problem you are facing; logical replication cannot be done without use of slots the way physical replication can.

Moonshiner answered 4/3, 2020 at 15:57 Comment(2)
The proposed change made it into Postgres 13Horticulture
In Postgres 13 - max_slot_wal_keep_sizeCreative
M
3

wal_keep_segments is irrelevant for logical decoding.

With logical decoding, you always have to use a logical replication slot, which is a data structure which marks a position in the transaction log (WAL), so that the server never discards old WAL segments that logical decoding might still need.

That is why your WAL directory grows if you don't consume the changes.

wal_keep_segments specifies a minimum number of old WAL segments to retain. It is used for purposes like streaming replication, pg_receivewal or pg_rewind.

Mutable answered 4/3, 2020 at 13:34 Comment(2)
If wal_keep_segments is irrelevant, do you know how to limit the WAL when using replication slot. I think there should be a simple property to limit the size of WAL. I don't care if the old WAL is deleted by Postgres, when replication process is down.Pringle
You cannot limit the amount of WAL when using a replication slot. You have to monitor the consumer and make sure it doesn't fall behind.Mutable
S
0

wal_keep_segments specifies the minimum number of segments PostgreSQL should keep in pg_xlog directory. There can be a few reasons why PostgreSQL doesn't remove segments:

  1. There is a replication slot at a WAL location older than the WAL files, you can check it with this query:
SELECT slot_name,
       lpad((pg_control_checkpoint()).timeline_id::text, 8, '0') ||
       lpad(split_part(restart_lsn::text, '/', 1), 8, '0') ||
       lpad(substr(split_part(restart_lsn::text, '/', 2), 1, 2), 8, '0')
       AS wal_file
FROM pg_replication_slots;
  1. WAL archiving is enabled and archive_command fails. Please, check PostgreSQL logs in this case.

  2. There was no checkpoint for a long time.

Scornful answered 4/3, 2020 at 13:36 Comment(1)
How to interpret the output from the above query?Natiha

© 2022 - 2024 — McMap. All rights reserved.