Debezium postgres incremental snapshot performance issues

I am trying to use Debezium incremental snapshots with the latest Debezium (1.7) and Postgres 13. For testing, I populated a table with 1M rows; each row is 4 KB, with a UUID primary key and 20 varchar columns. Since I only wanted to measure snapshot performance, the table data does not change for the duration of the test.

Incremental snapshots seem to be an order of magnitude slower than regular snapshots. In my testing, I observed about 10,000 change events per second with a regular snapshot, but only about 500 change events per second with an incremental snapshot.
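To put those rates in perspective, here is a rough estimate of how long each snapshot takes for the 1M-row table. It assumes a steady rate and exactly one change event per row; the function name is only illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in the question.
ROWS = 1_000_000


def snapshot_seconds(rows: int, events_per_second: int) -> float:
    """Time to emit one change event per row at a steady rate."""
    return rows / events_per_second


regular = snapshot_seconds(ROWS, 10_000)   # regular snapshot rate
incremental = snapshot_seconds(ROWS, 500)  # incremental snapshot rate

print(f"regular: {regular:.0f} s, incremental: {incremental:.0f} s, "
      f"slowdown: {incremental / regular:.0f}x")
```

That is roughly 100 seconds versus about 33 minutes for the same table.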

I tried increasing the incremental.snapshot.chunk.size to 10,000 but I didn't see much effect on the performance.
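For reference, a minimal sketch of a setup of this kind, written as a Python dict that can be serialized to the JSON payload Kafka Connect expects. The connector name, host, credentials, database, and table names are placeholders (assumptions, not my real values); the property keys are the Debezium 1.7 Postgres connector options as I understand them. Listing the signal table in table.include.list is a cautious assumption on my part.

```python
import json

# Hypothetical connector registration payload. All names, hosts and
# credentials below are placeholders for illustration only.
connector = {
    "name": "test-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "testdb",
        "database.server.name": "pgserver1",
        "plugin.name": "pgoutput",
        "table.include.list": "public.test_table,public.debezium_signal",
        # Table that Debezium watches for ad hoc snapshot signals.
        "signal.data.collection": "public.debezium_signal",
        # Rows fetched per incremental snapshot chunk (default is 1024).
        "incremental.snapshot.chunk.size": "10000",
    },
}

# SQL that requests an incremental snapshot of the (placeholder) test table
# by inserting a row into the signal table.
SIGNAL_SQL = """
INSERT INTO public.debezium_signal (id, type, data)
VALUES ('ad-hoc-1', 'execute-snapshot',
        '{"data-collections": ["public.test_table"]}');
"""

print(json.dumps(connector, indent=2))
print(SIGNAL_SQL)
```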

I just wanted to confirm whether this is a known/expected issue, or whether I am doing something wrong.

Thanks

Northwestward answered 28/10, 2021 at 19:2 Comment(0)

Batch snapshots run a consistent (blocking) snapshot, which allows maximum throughput but also locks the source table. From what I understand, this also means that Debezium does not process further records from the WAL until the snapshot completes, which causes the WAL to grow and increases the latency of data in your sink. It basically stops everything until the bulk read of the table is complete, which can be a problem if the table is large (and/or multiple tables are being snapshotted).

Incremental snapshots pull data in smaller chunks (1024 rows by default), so the source tables are locked for a much shorter time. This also lets Debezium continue processing data from the WAL, keeping its size down and latency lower. Because this process is much more complex than a batch snapshot, it is going to be significantly slower. It follows the logic described in the DDD-3 design document to process snapshot data while new changes continue to stream in.
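To illustrate the chunked read, here is a simplified sketch of reading a table in primary-key order, one chunk per query. It is not Debezium's implementation: it uses an in-memory SQLite table in place of Postgres, the table t and the function read_in_chunks are hypothetical, and the watermark writes and event de-duplication described in the design document are left out.

```python
import sqlite3
import uuid


def read_in_chunks(conn, chunk_size):
    """Yield rows chunk by chunk, ordered by primary key (keyset pagination)."""
    last_key = None
    while True:
        if last_key is None:
            rows = conn.execute(
                "SELECT id, payload FROM t ORDER BY id LIMIT ?",
                (chunk_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT id, payload FROM t WHERE id > ? ORDER BY id LIMIT ?",
                (last_key, chunk_size),
            ).fetchall()
        if not rows:
            return
        # A real incremental snapshot would also write a low and a high
        # watermark around this query and reconcile the chunk with streamed
        # changes; that bookkeeping is omitted in this sketch.
        yield rows
        last_key = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(str(uuid.uuid4()), "x" * 100) for _ in range(5000)],
)

chunks = list(read_in_chunks(conn, 1024))
print(len(chunks), sum(len(c) for c in chunks))
```

Every chunk is a separate query, and in the real process each one carries extra bookkeeping, so the per-chunk overhead is paid once for every chunk in the table. A larger chunk size reduces the number of round trips but not the work done per row.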

I hope this helps.

Debezium documentation

Quantize answered 21/2 at 18:42 Comment(1)
It is not clear from the answer why it is a given that performance should drop significantly with an incremental snapshot. (Northwestward)
