Postgres Debezium does not publish the previous state of a record
Asked Answered
B

2

7

I successfully installed Postgres Debezium CDC. Now, I'm able to catch all changes happening to the database. But the problem is "before" field always stays empty. So, if I insert a record (id = 1, name = Bill) I then get from Kafka this data:

'payload': {'before': None, 'after': {'id': 1, 'name': 'Bill'}, ...

But if I update the record like so:

UPDATE mytable set name = 'Bob' WHERE id = 1

I get this from Kafka:

'payload': {'before': None, 'after': {'id': 1, 'name': 'Bob'}, ...

This is how I configured my connector:

curl -X POST  localhost:8083/connectors/ \
  -H "Accept:application/json" -H "Content-Type:application/json" -d \
'{
    "name": "test-connector",
    "config": {
         "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
         "tasks.max": "1",
         "plugin.name": "pgoutput",
         "database.hostname": "postgres",
         "database.port": "5432",
         "database.user": "postgres",
         "database.password": "postgres",
         "database.dbname" : "test",
         "database.server.name": "postgres",
         "database.whitelist": "public.mytable",
         "database.history.kafka.bootstrap.servers": "kafka:9092",
         "database.history.kafka.topic": "public.topic"
    }
}'

What is wrong with that and how can I fix it?

Bacitracin answered 18/1, 2020 at 10:13 Comment(9)
How do you get messages from Kafka?Laywoman
@Iskuskov Alexander. I tried consumers in two languages - PHP and Python. They both get messages from Kafka and print their body and in both cases I see that before property is None (in Python) or null (in PHP).Bacitracin
Did you consume from the earliest offset? Are you sure that message 'payload': {'before': None, 'after': {'id': 1, 'name': 'Bob'} is about update event and not about the insert event that you read again? Please show more data in messagesLaywoman
@Iskuskov Alexander. I'm consuming from the start. In PHP for example, I do it like so $topic->consumeStart(0, RD_KAFKA_OFFSET_BEGINNING);. And I'm totally sure, that the message with name: Bob is an update event, because it pops up, when I manually execute UPDATE statement.Bacitracin
Beside, id field is serial, so it's impossible to have two records with id = 1Bacitracin
Hm. I noticed this in connector logs: REPLICA IDENTITY for 'registry.schema_version' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns. It seems like this is the whole problem, but I'm not sure how to fix it.Bacitracin
Try to execute this command: ALTER TABLE public.mytable REPLICA IDENTITY FULL;Laywoman
@Iskuskov Alexander. Yeah, it works!Bacitracin
It's great! I've moved my comments to the answerLaywoman
L
13

before is an optional field that if present contains the state of the row before the event occurred. Whether or not this field is available is highly dependent on the REPLICA IDENTITY setting for each table.

REPLICA IDENTITY is a PostgreSQL specific table-level setting which determines the amount of information that is available to logical decoding in case of UPDATE and DELETE events.

To show the previous values of all the table’s columns, please set the REPLICA IDENTITY level to FULL:

ALTER TABLE public.mytable REPLICA IDENTITY FULL;

See more details in the Debezium docs.

Laywoman answered 20/1, 2020 at 9:22 Comment(0)
A
0

To add to Iskuskov's answer: Adding data to the REPLICA IDENTITY impacts database performance. If you don't need all columns, you can also set the REPLICA IDENTITY to a specific index to only use one or a few columns.

note: Seems like you always need the primary key in that index for Debezium to work

Relevant Postgres docs

Achromic answered 28/8, 2024 at 9:5 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.