How to enforce schema validation in Kafka

I'm using Kafka and Schema Registry. I defined a schema and use Confluent's KafkaAvroSerializer on the producer side. Everything works fine.

On the other hand, if a producer publishes an event without adhering to the schema, it gets published without any problem.

I understand that Kafka receives just serialized binary data, doesn't inspect it, and that this behavior is as designed.

I'm wondering if there is a better way to enforce stronger schema validation so that the topic is not polluted with bad data?

Anarchic asked 5/9, 2019 at 0:11 Comment(1)
Hi @Anarchic - Did you get a solution to this problem? I am also facing the same issue with both Avro and JSON schemas. If you have found a solution, can you please share it? – Repository

Use the following "--config" options when creating a topic. They enforce message schema validation on the value and key, respectively. The two rules are independent of each other; you can enable either or both. Non-compliant messages will cause the client to receive an error from the Kafka broker.

https://www.confluent.io/blog/data-governance-with-schema-validation/

confluent.value.schema.validation=true 
confluent.key.schema.validation=true
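
For example, here is a minimal sketch that creates such a topic through Kafka's AdminClient, which is equivalent to passing the --config flags to the kafka-topics command. The topic name "orders", the partition and replication counts, and the broker address are placeholders. Note that broker-side schema validation is a Confluent Server / Confluent Cloud feature; plain Apache Kafka brokers ignore these configs.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateValidatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Same effect as --config confluent.value.schema.validation=true etc.
            NewTopic topic = new NewTopic("orders", 3, (short) 1)
                    .configs(Map.of(
                            "confluent.value.schema.validation", "true",
                            "confluent.key.schema.validation", "true"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}

Per the linked blog post, the broker then checks the schema ID embedded in each incoming message against Schema Registry, so producers that bypass the Avro serializer get an error instead of polluting the topic.
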
Repository answered 20/9, 2020 at 8:37 Comment(0)

If you are using Schema Registry and KafkaAvroSerializer, this is working as expected. An important aspect of Schema Registry is its support for schema evolution, where a schema can change over time. Each event embeds a schema ID in the wire format, which allows consumers to fetch the matching schema and deserialize the event.
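
For illustration, a minimal sketch of reading that embedded ID back: Confluent's wire format is a magic byte of 0, a 4-byte big-endian schema ID, and then the Avro binary payload. The class name here is a placeholder.

import java.nio.ByteBuffer;

public final class WireFormat {
    // Confluent wire format: magic byte 0x0, 4-byte schema ID, Avro payload
    public static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte magic = buf.get();
        if (magic != 0) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        return buf.getInt(); // the Schema Registry ID embedded by the serializer
    }
}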

But if you want strict schema validation before writing to the Kafka topic, there are two options:

  1. Define the schema in your application and use the SpecificRecord type, so records are checked against the schema at compile time.
  2. Fetch the schema from the Schema Registry subject using its API and validate each record against it before writing to the topic (see the sketch below).
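
Here is a minimal sketch of option 2, assuming Confluent's CachedSchemaRegistryClient and Avro's GenericData validator; the registry URL, subject name, and class names are placeholders.

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class RegistryValidator {
    private final CachedSchemaRegistryClient client;

    public RegistryValidator(String registryUrl) {
        this.client = new CachedSchemaRegistryClient(registryUrl, 100); // cache up to 100 schemas
    }

    // Returns true if the record conforms to the latest schema registered under the subject.
    public boolean isValid(String subject, GenericRecord record) throws Exception {
        SchemaMetadata latest = client.getLatestSchemaMetadata(subject);
        Schema schema = new Schema.Parser().parse(latest.getSchema());
        return GenericData.get().validate(schema, record);
    }
}

You could then gate each send on it, e.g. only call producer.send(...) when validator.isValid("my-topic-value", record) returns true, and drop or dead-letter records that fail.
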
Bowe answered 5/9, 2019 at 14:53 Comment(0)
