For AvroProducer to Kafka, where are the Avro schemas for "key" and "value"?
From the AvroProducer example in the confluent-kafka-python repo, it appears that the key and value schemas are loaded from files. That is, in this code:

from confluent_kafka import avro 
from confluent_kafka.avro import AvroProducer

value_schema = avro.load('ValueSchema.avsc')
key_schema = avro.load('KeySchema.avsc')
value = {"name": "Value"}
key = {"name": "Key"}

avroProducer = AvroProducer({
    'bootstrap.servers': 'mybroker,mybroker2',
    'schema.registry.url': 'http://schema_registry_host:port'
}, default_key_schema=key_schema, default_value_schema=value_schema)
avroProducer.produce(topic='my_topic', value=value, key=key)

it appears that the files ValueSchema.avsc and KeySchema.avsc are loaded independently of the Avro Schema Registry.
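For reference, here is a minimal sketch of what such a schema file might contain; the record below is an assumption that matches the value = {"name": "Value"} dict above, not the actual file from the repo. avro.load() parses exactly this kind of Avro schema JSON from disk, and avro.loads() does the same for an in-memory string:

from confluent_kafka import avro

# Hypothetical contents of ValueSchema.avsc: a record with a single string field,
# matching value = {"name": "Value"} in the example above.
value_schema = avro.loads("""
{
  "namespace": "example.avro",
  "type": "record",
  "name": "Value",
  "fields": [{"name": "name", "type": "string"}]
}
""")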

Is this right? What's the point of referencing the URL for the Avro Schema Registry, but then loading the key and value schemas from disk?

Please clarify.

Habakkuk answered 28/4, 2017 at 0:55 Comment(1)
Hey, I had the same question. Why do we still need to pass the Schema Registry URL? – Archaeology
I ran into the same issue: it was initially unclear what the point of the local files is. As mentioned in the other answers, for the first write to an Avro topic, or for an update to the topic's schema, you need the schema string - you can see this in the Kafka REST documentation here.

Once you have the schema in the registry, you can read it over REST (I used the requests Python module in this case) and parse it with avro.loads(). I found this useful because produce() requires a value schema for your AvroProducer, and this code works without the local file being present:

import requests
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# Fetch the latest registered version of the value schema from the Schema Registry
get_schema_req_data = requests.get("http://1.2.3.4:8081/subjects/sample_value_schema/versions/latest")
get_schema_req_data.raise_for_status()
schema_string = get_schema_req_data.json()['schema']
value_schema = avro.loads(schema_string)

avroProducer = AvroProducer({'bootstrap.servers': '1.2.3.4:9092', 'schema.registry.url': 'http://1.2.3.4:8081'}, default_value_schema=value_schema)
avroProducer.produce(topic='my_topic', value={"data": "that matches your schema"})

Hope this helps.

Circumfuse answered 8/5, 2017 at 19:7 Comment(0)
That is just one way to get the key and value schemas into the Schema Registry in the first place. You can register them directly through the SR REST API, or you can create new schemas (or new versions of existing schemas) by publishing them with new messages. Which method you prefer is entirely your choice; a sketch of the REST approach is shown below.
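As an illustration of the REST approach (a minimal sketch; the host, subject name, and schema below are assumptions, not part of this answer), registering a schema is a POST to /subjects/<subject>/versions on the Schema Registry:

import json
import requests

# Hypothetical value schema for the (assumed) subject "my_topic-value".
value_schema = {
    "type": "record",
    "name": "Value",
    "fields": [{"name": "name", "type": "string"}],
}

# The Schema Registry expects the schema as a JSON-encoded string inside the request body.
resp = requests.post(
    "http://1.2.3.4:8081/subjects/my_topic-value/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(value_schema)}),
)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": 1} -- the id under which the schema was registered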

Wootan answered 29/4, 2017 at 2:46 Comment(0)
Take a look at the code and consider that the schema in the registry is needed by a consumer rather than by the producer. MessageSerializer registers the schema in the schema registry for you :)
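To sketch the consumer side (the broker/registry addresses, topic, and group id below are assumptions carried over from the examples above, not part of this answer), an AvroConsumer needs only the registry URL: it looks up the writer's schema by the schema id embedded in each message and deserializes without any local .avsc file:

from confluent_kafka.avro import AvroConsumer

# Only the registry URL is needed on the consumer side; the schema is resolved
# from the schema id embedded in each Avro-serialized message.
consumer = AvroConsumer({
    'bootstrap.servers': '1.2.3.4:9092',
    'schema.registry.url': 'http://1.2.3.4:8081',
    'group.id': 'example-group',
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['my_topic'])

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        print("Consumer error:", msg.error())
        continue
    print(msg.key(), msg.value())  # already deserialized into Python dicts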

Pinder answered 28/4, 2017 at 16:11 Comment(3)
Are you saying we keep the schema in two places: in the registry AND on disk? That seems redundant. – Habakkuk
Also, why do we reference the registry in the AvroProducer constructor, but then also reference the schema loaded from disk? Something is not right here... – Habakkuk
The schema in the registry is for consumers; the schema on disk is for the producer. Note that if you don't keep the schema on disk, don't set any compatibility policy in the registry, and accidentally post a change to the schema for your subject, the producer may no longer be able to encode a message, because of the difference between the schema in the registry and the corresponding object in the producer's code. – Pinder
