KafkaAvroSerializer for serializing Avro without schema.registry.url

I'm new to Kafka and Avro, and I have been trying to get a producer/consumer pair running. So far I have been able to produce and consume simple bytes and strings, using the following configuration for the producer:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(USER_SCHEMA);
    Injection<GenericRecord, byte[]> recordInjection = GenericAvroCodecs.toBinary(schema);

    KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

    for (int i = 0; i < 1000; i++) {
        GenericData.Record avroRecord = new GenericData.Record(schema);
        avroRecord.put("str1", "Str 1-" + i);
        avroRecord.put("str2", "Str 2-" + i);
        avroRecord.put("int1", i);

        byte[] bytes = recordInjection.apply(avroRecord);

        ProducerRecord<String, byte[]> record = new ProducerRecord<>("mytopic", bytes);
        producer.send(record);
        Thread.sleep(250);
    }
    producer.close();

This is all well and good; the problem comes when I try to serialize a POJO. I was able to get the Avro schema from the POJO using the utility provided with Avro. I hardcoded the schema and then tried to create a GenericRecord to send through the KafkaProducer. The producer is now set up as:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(USER_SCHEMA); // this is the generated Avro schema
    KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

This is where the problem is: the moment I use KafkaAvroSerializer, the producer doesn't come up, failing with: missing mandatory parameter: schema.registry.url

I read up on why this is required: so that my consumer is able to decipher whatever the producer is sending. But isn't the schema already embedded in the Avro message? It would be really great if someone could share a working example of using KafkaProducer with KafkaAvroSerializer without having to specify schema.registry.url.

I would also really appreciate any insights or resources on the utility of the schema registry.

thanks!

Asinine answered 11/8, 2017 at 12:52 Comment(1)
Have you tried the spring-kafka Avro deserializer? Here's a tutorial as well. – Unthankful
A
51

Note first: KafkaAvroSerializer is not provided in vanilla Apache Kafka. It is provided by Confluent Platform (https://www.confluent.io/) as part of its open source components (http://docs.confluent.io/current/platform.html#confluent-schema-registry).

Short answer: no. If you use KafkaAvroSerializer, you will need a schema registry. See some samples here: http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html

The basic idea with the schema registry is that each topic refers to an Avro schema (i.e., you will only be able to send data that are coherent with each other). But a schema can have multiple versions, so you still need to identify the schema for each record.

We don't want to write the schema with every record, as you imply: often the schema is bigger than your data! Parsing it on every read would be a waste of time, and storing and shipping it would be a waste of resources (network, disk, CPU).
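To make the size argument concrete, here is a rough sketch that hand-encodes one record following the Avro binary encoding rules (zigzag varints for numbers, length-prefixed UTF-8 for strings) and compares it with the size of a schema. The schema text is an assumption: the question never shows USER_SCHEMA, so only the field names str1, str2 and int1 are taken from it, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SchemaVersusPayload {
    // Hypothetical schema; only the field names come from the question.
    static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"str1\",\"type\":\"string\"},"
        + "{\"name\":\"str2\",\"type\":\"string\"},"
        + "{\"name\":\"int1\",\"type\":\"int\"}]}";

    // Avro writes int/long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro writes a string as its UTF-8 byte length followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields in schema order, with no names or type tags.
    static byte[] encodeUser(String str1, String str2, int int1) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, str1);
        writeString(out, str2);
        writeLong(out, int1);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = encodeUser("Str 1-7", "Str 2-7", 7);
        System.out.println("payload bytes: " + payload.length);
        System.out.println("schema bytes:  " + USER_SCHEMA.length());
    }
}
```

In real code you would let the Avro library do the encoding; the hand-rolled encoder is only there to show that the binary payload carries no field names or types, which is why the reader needs the schema from somewhere else.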

Instead, a schema registry instance maintains a mapping Avro schema <-> int schemaId, and the serializer writes only this id before the data, after getting it from the registry (and caching it for later use).

So inside Kafka, your record will be [<magic byte> <id> <avro bytes>] (the magic byte is there for technical reasons), which is an overhead of only 5 bytes (compare that to the size of your schema). When reading, your consumer looks up the schema corresponding to the id and deserializes the Avro bytes against it. You can find much more in the Confluent docs.
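The framing described above can be sketched with plain java.nio: one magic byte (0), a 4-byte big-endian schema id, then the Avro bytes. This is an illustration of the layout only, not Confluent's actual code; the class and method names are made up, and the schema id 42 is arbitrary.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ConfluentFraming {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 1 + 4; // magic byte + int schema id

    // Prefix the Avro payload with the magic byte and the schema id.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(HEADER_SIZE + avroPayload.length)
            .put(MAGIC_BYTE)
            .putInt(schemaId) // ByteBuffer is big-endian by default
            .put(avroPayload)
            .array();
    }

    // Read the schema id back, rejecting records that lack the magic byte.
    static int schemaIdOf(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getInt();
    }

    // Everything after the 5-byte header is the Avro binary payload.
    static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, HEADER_SIZE, framed.length);
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[] {1, 2, 3});
        System.out.println("framed length: " + framed.length);
        System.out.println("schema id:     " + schemaIdOf(framed));
    }
}
```

A consumer that receives such a record would use the id to fetch (and cache) the writer schema from the registry before decoding the payload.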

If you really have a use case where you want to write the schema with every record, you will need another serializer. I think that means writing your own, but it should be easy: reuse https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java, remove the schema registry part, and replace it with the schema (same for reading). But if you use Avro, I would really discourage this: sooner or later you will need to implement something like a schema registry to manage versioning.

Anzac answered 11/8, 2017 at 13:56 Comment(2)
IMO you can keep backward-compatible schemas in your Maven repo, and there is no need to run a schema registry for that. You avoid taking care of an additional service, since you compile your schema with your code. However, you need to redeploy applications if you change your schema. IMO it's a fair cost. – Splitlevel
This is the line where the magic byte and id are written: github.com/confluentinc/schema-registry/blob/… – Josefinejoseito
P
2

While the accepted answer is correct, it should also be mentioned that schema registration can be disabled.

Simply set auto.register.schemas to false.
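As a minimal sketch of what that producer configuration might look like: the registry URL http://localhost:8081 and the class and method names are assumptions for illustration. Note that schema.registry.url is still set, because the serializer still looks schemas up in the registry even when it does not register them.

```java
import java.util.Properties;

final class NoAutoRegisterConfig {
    // Builds producer properties with automatic schema registration turned off.
    static Properties producerProps(String registryUrl) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // Still required: the serializer fetches schema ids from the registry.
        props.put("schema.registry.url", registryUrl);
        // Only stops the client from registering new schemas on its own.
        props.put("auto.register.schemas", "false");
        return props;
    }

    public static void main(String[] args) {
        // Assumed local registry address, for illustration only.
        producerProps("http://localhost:8081")
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```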

Perot answered 30/9, 2018 at 8:3 Comment(3)
spring.kafka.properties.auto.register.schemas for those using Spring Boot. – Spandau
This doesn't help at all, because it still uses the schema registry to fetch the schema. – Holmium
It only makes matters worse, to be honest, because it keeps SR tightly coupled to the client and merely prevents the client from taking things into its own hands when the schema is missing. – Hylotheism
L
2

You can create your own custom Avro serializer; then you can produce records to topics even without Schema Registry. See the article below.

https://codenotfound.com/spring-kafka-apache-avro-serializer-deserializer-example.html

There they use KafkaTemplate. I have tried using

    KafkaProducer<String, User> UserKafkaProducer

and it works fine. But if you want to use KafkaAvroSerializer, you need to provide the Schema Registry URL.

Lavenialaver answered 12/6, 2020 at 17:10 Comment(1)
Avro schema resolution needs both the writer and the reader schema. In the above example, the producer only serializes and sends bytes to Kafka. Is there any example where the serializer serializes both payload and schema? – Assassin
A
2

As others have pointed out, KafkaAvroSerializer requires Schema Registry, which is part of Confluent Platform, and usage requires licensing.

The main advantage of using the schema registry is that your bytes on the wire will be smaller, as opposed to writing a binary payload with the schema for every message.

I wrote a blog post detailing the advantages

Alcock answered 6/7, 2020 at 3:33 Comment(0)
M
1

You can always make your value classes implement Serializer<T> and Deserializer<T> (and Serde<T> for Kafka Streams) manually. Java classes are usually generated from Avro files, so editing them directly isn't a good idea, but wrapping them is a possible, if verbose, way.

Another way is to tune the Avro generator templates that are used for Java class generation and generate implementations of all those interfaces automatically. Both the Avro Maven and Gradle plugins support custom templates, so it should be easy to configure.

I've created https://github.com/artemyarulin/avro-kafka-deserializable, which contains modified template files and a simple CLI tool that you can use for file generation.

Macey answered 7/11, 2018 at 6:54 Comment(0)
