Kafka Connect HDFS Sink for JSON format using JsonConverter

I produce to and consume from Kafka in JSON, and I want to save the data to HDFS as JSON using the properties below:

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
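With `schemas.enable=false`, JsonConverter treats each message as plain JSON. With `schemas.enable=true`, it expects every message to be an envelope holding a `schema` object and a `payload`. The minimal Python sketch below shows the two wire shapes for the same boolean value. The helper name `to_wire` is hypothetical, and the schema fields are a subset of the ones used in the producer command in the question.

```python
import json

# Schema fragment taken from the producer command in the question (subset).
BOOL_SCHEMA = {"type": "boolean", "optional": False}

def to_wire(value, schemas_enable, schema=None):
    """Hypothetical helper: serialize `value` in the shape a JsonConverter
    with the given schemas.enable setting expects to read."""
    if schemas_enable:
        # Envelope form: exactly two top-level fields, "schema" and "payload".
        return json.dumps({"schema": schema, "payload": value})
    # Plain form: just the JSON value itself, no envelope.
    return json.dumps(value)

print(to_wire(True, schemas_enable=False))  # true
print(to_wire(True, schemas_enable=True, schema=BOOL_SCHEMA))
# {"schema": {"type": "boolean", "optional": false}, "payload": true}
```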

Producer:

curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
      --data '{"schema": {"type": "boolean", "optional": false, "name": "bool", "version": 2, "doc": "the documentation", "parameters": {"foo": "bar" }}, "payload": true }' "http://localhost:8082/topics/test_hdfs_json"
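For reference, the Confluent REST Proxy v1 embedded-JSON produce request (`application/vnd.kafka.json.v1+json`) expects the body to be an object with a `records` array, where each entry carries a `value` (and optionally a `key`). The schema/payload envelope would therefore go inside a record's `value`. The minimal Python sketch below only builds that body and does not send anything. The function name `rest_proxy_body` is hypothetical.

```python
import json

def rest_proxy_body(values):
    """Hypothetical helper: wrap message values in the {"records": [...]}
    body used by the REST Proxy v1 embedded-JSON produce request."""
    return json.dumps({"records": [{"value": v} for v in values]})

# The schema/payload envelope from the question, used as one record's value.
envelope = {
    "schema": {"type": "boolean", "optional": False, "name": "bool"},
    "payload": True,
}
body = rest_proxy_body([envelope])
print(body)
```

The resulting string is what would be passed to curl's `--data` option in place of the bare envelope.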

Consumer:

./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties

Issue-1:

key.converter.schemas.enable=true

value.converter.schemas.enable=true

Getting this exception:

org.apache.kafka.connect.errors.DataException: JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
    at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:332)
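The exception message states the rule JsonConverter applies when `schemas.enable=true`: the top-level JSON must be an object containing exactly the `schema` and `payload` fields. The minimal Python sketch below applies that rule, which can help test sample messages before producing them. `check_envelope` is a hypothetical name, and the check mirrors the error message rather than Kafka's actual Java code.

```python
import json

def check_envelope(raw: str):
    """Hypothetical check mirroring the JsonConverter error message:
    with schemas.enable=true, the message must be a JSON object with
    exactly two fields, "schema" and "payload"."""
    doc = json.loads(raw)
    if not isinstance(doc, dict) or set(doc) != {"schema", "payload"}:
        raise ValueError('requires "schema" and "payload" fields '
                         "and may not contain additional fields")
    return doc["payload"]

print(check_envelope('{"schema": {"type": "boolean"}, "payload": true}'))  # True

# A bare value, or an envelope with an extra field, fails the check.
for bad in ('true', '{"schema": {}, "payload": true, "extra": 1}'):
    try:
        check_envelope(bad)
    except ValueError:
        print("rejected:", bad)
```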

Issue-2:

With the two properties set to false, no exception is thrown, but no data is written to HDFS.

Any suggestion will be highly appreciated.

Thanks

Candytuft asked 21/11, 2016 at 11:57

The converter determines how data read from the Kafka topic is translated into the form the connector works with before it is written to HDFS. Out of the box, the HDFS connector only supports writing to HDFS in Avro or Parquet format. You can find information on how to extend the format to JSON here. If you build such an extension, I encourage you to contribute it to the connector's open-source project.
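To give a rough idea of what such a JSON output format would have to do, the Python sketch below turns record values into newline-delimited JSON, one line per record, which is one common layout for JSON files. It is purely illustrative: `records_to_json_lines` is a hypothetical function and not part of the HDFS connector's Java format API, which an actual extension would have to implement.

```python
import json

def records_to_json_lines(values):
    """Hypothetical illustration: serialize each record value as one line of
    compact JSON, the kind of content a JSON output format might write."""
    return "".join(json.dumps(v, separators=(",", ":")) + "\n" for v in values)

print(records_to_json_lines([{"id": 1, "ok": True}, {"id": 2, "ok": False}]), end="")
# {"id":1,"ok":true}
# {"id":2,"ok":false}
```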

Burthen answered 24/11, 2016 at 2:33
Candytuft: Thanks for your suggestion!
Liddy: @Burthen Do you know if such an extension is achievable using the native Kafka Connect API?
Burthen: There is a JsonConverter that already ships with Kafka. I think the question here is specific to an output format for the HDFS connector, which necessarily means extending the connector rather than doing anything natively with Connect itself, if I have understood your question properly.

To write JSON-formatted input messages to HDFS, set the following properties:

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
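This works because StringConverter does no JSON parsing: it hands the message bytes to the connector as a plain string, so the JSON text passes through unchanged, whereas JsonConverter parses the text into structured data. The `schemas.enable` settings are JsonConverter options and have no effect on StringConverter. The simplified Python model below illustrates the difference. Both functions are hypothetical stand-ins, not Kafka's classes.

```python
import json

def string_converter(raw: bytes) -> str:
    # Simplified model of StringConverter: decode UTF-8 text, no parsing.
    return raw.decode("utf-8")

def json_converter_plain(raw: bytes):
    # Simplified model of JsonConverter with schemas.enable=false:
    # the text is parsed into structured data (dicts, lists, numbers, ...).
    return json.loads(raw)

msg = b'{"id": 7, "name": "x"}'
print(string_converter(msg))      # {"id": 7, "name": "x"}
print(json_converter_plain(msg))  # {'id': 7, 'name': 'x'}
```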
Tugman answered 11/7, 2017 at 7:57
Candytuft: Will check, Akshat. Thanks for your comment.
