I see Kafka Connect can write to S3 in Avro or JSON formats. But there is no Parquet support. How hard would this be to add?
Parquet Output From Kafka Connect to S3
Added! See: twitter.com/karantasis/status/1181701302608285698?s=19 and: github.com/confluentinc/kafka-connect-storage-cloud/pull/241
Parquet support is now available as part of the 5.4 release of the Kafka Connect S3 sink.
Yes, you can! I wrote an example here.
Starting with Confluent 5.4.0, there is official support for Parquet output to S3.
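To use it, set the sink's `format.class` to the Parquet format class. Below is a minimal connector config sketch; the connector name, topic, bucket, region, and Schema Registry URL are placeholders you would replace with your own. Note that Parquet requires schema-aware data, so a converter such as the Avro converter (backed by Schema Registry) is typically used.

```json
{
  "name": "s3-parquet-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "my-topic",
    "s3.bucket.name": "my-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "parquet.codec": "snappy",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "flush.size": "1000",
    "tasks.max": "1"
  }
}
```

You can POST this JSON to the Kafka Connect REST API (`/connectors`) to create the connector.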
The Qubole connector (streamx) supports writing out Parquet: https://github.com/qubole/streamx
Try Secor: https://github.com/pinterest/secor

It can work with AWS S3, Google Cloud Storage, Azure Blob Storage, etc.

Note that whichever solution you choose should provide key features such as exactly-once delivery of each message, load distribution, fault tolerance, monitoring, and data partitioning. Secor has all of these and, as stated above, can easily work with other S3-style services.