Parquet Output From Kafka Connect to S3

I see Kafka Connect can write to S3 in Avro or JSON formats. But there is no Parquet support. How hard would this be to add?

Unpeople answered 9/5, 2017 at 19:49
Glossographer: Added! See twitter.com/karantasis/status/1181701302608285698?s=19 and github.com/confluentinc/kafka-connect-storage-cloud/pull/241
Algonquian: Parquet support is now available as part of the 5.4 release of the Kafka Connect S3 sink.
Syncretize: Yes, you can! I wrote an example here.

Starting with Confluent 5.4.0, there is official support for Parquet output to S3.
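
A minimal sketch of such a connector configuration, in standalone properties form; the topic, bucket, region, and Schema Registry URL below are placeholders, and Parquet output needs a schema-aware converter such as the Avro converter backed by Schema Registry:

# Sketch: S3 sink writing Parquet, assuming Confluent Platform 5.4.0 or later
name=s3-parquet-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
# Placeholder topic, bucket, and region
topics=my-topic
s3.bucket.name=my-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
# format.class is what switches the output files from Avro/JSON to Parquet
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat
flush.size=1000
# Parquet requires record schemas, e.g. via the Avro converter (placeholder Schema Registry URL)
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081

In distributed mode, the same keys go into the "config" object of the JSON posted to the Connect REST API.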

Unpeople answered 23/1, 2020 at 15:16

The Qubole connector supports writing out Parquet: https://github.com/qubole/streamx

Hurdle answered 11/5, 2017 at 13:33

Try secor: https://github.com/pinterest/secor

It works with AWS S3, Google Cloud Storage, Azure Blob Storage, etc.

Note that whichever solution you choose should provide key features such as exactly-once writing of each message, load distribution, fault tolerance, monitoring, and data partitioning.

Secor has all of these and, as stated above, works easily with other S3-style services.

Glossographer answered 23/9, 2019 at 17:26
