I have topics in Kafka that are stored in Avro format. I would like to consume an entire topic (whose messages will not change while I'm consuming it) and convert it into Parquet, saving directly to S3.
I currently do this, but it requires consuming the messages from Kafka one at a time and processing them on a local machine: I convert them to a Parquet file, and once the entire topic is consumed and the Parquet file is completely written, I close the writer and then initiate an S3 multipart upload. Or, for short:

| Avro in Kafka -> convert to Parquet on local disk -> copy file to S3 |
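For reference, here is a trimmed-down sketch of that current pipeline. The bootstrap servers, schema registry URL, bucket name, local path, and the "empty poll means the topic is done" check are all simplified placeholders rather than my real setup:

```scala
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._

import com.amazonaws.services.s3.transfer.TransferManagerBuilder
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.parquet.avro.AvroParquetWriter

object AvroTopicToLocalParquetToS3 {

  def run(topic: String, schema: Schema): Unit = {
    // Plain Kafka consumer; the value deserializer is Confluent's
    // KafkaAvroDeserializer, so each record value arrives as a GenericRecord.
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")           // placeholder
    props.put("group.id", s"$topic-to-parquet")
    props.put("auto.offset.reset", "earliest")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer")
    props.put("schema.registry.url", "http://localhost:8081")  // placeholder

    val consumer = new KafkaConsumer[String, GenericRecord](props)
    consumer.subscribe(java.util.Collections.singletonList(topic))

    // Write every record to a local Parquet file first.
    val localFile = s"/tmp/$topic.parquet"
    val writer = AvroParquetWriter
      .builder[GenericRecord](new Path(localFile))
      .withSchema(schema)
      .build()

    var keepPolling = true
    while (keepPolling) {
      val records = consumer.poll(Duration.ofSeconds(5))
      if (records.isEmpty) keepPolling = false // crude "topic fully consumed" check
      else records.asScala.foreach(r => writer.write(r.value()))
    }
    writer.close()
    consumer.close()

    // Only once the file is fully written, multipart-upload it to S3
    // (TransferManager handles the multipart mechanics).
    val tm = TransferManagerBuilder.defaultTransferManager()
    tm.upload("my-bucket", s"$topic.parquet", new java.io.File(localFile)).waitForCompletion()
    tm.shutdownNow()
  }
}
```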
What I'd like to do instead is:

| Avro in Kafka -> Parquet in S3 |
One of the caveats is that the Kafka topic name isn't static; it needs to be passed in as an argument, used once, and then never used again.
I've looked into Alpakka and it seems like this might be possible, but it's unclear to me how, and I haven't found any examples. Any suggestions?
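For context, this is roughly what I've pieced together from the Alpakka Kafka and Alpakka Avro Parquet docs. The `s3a://` path, bucket name, bootstrap servers, and schema-registry settings are placeholders, and I'm not at all sure this is the intended way to wire it up:

```scala
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.alpakka.avroparquet.scaladsl.AvroParquetSink
import io.confluent.kafka.serializers.KafkaAvroDeserializer
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.kafka.common.serialization.{Deserializer, StringDeserializer}
import org.apache.parquet.avro.AvroParquetWriter

object AvroTopicToParquetOnS3 {

  // Akka 2.6+: the implicit ActorSystem also provides the stream materializer.
  def run(topic: String, schema: Schema)(implicit system: ActorSystem): Unit = {
    // Confluent's KafkaAvroDeserializer is a Deserializer[AnyRef]; the cast to
    // Deserializer[GenericRecord] is the usual workaround when passing it into
    // ConsumerSettings directly.
    val avroDeserializer = new KafkaAvroDeserializer()
    avroDeserializer.configure(
      java.util.Collections.singletonMap("schema.registry.url", "http://localhost:8081"), // placeholder
      false // value deserializer, not the key
    )
    val valueDeserializer = avroDeserializer.asInstanceOf[Deserializer[GenericRecord]]

    val consumerSettings =
      ConsumerSettings(system, new StringDeserializer, valueDeserializer)
        .withBootstrapServers("localhost:9092")   // placeholder
        .withGroupId(s"$topic-to-parquet")
        .withProperty("auto.offset.reset", "earliest")

    // Writing straight to s3a:// requires hadoop-aws on the classpath and
    // fs.s3a.* credentials set in this Configuration.
    val hadoopConf = new Configuration()
    val writer = AvroParquetWriter
      .builder[GenericRecord](new Path(s"s3a://my-bucket/$topic.parquet")) // placeholder bucket
      .withConf(hadoopConf)
      .withSchema(schema)
      .build()

    Consumer
      .plainSource(consumerSettings, Subscriptions.topics(topic))
      .map(_.value())
      // Unclear part: plainSource never completes on its own, so how does the
      // stream end once the (finite) topic is fully read and the writer get closed?
      .runWith(AvroParquetSink(writer))
  }
}
```

The part I can't figure out is completion: the topic is finite in practice, but `plainSource` doesn't know that, and the Parquet file on S3 presumably only gets finalized when the stream completes and the sink closes the writer. If Alpakka has an idiomatic way to express "read everything that's there, then stop", that's essentially what I'm looking for.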