I am receiving Avro records from Kafka. I want to convert these records into Parquet files. I am following this blog post: http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
The code so far looks roughly like this:
import io.confluent.connect.avro.AvroData;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

private ParquetWriter<GenericRecord> writer;

void openWriter(final String fileName, final SinkRecord record, final AvroData avroData) throws java.io.IOException {
    final Schema avroSchema = avroData.fromConnectSchema(record.valueSchema());
    CompressionCodecName compressionCodecName = CompressionCodecName.SNAPPY;
    int blockSize = 256 * 1024 * 1024; // Parquet block (row group) size
    int pageSize = 64 * 1024;          // Parquet page size
    Path path = new Path(fileName);
    writer = new AvroParquetWriter<>(path, avroSchema, compressionCodecName, blockSize, pageSize);
}
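To make the pain point concrete, today each flush goes through a temp file that I have to clean up afterwards. A minimal sketch of that bookkeeping, using only the JDK (the file prefix and the omitted write/upload step are placeholders, not my actual code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileSketch {
    public static void main(String[] args) throws IOException {
        // Create the temp file that would hold the Parquet output before upload.
        Path tmp = Files.createTempFile("records-", ".parquet");
        try {
            // ... the AvroParquetWriter would write to tmp here, and the file
            // would then be read back and shipped to its destination ...
            System.out.println("wrote " + Files.size(tmp) + " bytes");
        } finally {
            // This cleanup is exactly the bookkeeping I'd like to avoid.
            Files.deleteIfExists(tmp);
        }
    }
}
```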
This does the Avro-to-Parquet conversion, but it writes the Parquet file to disk. Is there a way to keep the output in memory instead, so that I don't have to manage temp files on disk? Thank you.