I am trying to use the TimeBasedPartitioner of the Confluent S3 sink. Here is my config:
{
"name":"s3-sink",
"config":{
"connector.class":"io.confluent.connect.s3.S3SinkConnector",
"tasks.max":"1",
"file":"test.sink.txt",
"topics":"xxxxx",
"s3.region":"yyyyyy",
"s3.bucket.name":"zzzzzzz",
"s3.part.size":"5242880",
"flush.size":"1000",
"storage.class":"io.confluent.connect.s3.storage.S3Storage",
"format.class":"io.confluent.connect.s3.format.avro.AvroFormat",
"schema.generator.class":"io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"timestamp.extractor":"Record",
"timestamp.field":"local_timestamp",
"path.format":"YYYY-MM-dd-HH",
"partition.duration.ms":"3600000",
"schema.compatibility":"NONE"
}
}
The data is binary and I use an avro scheme for it. I would want to use the actual record field "local_timestamp" which is a UNIX timestamp to partition the data, say into hourly files.
I start the connector with the usual REST API call
curl -X POST -H "Content-Type: application/json" --data @s3-config.json http://localhost:8083/connectors
Unfortunately the data is not partitioned as I wish. I also tried to remove the flush size because this might interfere. But then I got the error
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nMissing required configuration \"flush.size\" which has no default value.\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}%
Any idea how to properly set the TimeBasedPartioner? I could not find a working example.
Also how can one debug such a problem or gain further insight what the connector is actually doing?
Greatly appreciate any help or further suggestions.
PartitionException: Error encoding partition
which probably means (per the code that you linked above) my messages (Avro) were not deserialized correctly. Did you use schema registry? – Quagmire