I set up a Kafka Connect S3 sink with the rotation duration set to 1 hour, and also a rather large flush count, say 10,000. Now if there are not many messages in the Kafka topic, the S3 sink will buffer them in memory and wait for them to accumulate to the flush count, then upload them together and commit the offsets to its own consumer group.
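
For concreteness, here is roughly what my connector config looks like (a sketch; I'm assuming the Confluent S3 sink connector, and the topic, bucket, and region names are placeholders):

```json
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "my-topic",
    "s3.bucket.name": "my-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "10000",
    "rotate.interval.ms": "3600000"
  }
}
```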
But consider this situation: I send only 5,000 messages to the topic, so the S3 sink never flushes. After a long time, those 5,000 messages will eventually be evicted from Kafka because of the retention time, but they are still only in the S3 sink's memory, not in S3. This is very dangerous: if we restart the S3 sink, or the machine running it crashes, we lose those 5,000 messages. We cannot recover them from Kafka because they have already been deleted.
Can this happen with the S3 sink? Or is there some setting that forces it to flush after some time?