The story
I have ECS tasks that run Docker containers producing stdout/stderr output. The tasks are configured with the awslogs log driver, which sends the output to CloudWatch. There is a subscription filter on the CW log group; the subscriber is a Firehose stream that moves the logs to an S3 bucket. The stream has an AWS Lambda attached that processes CW events in batches: it parses the log events and sends the parsed data to another system for indexing. I want to preserve the order of the parsed data but don't know how to achieve this.
My first approach was to include the CW event timestamp in the parsed data and sort on it in the target system. That proved insufficient: there can be several consecutive CW events (in the same log stream) with the same timestamp, because CW timestamps have millisecond precision by default.
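A tiny illustration of the collision (the timestamp and message values here are invented for the example):

```python
# Two consecutive lines from the same log stream that landed in the
# same millisecond: the timestamp alone cannot order them.
events = [
    {"timestamp": 1700000000123, "message": "line 1"},
    {"timestamp": 1700000000123, "message": "line 2"},
]

timestamps = [e["timestamp"] for e in events]
# The sort key collides, so either ordering of the two events is "sorted".
assert len(set(timestamps)) < len(events)
```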
While the Lambda processes a batch of CW events, their order within the batch is known, so my second approach was to enrich the timestamps in the parsed data with sequence numbers, so that events with the same timestamp get different sequence numbers. This solution quickly showed its weakness: there can be several instances of the processing Lambda working in parallel on different log event batches from the Firehose stream (one shard of the stream, one instance of the Lambda). So a simple counter cannot preserve log event order across Lambdas executing in parallel.
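For concreteness, here is a minimal sketch of that second approach. It assumes the standard shape of a Firehose transformation event carrying CloudWatch Logs subscription data (`records`, base64-encoded gzipped payloads, `messageType`, `logEvents`); the `parse_batch` name and the `seq` field are my own:

```python
import base64
import gzip
import json

def parse_batch(event):
    """Parse Firehose records carrying CloudWatch Logs data and tag
    each log event with its position inside the batch."""
    parsed = []
    for record in event["records"]:
        # Firehose delivers CW Logs data gzip-compressed and base64-encoded
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        if payload.get("messageType") != "DATA_MESSAGE":
            continue  # skip CONTROL_MESSAGE health-check records
        for seq, log_event in enumerate(payload["logEvents"]):
            parsed.append({
                "message": log_event["message"],
                "timestamp": log_event["timestamp"],
                "seq": seq,  # order within this batch only, not global
            })
    return parsed
```

The weakness is visible in the comment: `seq` restarts at zero for every batch and every Lambda instance, so it only disambiguates events within one invocation.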
The next thing I found is that CW log event IDs are unique, numeric, increasing values. I haven't found any confirmation of this, so it's just an observation of the CW UI behavior in the AWS web console. The CW API even uses these IDs as backward and forward pagination tokens, so the IDs should be comparable entities.
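If that observation holds, sorting would be as simple as the sketch below. The IDs here are invented short decimals for illustration (real ones are much longer strings); parsing them as integers sidesteps any width differences, but the whole idea rests on the undocumented behavior described above:

```python
def event_sort_key(log_event):
    # CW event IDs look like fixed-width decimal strings; comparing them
    # as integers is robust even if the widths ever differ.
    return int(log_event["id"])

# Invented IDs: same millisecond timestamp, distinct increasing IDs.
events = [
    {"id": "36695540245406787", "timestamp": 1700000000123, "message": "second"},
    {"id": "36695540245406786", "timestamp": 1700000000123, "message": "first"},
]
ordered = sorted(events, key=event_sort_key)
```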
The questions
- Can I use the IDs for sorting in the external system? I'm afraid the increasing nature of the log IDs is just an internal implementation detail of the CW API and may change in the future.
- Can I somehow configure the awslogs driver in ECS tasks to include microseconds in the CloudWatch timestamps (that precision should be enough for my purposes)? I didn't find anything about it in the driver's documentation.