Can Kinesis Firehose receive content uncompressed from CloudWatch Logs subscription?

I'm using Kinesis Firehose to copy application logs from CloudWatch Logs into S3 buckets.

  1. Application logs are written to CloudWatch
  2. A subscription filter on the log group pushes the log events into a Kinesis stream.
  3. A Firehose delivery stream uses a Lambda function to decompress and transform each source record (a rough sketch of such a transform follows this list).
  4. Firehose writes the transformed record to an S3 destination with GZIP compression enabled.
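
For context, the decompression in step 3 looks roughly like the sketch below. This is a simplified illustration rather than my exact function: the handler name and the newline-joined output format are assumptions, but the payload fields (messageType, logEvents) are the ones a CloudWatch Logs subscription actually delivers.

    # Simplified Firehose transform Lambda (step 3): gunzip the CloudWatch Logs
    # payload in each record and emit the log messages as newline-delimited text.
    import base64
    import gzip
    import json


    def handler(event, context):
        output = []
        for record in event["records"]:
            # Firehose passes each record base64-encoded; the underlying payload
            # from a CloudWatch Logs subscription is always gzip-compressed JSON.
            payload = gzip.decompress(base64.b64decode(record["data"]))
            body = json.loads(payload)

            if body.get("messageType") != "DATA_MESSAGE":
                # Control messages (e.g. the initial probe) carry no log events.
                output.append({"recordId": record["recordId"], "result": "Dropped"})
                continue

            # Flatten the batched log events into newline-delimited text.
            text = "".join(e["message"] + "\n" for e in body["logEvents"])
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(text.encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}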

However, there is a problem with this flow: the Lambda transform function often fails because its output exceeds the 6 MiB response payload limit for synchronous Lambda invocation. That makes sense, because the input is compressed but the output is not (text logs commonly compress several-fold, so even a modest compressed batch can expand well past 6 MiB once decompressed). Still, doing it this way seems like the only way to get the file extension and MIME type set correctly on the resulting object in S3.

Is there any way to deliver the input to the Lambda transform function uncompressed?

This would align the input and output sizes. I've already tried reducing the buffer size on the Firehose delivery stream, but the buffer size limit appears to apply to the compressed data, not the raw data.

Deka answered 19/8, 2019 at 21:3

No, it doesn't seem possible to change whether the input from CloudWatch Logs is compressed. CloudWatch Logs will always push GZIP-compressed payloads onto the Kinesis stream.

For confirmation, take a look at kinesis-firehose-cloudwatch-logs-processor, the AWS reference implementation (Lambda blueprint) for handling CloudWatch Logs records in Firehose. This handler accepts GZIP-compressed input and returns the decompressed log messages as output. To work around the 6 MiB limit and avoid "body size is too long" error messages, the reference handler slices its output into two parts: payloads that fit within the 6 MiB limit, and the remainder, which is re-inserted into Kinesis using PutRecordBatch.
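
To illustrate the idea (not the blueprint's actual code), a stripped-down sketch of that split-and-re-ingest step might look like the following. The function and constant names are made up, it assumes every transformed record still carries its base64 data, and the real handler additionally respects PutRecordBatch's own per-call limits (500 records / 4 MiB).

    # Sketch of capping the Lambda response at ~6 MiB and re-ingesting the rest.
    # "transformed" is assumed to be a list of {"recordId", "result", "data"} dicts.
    import base64
    import boto3

    MAX_RESPONSE_BYTES = 6_000_000  # stay under Lambda's 6 MiB response limit

    firehose = boto3.client("firehose")


    def cap_and_reingest(transformed, delivery_stream_name):
        kept, reingest, size = [], [], 0
        for rec in transformed:
            rec_size = len(rec["data"]) + len(rec["recordId"])
            if size + rec_size <= MAX_RESPONSE_BYTES:
                kept.append(rec)
                size += rec_size
            else:
                # Too large to return: push the decoded payload back into the
                # delivery stream and tell Firehose to drop the original record.
                reingest.append({"Data": base64.b64decode(rec["data"])})
                kept.append({"recordId": rec["recordId"], "result": "Dropped"})

        if reingest:
            firehose.put_record_batch(DeliveryStreamName=delivery_stream_name,
                                      Records=reingest)
        return kept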

Deka answered 19/8, 2019 at 21:25

CloudWatch Logs always delivers in compressed format, which is a benefit from a cost and performance perspective. But I understand your frustration at not having the correct file extension in S3.

What you could do:

  1. Have your Lambda decompress on read and compress on write.
  2. Create an S3 event trigger on ObjectCreated that renames the file with the correct extension. Because of the way Firehose writes to S3 you cannot use a suffix filter, so your Lambda will need to check whether it has already done the rename.

lambda logic (pseudocode)

    if the object key does not end in ".gz"
    then
        # i.e. aws s3 mv s3://bucket/key s3://bucket/key.gz (a copy plus a delete)
        rename the object by appending ".gz"
    end if
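
If it helps, a rough boto3 version of that logic could look like the sketch below. The handler name and the application/gzip content type are assumptions on my part, and since S3 has no real rename, the "mv" becomes a copy followed by a delete.

    # Sketch of option 2: an S3 ObjectCreated-triggered Lambda that appends ".gz"
    # to objects Firehose has just written (copy + delete, since S3 cannot rename).
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")


    def handler(event, context):
        for rec in event["Records"]:
            bucket = rec["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])

            # The renamed copy fires this trigger again (no suffix filter is
            # possible), so skip anything that already has the extension.
            if key.endswith(".gz"):
                continue

            s3.copy_object(
                Bucket=bucket,
                Key=key + ".gz",
                CopySource={"Bucket": bucket, "Key": key},
                MetadataDirective="REPLACE",
                ContentType="application/gzip",
            )
            s3.delete_object(Bucket=bucket, Key=key)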
Wethington answered 19/8, 2019 at 21:44
