Write to a specific folder in S3 bucket using AWS Kinesis Firehose

I would like to be able to route data sent to Kinesis Firehose based on the content of the data. For example, if I sent this JSON data:

{
   "name": "John",
   "id": 345
}

I would like to filter the data based on id and send it to a subfolder of my S3 bucket, like s3://myS3Bucket/345_2018_03_05. Is this at all possible with Kinesis Firehose or AWS Lambda?

The only way I can think of right now is to resort to creating a Kinesis stream for every single one of my possible IDs, pointing them all at the same bucket, and then sending my events to those streams in my application, but I would like to avoid that since there are many possible IDs.

Basutoland answered 14/5, 2018 at 23:55 Comment(1)
Did you find a solution for this? I am looking at the same scenario. – Jag

You probably want to use an S3 event notification that fires each time Firehose places a new file in your S3 bucket (a PUT). The notification should invoke a custom Lambda function that you write, which reads the contents of the S3 file, splits it up, and writes the records out to the separate locations, keeping in mind that each S3 file is likely going to contain many records, not just one.

https://aws.amazon.com/blogs/aws/s3-event-notification/
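
A rough sketch of that Lambda, assuming the delivered file holds one JSON document per line and that the split-out records should land under an id-and-date prefix; the bucket name and key layout here are placeholders, not anything Firehose gives you by default:

    import json
    import uuid
    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    DEST_BUCKET = "myS3Bucket"  # placeholder - your real target bucket

    def lambda_handler(event, context):
        # Invoked by an S3 ObjectCreated notification for each file Firehose delivers.
        for notification in event["Records"]:
            bucket = notification["s3"]["bucket"]["name"]
            key = notification["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

            # Firehose concatenates many records into one object; this assumes
            # each record is a JSON document on its own line.
            for line in filter(None, body.splitlines()):
                record = json.loads(line)
                date_part = datetime.now(timezone.utc).strftime("%Y_%m_%d")
                dest_key = f"{record['id']}_{date_part}/{uuid.uuid4()}.json"
                s3.put_object(
                    Bucket=DEST_BUCKET,
                    Key=dest_key,
                    Body=json.dumps(record).encode("utf-8"),
                )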

Adao answered 15/5, 2018 at 0:40 Comment(0)

This is not possible out of the box, but here are some ideas...

You can write a Data Transformation in Lambda that is triggered by Amazon Kinesis Firehose for every record. You could code the Lambda to save the data to a specific file in S3, rather than having Firehose do it. However, you'd miss out on the record aggregation features of Firehose.
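
Here is a minimal sketch of that idea, assuming the transformation Lambda itself writes each record to an id-based key (the bucket name and key layout are placeholders). Firehose hands the Lambda a batch of base64-encoded records and expects a status back for each one:

    import base64
    import json
    import uuid
    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    DEST_BUCKET = "myS3Bucket"  # placeholder

    def lambda_handler(event, context):
        output = []
        for rec in event["records"]:
            payload = json.loads(base64.b64decode(rec["data"]))
            date_part = datetime.now(timezone.utc).strftime("%Y_%m_%d")
            key = f"{payload['id']}_{date_part}/{uuid.uuid4()}.json"
            s3.put_object(Bucket=DEST_BUCKET, Key=key,
                          Body=json.dumps(payload).encode("utf-8"))
            # Tell Firehose what to do with the record. Returning "Ok" means
            # Firehose will also deliver the unchanged record to its own
            # destination - see the duplication concern in the answer below.
            output.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": rec["data"],
            })
        return {"records": output}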

You could use Amazon Kinesis Analytics to look at the record and send the data to a different output stream based on the content. For example, you could have a separate Firehose stream per delivery channel, with Kinesis Analytics queries choosing the destination.

Tresa answered 15/5, 2018 at 0:48 Comment(0)

If you use a Lambda to save the data, you would end up with duplicate data in S3: one copy stored by the Lambda and another stored by Firehose, since the transformation Lambda hands the data back to Firehose. Unless there is a way to keep the transformed data from being re-added to the stream, I am not aware of a way to avoid that.

Escudo answered 6/8, 2019 at 15:39 Comment(1)
Just read this post, which says you can mark the result as Dropped so the stream will not write it, avoiding the duplication: reddit.com/r/aws/comments/7a3vfb/… – Escudo
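
Building on that comment, the per-record response in the hypothetical transformation sketch above could be changed so Firehose discards the record after the Lambda has already stored it under the id-based key:

    # Mark the record as Dropped so Firehose does not write it again;
    # the Lambda has already stored it itself.
    output.append({
        "recordId": rec["recordId"],
        "result": "Dropped",
        "data": rec["data"],
    })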
