Storing Firehose-transferred files in S3 under custom directory names

We primarily do bulk transfers of incoming clickstream data through the Kinesis Firehose service. Our system is a multi-tenant SaaS platform. The incoming clickstream data is stored in S3 through Firehose. By default, all the files are stored under directories named according to a given date format. I would like to specify the directory path for the data files in the Firehose panel or through the API, in order to segregate the customer data.

For example, the directory structure that I would like to have in S3 for customers A, B, and C:

/A/2017/10/12/

/B/2017/10/12/

/C/2017/10/12/

How can I do it?

Benniebenning answered 18/10, 2017 at 5:48

AWS Firehose supports dynamic partitioning.

It can be done in two ways: either with the inline JQ parser or with a Lambda function.

Example:

"ExtendedS3DestinationConfiguration": {  
"BucketARN": "arn:aws:s3:::my-logs-prod",  
"Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/ 
    device=!{partitionKeyFromQuery:device}/ 
    year=!{partitionKeyFromQuery:year}/  
    month=!{partitionKeyFromQuery:month}/  
    day=!{partitionKeyFromQuery:day}/  
    hour=!{partitionKeyFromQuery:hour}/"  
} 
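
For the JQ route, the keys referenced by !{partitionKeyFromQuery:...} have to be emitted by a MetadataExtraction processor, and dynamic partitioning must be enabled when the stream is created. Here is a minimal sketch of the extra configuration, assuming the incoming records are JSON with customer_id and device fields (the field names and bucket are illustrative, not from the question; note that dynamic partitioning also requires a buffer size of at least 64 MB):

"ExtendedS3DestinationConfiguration": {
  "BucketARN": "arn:aws:s3:::my-logs-prod",
  "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/device=!{partitionKeyFromQuery:device}/!{timestamp:yyyy/MM/dd}/",
  "ErrorOutputPrefix": "errors/",
  "BufferingHints": { "SizeInMBs": 64, "IntervalInSeconds": 60 },
  "DynamicPartitioningConfiguration": { "Enabled": true },
  "ProcessingConfiguration": {
    "Enabled": true,
    "Processors": [
      {
        "Type": "MetadataExtraction",
        "Parameters": [
          { "ParameterName": "MetadataExtractionQuery", "ParameterValue": "{customer_id: .customer_id, device: .device}" },
          { "ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6" }
        ]
      }
    ]
  }
}

The prefix can mix partitionKeyFromQuery keys with the !{timestamp:...} namespace, so the date part of the path does not have to come from the record itself.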
Hang answered 25/4, 2022 at 14:58

You can separate your directories by configuring the S3 Prefix. In the console, this is done during setup when you set the S3 bucket name.

Using the CLI, you set the prefix in the --s3-destination-configuration option, as documented here:

http://docs.aws.amazon.com/cli/latest/reference/firehose/create-delivery-stream.html

Note, however, that you can only set one prefix per Firehose Delivery Stream, so if you're passing all of your clickstream data through one Delivery Stream, you will not be able to send the records to different prefixes.
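
For example, a per-customer Delivery Stream could be created along these lines (the stream name, role ARN, and bucket are placeholders; Firehose automatically appends a UTC YYYY/MM/DD/HH/ date path after the prefix, which yields the /A/2017/10/12/ layout from the question):

# hypothetical stream for customer A; repeat with Prefix "B/", "C/", etc.
aws firehose create-delivery-stream \
  --delivery-stream-name clickstream-customer-a \
  --s3-destination-configuration '{
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::my-clickstream-bucket",
    "Prefix": "A/"
  }'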

Pled answered 19/10, 2017 at 13:22
I would like to use the "prefix" as a variable and configure it based on the incoming click data (e.g. click.customer_id). However, thanks for your answer. – Benniebenning
Understood, but if you're using a single Firehose Delivery Stream, you will not be able to write to different prefixes, even with a variable. As it stands now, there's no way to pass your variable through to the S3 prefix configuration on the Delivery Stream. If you want separate prefixes in one bucket, you'll have to use multiple Delivery Streams reading from the same Kinesis stream, each with a record-transformation Lambda that filters for the prefix configured on that Delivery Stream. – Pled

Custom prefixes are now supported.
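
Concretely, a custom prefix is an expression evaluated at delivery time, built from the !{timestamp:...} and !{firehose:...} namespaces. For example (the path layout here is illustrative):

"Prefix": "logs/!{timestamp:yyyy/MM/dd}/",
"ErrorOutputPrefix": "errors/!{firehose:error-output-type}/!{timestamp:yyyy/MM/dd}/"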

Onlybegotten answered 4/2, 2020 at 16:00
This doesn't seem to be a solution, in that it doesn't provide a way to create folder(s) based on the event values (e.g. customer ID). – Lexie
It may not be an answer to the question, but it's good to be aware of. "Custom prefixes" is a little misleading on AWS's part, though, as in reality it's just a limited set of timestamp-based or random-string-based values you can choose from. – Indue
