Amazon Kinesis Firehose Buffering to S3

3

6

I'm attempting to price out a streaming data / analytics application deployed to AWS, and I'm looking at using Kinesis Firehose to dump the data into S3.

My question is: when pricing out the S3 costs for this, I need to figure out how many PUTs I will need.

I know that Firehose buffers the data and then flushes it to S3; however, I'm unclear on whether it writes a single "file" containing all of the records accumulated up to that point or writes each record individually.

So, assuming I set the buffer size / interval to an optimal amount based on the size of the records, does the number of S3 PUTs equal the number of records or the number of flushes that Firehose performs?

Psalmody answered 3/11, 2015 at 16:17 Comment(0)
O
4

Having read a substantial amount of AWS documentation, I respectfully disagree with the assertion that S3 will not charge you.

You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing. [emphasis mine]

https://aws.amazon.com/kinesis/firehose/pricing/

What they are saying is that Kinesis Firehose will not charge you anything additional for the transfers, beyond the $0.035/GB, but you'll still pay for the interactions with your bucket. (Data inbound to a bucket is always free of actual per-gigabyte transfer charges.)

In the final analysis, though, you appear to be in control of the rough number of PUT requests against your bucket, based on some tunable parameters:

Q: What is buffer size and buffer interval?

Amazon Kinesis Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. You can configure buffer size and buffer interval while creating your delivery stream. Buffer size is in MBs and ranges from 1MB to 128MB. Buffer interval is in seconds and ranges from 60 seconds to 900 seconds.

https://aws.amazon.com/kinesis/firehose/faqs/#creating-delivery-streams

Unless it is collecting and aggregating the records into large files, I don't see what the point of the buffer size and buffer interval would be... however, without firing up the service and taking it for a spin, I can (unfortunately) only really speculate.
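If the buffer really does work that way, the number of PUTs can be roughed out from the ingest rate and the two buffer settings. The sketch below is only an estimate under stated assumptions: one S3 PUT per flush, a steady ingest rate, and a flush as soon as either the size or the interval threshold is reached. The function name `estimate_monthly_puts` and the 30-day month are my own choices, not anything from the AWS documentation.

```python
import math


def estimate_monthly_puts(bytes_per_second, buffer_size_mb, buffer_interval_s, days=30):
    """Estimate S3 PUTs per month, ASSUMING one PUT per Firehose flush."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")

    buffer_bytes = buffer_size_mb * 1024 * 1024

    # Assumption: a flush happens when whichever threshold is hit first,
    # the buffer filling up or the buffer interval elapsing.
    seconds_to_fill = buffer_bytes / bytes_per_second
    seconds_per_flush = min(seconds_to_fill, buffer_interval_s)

    seconds_per_month = days * 24 * 60 * 60
    return math.ceil(seconds_per_month / seconds_per_flush)


# Slow stream: 100 KB/s never fills 128 MB within 900 s, so the interval wins.
slow = estimate_monthly_puts(100 * 1024, buffer_size_mb=128, buffer_interval_s=900)

# Fast stream: 1 MB/s fills a 5 MB buffer in 5 s, so the size limit wins.
fast = estimate_monthly_puts(1024 * 1024, buffer_size_mb=5, buffer_interval_s=300)

print(slow, fast)
```

With a low ingest rate the interval dominates, so raising the buffer size alone changes nothing; with a high rate the opposite holds.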

Ophiology answered 3/11, 2015 at 21:59 Comment(2)
Well, it wouldn't be the first time Kinesis did (or didn't do) something when I expected the exact opposite, but I would agree... if it doesn't aggregate the records, what would be the point? I'm hoping someone can confirm for certain though... - Psalmody
Just to follow up on this answer: I got a reply from AWS on the forums (forums.aws.amazon.com/thread.jspa?threadID=219275&tstart=0). You are correct: Firehose writes consolidated chunks to S3, so we can control the number of PUTs. As for interpreting the statement, it actually says there won't be any TRANSFER charges (i.e., between regions, etc.); it is not referring to service-based charges such as PUTs. Thanks! - Psalmody
P
1

I don't believe you pay anything extra for the write operation to S3 from Firehose.

You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing.

https://aws.amazon.com/kinesis/firehose/pricing/

Pitarys answered 3/11, 2015 at 17:29 Comment(3)
Yeah, I'm aware of the free transfer out of Firehose; however, I'm assuming that's simply saying Firehose won't charge for the data transfer, and that it says nothing about the S3 cost of each PUT (which I assume is what Firehose uses to write the data to S3). S3 charges per 1,000 PUTs, and I'm trying to figure out whether each flush from Firehose is 1 PUT or whether the number of PUTs equals the number of individual records, regardless of whether they are consolidated into individual flushes from Firehose. - Psalmody
I guess you could interpret it differently, but my interpretation is that there is no cost to get the data from Firehose to S3: only the cost to ingest it into Firehose, and then the storage costs in S3 (plus any PUT/GET charges if you read/write the data in S3). - Pitarys
I understand your point now... To be honest, I can read it both ways. No idea which is correct, but for 1.5 billion individual records a month (which is not a tremendous amount), if S3 charged a PUT for each individual record, that would be about $7,000 each month JUST for the PUTs. That would seem excessive... - Psalmody
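For anyone wanting to redo that arithmetic, here is a rough sketch. The price of $0.005 per 1,000 PUT requests is an assumption on my part (check the current S3 pricing page for your region and storage class), and the 2,880 flushes figure is a hypothetical example of one flush every 15 minutes over a 30-day month.

```python
from decimal import Decimal

# ASSUMPTION: price per 1,000 PUT requests; verify against current S3 pricing.
PRICE_PER_1000_PUTS = Decimal("0.005")


def monthly_put_cost(put_count, price_per_1000=PRICE_PER_1000_PUTS):
    """Return the monthly S3 PUT request cost in dollars for put_count requests."""
    return price_per_1000 * put_count / 1000


# One PUT per record: 1.5 billion records a month.
per_record_cost = monthly_put_cost(1_500_000_000)

# One PUT per flush: hypothetical 2,880 flushes (every 15 minutes for 30 days).
per_flush_cost = monthly_put_cost(2_880)

print(f"per record: ${per_record_cost:,.2f}  per flush: ${per_flush_cost:,.4f}")
```

Under that assumed price, per-record PUTs come to roughly $7,500 a month, the same ballpark as the figure above, while per-flush PUTs cost a small fraction of a dollar.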
T
0

The cost is one S3 PUT per delivery operation performed by Kinesis Firehose, not one per individual record, so one Firehose flush is one PUT:

https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html

https://forums.aws.amazon.com/thread.jspa?threadID=219275&tstart=0
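Since each flush is one PUT, the buffering hints are the knob that controls the PUT count. Below is a minimal sketch of building that part of the destination configuration. The helper `buffering_hints` is hypothetical, the ARNs are placeholders, and the allowed ranges are simply the ones from the FAQ quoted earlier in this thread (the service limits may have changed since). As far as I know, a dictionary shaped like this can be passed as `ExtendedS3DestinationConfiguration` to boto3's Firehose `create_delivery_stream`, but treat that as an assumption and check the boto3 reference.

```python
def buffering_hints(size_mb, interval_s):
    """Build a BufferingHints dict, validating against the ranges quoted in the FAQ."""
    if not 1 <= size_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= interval_s <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")
    return {"SizeInMBs": size_mb, "IntervalInSeconds": interval_s}


# Placeholder ARNs for illustration only; substitute your own role and bucket.
s3_destination = {
    "RoleARN": "arn:aws:iam::123456789012:role/example-firehose-role",
    "BucketARN": "arn:aws:s3:::example-bucket",
    # Largest buffer and longest interval give the fewest flushes, hence fewest PUTs.
    "BufferingHints": buffering_hints(size_mb=128, interval_s=900),
}

print(s3_destination["BufferingHints"])
```

Larger values mean fewer PUTs but a longer delay before data shows up in the bucket, so the "optimal" setting depends on how fresh the data in S3 needs to be.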

Taddeusz answered 17/12, 2020 at 12:34 Comment(1)
Hello and welcome to SO! Please read the tour and "How do I write a good answer?" For example, consider quoting the relevant text from those articles. - Belenbelesprit
