Amazon Kinesis KPL vs AWS SDK pros and cons

My scenario: I will be writing large volumes of data (terabytes per day) to a Kinesis stream, and I want to know the best way to achieve high write throughput. I am considering the following two options for producer clients.

Option 1: use the Kinesis Producer Library (KPL).

or

Option 2: use the AWS SDK (API) directly.

I know the KPL is an abstraction built on top of the AWS SDK, so it basically boils down to the KPL (with the AWS SDK) versus the AWS SDK alone. From what I have researched, it seems the AWS SDK does not provide the ability to aggregate multiple records into a single put, whereas the KPL does support this aggregation (please correct me if this is wrong).

Both PutRecords (from the Kinesis Data Streams API) and the KPL (using aggregation) provide high write throughput. The question is which of the two options is better, and why. In a nutshell, I am interested in which will be faster at writing data to a Kinesis stream; once the data is in the stream, I do not care how it is read. I am also interested in how the retry mechanisms differ in the two cases, and in asynchronous write performance.

Dielectric answered 22/4, 2019 at 16:53 Comment(2)
The KPL is currently only available as a Java API wrapper around a C++ executable, which may not be suitable for all deployment environments. So if your language of choice is something other than Java, you can't use the KPL for now. - Chromatograph
Use PutRecords (the SDK API) for synchronous processing and the KPL for asynchronous processing. For example, if you are processing critical events you should use sync; for informational events you should use async. - Bumblebee

Yes, there are two main differences between the SDK and the KPL. First, the SDK sends records synchronously, as soon as you make the call and with no added buffering delay, whereas the KPL batches records (aggregation and collection). That batching helps maximize efficiency and throughput, at the cost of some latency determined by the RecordMaxBufferedTime setting. Second, the KPL has to be deployed with Java, whereas the SDK route lets you use the CLI, or the Boto3 library (the AWS SDK for Python), or an SDK for another programming language. Please refer to the API reference.
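To make the latency trade-off concrete, here is a minimal Python sketch of the general idea behind a buffered producer: records are held until either the batch is full or the oldest record has waited too long. This is not the KPL's implementation. The class `BufferedSender`, the `send_batch` callback, and the parameter names are hypothetical, and the age check only runs when `add` or `flush_if_due` is called, whereas a real producer would typically use a background timer.

```python
import time
from typing import Callable, List, Optional


class BufferedSender:
    """Toy buffer: flush when the batch is full or the oldest record is too old."""

    def __init__(self,
                 send_batch: Callable[[List[bytes]], None],
                 max_records: int = 500,
                 max_buffered_seconds: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_batch = send_batch              # hypothetical callback that does the real write
        self._max_records = max_records
        self._max_buffered_seconds = max_buffered_seconds
        self._clock = clock                        # injectable so the timing can be tested
        self._buffer: List[bytes] = []
        self._oldest: Optional[float] = None       # arrival time of the oldest buffered record

    def add(self, record: bytes) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        if not self._buffer:
            return
        full = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_buffered_seconds
        if full or too_old:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._oldest = None
            self._send_batch(batch)
```

A larger `max_buffered_seconds` gives fuller batches and fewer requests, but each record may wait longer before it is sent. A record that arrives alone can sit in the buffer for up to that long.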

If you are not tied to a particular language (the KPL means Java) and a little latency is not an issue, go for the KPL. However, if you want communication to remain synchronous, go for the API and choose whatever language you prefer.

In conclusion, the SDK gives you the basic operations, while the KPL is built on top of them and comes with batching, aggregation, and retry capabilities ready for you. This is also why the KPL has higher latency: it buffers records in order to do that extra work.
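If you go the plain SDK route, the batching and retry logic is yours to write. Below is a rough Python sketch of what that can look like with PutRecords, which accepts up to 500 records per request and reports failures per record in its response. The function name `put_records_with_retry` and its parameters are hypothetical. The `client` is assumed to behave like a Boto3 Kinesis client (`boto3.client("kinesis")`), and the sketch does not check request size limits.

```python
import time
from typing import Any, Callable, Dict, List


def put_records_with_retry(client: Any,
                           stream_name: str,
                           records: List[Dict[str, Any]],
                           max_attempts: int = 5,
                           base_delay: float = 0.1,
                           sleep: Callable[[float], None] = time.sleep
                           ) -> List[Dict[str, Any]]:
    """Send records in chunks of up to 500 and resend only the failed entries.

    Each record is a dict with "Data" (bytes) and "PartitionKey" (str).
    Returns the records that still failed after max_attempts tries.
    """
    leftover: List[Dict[str, Any]] = []
    for start in range(0, len(records), 500):
        pending = records[start:start + 500]
        for attempt in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=pending)
            # Results come back in request order; a failed entry carries an ErrorCode.
            pending = [rec for rec, res in zip(pending, response["Records"])
                       if "ErrorCode" in res]
            if not pending:
                break
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)   # simple exponential backoff
        leftover.extend(pending)
    return leftover
```

Note that a PutRecords call can succeed as a whole while some individual records fail (for example when a shard is throttled), so checking the per-record results matters. The KPL handles this kind of retry for you.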

Eratosthenes answered 29/1, 2023 at 10:43 Comment(0)
