Time-series data is being produced to a Kafka topic. I need to read each record, decorate it with some data from a database, and then call a REST API. Once the response is received, I output the result to another Kafka topic. How can I do this efficiently and scalably with the Kafka Streams API?
Steps -
- Start reading the input topic
- Call mapValues to make a database call and decorate the record with the additional data
- Make a REST API call with the decorated record and get the response
- Write the resulting record to the output Kafka topic
I think there are two bottlenecks in the above algorithm -
- Making database calls will slow it down. This can be mitigated by caching the metadata and loading it only on a cache miss, or by using a state store (see the GlobalKTable sketch further down).
- Making the REST API call synchronously will slow it down (I sketch a per-key ordering idea for this at the end).
final KStream<String, String> records = builder.stream(InputTopic);

// This is bad: both lookups run synchronously on the stream thread
final KStream<String, String> output = records
        .mapValues(value -> { /* cache hit, otherwise make a database call */ return value; })
        .mapValues(value -> { /* prepare the HTTP request and convert the HTTP response */ return value; });

output.to(OutputTopic);
The code above ties throughput to the latency of the remote calls: if the database call or the REST API takes long to complete, throughput suffers. Records with the same key must not be processed out of order. The expected throughput is about one million records per minute. While one record is waiting on the REST API, it is fine to make the database calls for other records concurrently.
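For the database bottleneck, here is a rough sketch of the direction I am considering, assuming the metadata can be published to a compacted Kafka topic (the names MetadataTopic, InputTopic, EnrichedTopic and the merge logic are placeholders I made up). A GlobalKTable is replicated to every instance, so the join becomes a local state-store lookup instead of a remote database call:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

final StreamsBuilder builder = new StreamsBuilder();

// Metadata lives in a compacted topic and is materialized as a local store
final GlobalKTable<String, String> metadata = builder.globalTable("MetadataTopic");

final KStream<String, String> records = builder.stream("InputTopic");

// Join against the local store; no repartitioning of the input stream needed
final KStream<String, String> enriched = records.join(
        metadata,
        (recordKey, recordValue) -> recordKey,            // derive the metadata key
        (recordValue, meta) -> recordValue + "|" + meta); // placeholder merge

enriched.to("EnrichedTopic");

Is this a reasonable way to avoid the per-record database round trip, or should I populate a state store myself with a Processor?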
That still leaves the question of how to write a topology that can scale in this scenario; I am new to Kafka Streams.
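For the REST bottleneck, the only idea I have so far sits outside the Streams DSL (as far as I can tell, the DSL has no async operator): consume the enriched topic with a plain consumer and chain the asynchronous REST calls per key, so records with the same key complete strictly in order while different keys proceed concurrently. This is only a sketch under my own assumptions; the endpoint URL and the output step are placeholders, and eviction of finished chains from the map is omitted:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class PerKeyOrderedDispatcher {

    private final HttpClient client = HttpClient.newHttpClient();

    // Tail of the in-flight chain per key: each new record for a key is
    // appended to that key's chain, so same-key records finish in order
    // while different keys run concurrently
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    public void dispatch(String key, String value) {
        tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> previous =
                    (tail == null) ? CompletableFuture.completedFuture(null) : tail;
            return previous
                    .thenCompose(ignored -> callRestApi(value))
                    .thenAccept(response -> forwardToOutputTopic(key, response));
        });
    }

    // Placeholder endpoint; the real REST API goes here
    private CompletableFuture<String> callRestApi(String value) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://example.com/decorate"))
                .POST(HttpRequest.BodyPublishers.ofString(value))
                .build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    // Placeholder: in the real pipeline this would be a KafkaProducer.send()
    private void forwardToOutputTopic(String key, String response) {
        System.out.println(key + " -> " + response);
    }
}

Would something like this keep up with ~1m records/minute, or is there a Streams-native way to get the same concurrency?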