How to make REST API calls in kafka streams application/
Asked Answered
D

0

6

The time-series data is being produced in a kafka topic. I need to read each record and decorate with some data from the database and eventually call a REST API. Once the response is received, output to a kafka topic. How can I do this with kafka streams API efficiently and scalable?

Steps -

  • Start reading the input topic
  • Call mapvalues to make a database call and decorate the record with the additional data
  • Make a REST api call with the input request, get the response.
  • Output the record in the kafka topic

I think, there are two bottlenecks in the above algorithm -

  • Making a database calls would slow it down. This can be circumvented by caching the meta-data and load the meta-data when there is a mis or use state store.

  • Making the REST API call synchronously would slow it down.

    final KStream<String, String> records = builder.stream(InputTopic);

    //This is bad
    final KStream<String, String> output = records
      .mapValues(value ->  { //cache hit otherwise database call});
      .mapValues(value ->  { //prepare http request and convert the http resonse };
    output.to(OutputTopic)

The code above will have a dependency and adverse effect on the throughput if the database call or REST API takes longer time to complete. Records with the same key should not be processed out of order. The expected throughput is about 1m/minute. When one record reaches REST API, it is okay to make the database calls concurrently.

Not sure how to go about writing the topology which can scale in this scenario. I am new to kafka streams.

Dettmer answered 12/11, 2019 at 4:20 Comment(7)
One recommended approach is to pull your database/API data into a KTable, then join the stream on some ID.Scrimmage
+1 stream the database into Kafka and do the join locally (c.f. rmoff.dev/codetalks19-no-more-silos-video)Bulganin
Have you read my answer at https://mcmap.net/q/690706/-kafka-streams-and-rpc-is-calling-rest-service-in-map-operator-considered-an-anti-pattern?Lyricism
@MichaelG.Noll.. Thanks for reference. The answer talks about pros and cons. I was looking for different ways to solve the problem.Dettmer
@RobinMoffatt.. Nice talk.. but does not help to solve my problem.Dettmer
@Mac: Well. You got two "answers" -- import the data into a KTable or deal with the remote calls and corresponding tradeoffs. Not sure what else you expect to get as an answer?Opisthognathous
You may also find this post useful in understanding the tradeoffs when making remote calls from Kafka Streams and what you can do about them: responsive.dev/blog/async-processing-for-kafka-streamsAkilahakili

© 2022 - 2024 — McMap. All rights reserved.