How to join multiple Kafka topics?
Asked Answered
B

3

7

So I have...

  • 1st topic that has general application logs (log4j). Stores things like HTTP API requests/responses and warnings, exceptions etc... There can be multiple logs associated to one logical business request. (These logs happen within seconds of each other)
  • 2nd topic contains commands from the above business request which other services take action on. (The commands also happen within seconds of each other, but maybe couple minutes from the original request)
  • 3rd topic contains events generated from actions of those other services. (Most events complete within seconds, but some can take up to 3-5 days to be received)

So a single logical business request can have multiple logs, commands and events associated to it by a uuid which the microservices pass to each other.

So what are some of the technologies/patterns that can be used to read the 3 topics and join them all together as a single json document and then dump them to lets say Elasticsearch?

Streaming?

Blowfish answered 11/3, 2018 at 22:21 Comment(2)
You tagged Kafka Streams. It has a join operation. So does KSQLKamerman
kafka.apache.org/10/documentation/streams/developer-guide/…Beaner
W
9

You can use Kafka Streams, or KSQL, to achieve this. Which one depends on your preference/experience with Java, and also the specifics of the joins you want to do.

KSQL is the SQL streaming engine for Apache Kafka, and with SQL alone you can declare stream processing applications against Kafka topics. You can filter, enrich, and aggregate topics. Currently only stream-table joins are supported. You can see an example in this article here

The Kafka Streams API is part of Apache Kafka, and a Java library that you can use to do stream processing of data in Apache Kafka. It is actually what KSQL is built on, and supports greater flexibility of processing, including stream-stream joins.

Wisla answered 12/3, 2018 at 9:16 Comment(5)
Ok so I'm guessing streaming is the way to go... But I need more info around the "how", based on timing requirements (explained in my question).Blowfish
Kafka persists data. So long as your retention period is enough to meet your business requirements, you'll be able to do the joins you want to.Wisla
Ok, so business request logs a few lines now with seconds of each other and event is received 4 days later from 3rd party. So do I just set a window of say a few days? And will it mean that that the lot will only be processed when the last event comes in?Blowfish
Ok. So simpler solution which doesn't need streaming. I can simply read each topic and do upserts to the destination. I do not need to do streams to simply just join JSON. Further more with messages coming in as late as 4 days later makes windoing even harder.Blowfish
If you can only join 2 streams at a time or 1 table and 1 stream, combining 3 streams (say A, B and C) would need A joined with B to get AB, and then AB joined with C to get ABC. This is doable for a small number, but what if you need to combine a larger number, say 10 or more streams? Is this still the way to do this?Riant
E
3

You can use KSQL to join the streams.

  1. There are 2 constructs in KSQL Table/Stream.
  2. Currently, the Join is supported for a Stream & a table. So you need to identify the which is a good fit for what?
  3. You don't need windowing for joins.

Benefits of using KSQL.

  1. KSQL is easy to set up.
  2. KSQL is SQL language which helps you to query your data quickly.

Drawback.

  1. It's not production ready but in April-2018 the release is coming up.
  2. Its little buggy right now but certainly will improve in a few months.

Please have a look.

https://github.com/confluentinc/ksql

Exequatur answered 20/3, 2018 at 16:13 Comment(1)
I'm wondering if I even need streams. All I need is to make sure the documents are joined in Elasticsearch when they arrive in queue. Since messages can come in right away or a few days later it can maybe make it difficult with windowing?Blowfish
O
-2

Same as question Is it possible to use multiple left join in Confluent KSQL query? tried to join stream with more than 1 tables , if not then whats the solution? And seems like you can not have multiple join keys within same query.

Outing answered 28/9, 2018 at 17:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.