Is there any simulator/tool to generate messages for streaming?
For testing purposes, I need to simulate a client that generates 100,000 messages per second and sends them to a Kafka topic. Is there any tool or way to help me generate these random messages?
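For context, if no off-the-shelf tool fits, a hand-rolled rate-limited generator is also possible. Below is a minimal Python sketch; the Kafka wiring shown in the comment uses the third-party kafka-python package and is an assumption, not something from the question:

```python
import random
import string
import time

def random_message(size=100):
    """Build a random alphanumeric payload of `size` bytes."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=size)).encode()

def produce(send, rate=100_000, duration=1.0):
    """Call send(message) at roughly `rate` messages/second for `duration` seconds.

    Returns the number of messages actually sent.
    """
    interval = 1.0 / rate
    deadline = time.monotonic() + duration
    next_tick = time.monotonic()
    sent = 0
    while time.monotonic() < deadline:
        send(random_message())
        sent += 1
        # Sleep until the next scheduled send to approximate the target rate.
        next_tick += interval
        pause = next_tick - time.monotonic()
        if pause > 0:
            time.sleep(pause)
    return sent

if __name__ == "__main__":
    # To target Kafka, swap the no-op sink for a real producer, e.g. with
    # the kafka-python package (assumed, not part of the question):
    #   from kafka import KafkaProducer
    #   p = KafkaProducer(bootstrap_servers="localhost:9092")
    #   produce(lambda m: p.send("test", m))
    print(produce(lambda m: None, rate=1000, duration=0.1), "messages sent")
```

Note that a single Python process will struggle to sustain 100,000 messages per second; the purpose-built tools in the answers below are a better fit at that rate.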

Cuomo answered 19/10, 2016 at 12:35 Comment(1)
I have never worked with Kafka, but do you want the client to respond to a high volume of messages on request, or do you want the client to send that many messages? In the first case you could use stubbydb; in the second, JMeter. – Prodigy

There's a built-in tool for generating dummy load: bin/kafka-producer-perf-test.sh (https://github.com/apache/kafka/blob/trunk/bin/kafka-producer-perf-test.sh). You can refer to https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/ProducerPerformance.java#L106 to figure out how to use it.

One usage example would be:

bin/kafka-producer-perf-test.sh --broker-list localhost:9092 --messages 10000000 --topic test --threads 10 --message-size 100 --batch-size 10000 --throughput 100000

The key here is the --throughput 100000 flag, which throttles the maximum message rate to approximately 100,000 messages per second.
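Note that newer Kafka releases renamed these flags; on recent versions the equivalent invocation looks roughly like the sketch below (verify the exact flag names against your installed version's --help output):

```shell
bin/kafka-producer-perf-test.sh \
  --topic test \
  --num-records 10000000 \
  --record-size 100 \
  --throughput 100000 \
  --producer-props bootstrap.servers=localhost:9092
```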

Buller answered 19/10, 2016 at 13:50 Comment(0)

The existing answers (e.g., kafka-producer-perf-test.sh) are useful for performance testing, but much less so when you need to generate more than a single stream of raw bytes. If you need, for example, to simulate more realistic data with nested structures, or to generate data in multiple topics that have some relationship to each other, they are not sufficient. So if you need more than a bunch of raw bytes, I'd look at the alternatives below.

Update Dec 2020: As of today, I recommend https://github.com/MichaelDrogalis/voluble. Some background info: the author is Confluent's product manager for Kafka Streams and ksqlDB, and the author/developer of http://www.onyxplatform.org/.

From the Voluble README:

  • Creating realistic data by integrating with Java Faker.
  • Cross-topic relationships
  • Populating both keys and values of records
  • Making both primitive and complex/nested values
  • Bounded or unbounded streams of data
  • Tombstoning

Voluble ships as a Kafka connector to make it easy to scale and change serialization formats. You can use Kafka Connect through its REST API or integrated with ksqlDB. In this guide, I demonstrate using the latter, but the configuration is the same for both. I leave out Connect-specific configuration, like serializers and tasks, that needs to be configured for any connector.
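As a rough sketch of what such a connector definition looks like in ksqlDB: the property names and the connector class below are taken from my reading of the Voluble README and should be double-checked there before use; the topic and field names are made up for illustration.

```sql
CREATE SOURCE CONNECTOR people_gen WITH (
  'connector.class' = 'io.mdrogalis.voluble.VolubleSourceConnector',
  -- keys: random UUIDs generated via Java Faker
  'genkp.people.with' = '#{Internet.uuid}',
  -- values: one generated attribute per record
  'genv.people.name.with' = '#{Name.full_name}',
  -- cross-topic relationship: reference a key from the "teams" topic
  'genv.people.teamId.matching' = 'teams.key'
);
```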

Old answer (2016): I'd suggest taking a look at https://github.com/josephadler/eventsim, which produces more "realistic" synthetic data (yeah, I am aware of the irony of what I just said :-P):

Eventsim is a program that generates event data for testing and demos. It's written in Scala, because we are big data hipsters (at least sometimes). It's designed to replicate page requests for a fake music web site (picture something like Spotify); the results look like real usage data, but are totally fake. You can configure the program to create as much data as you want: data for just a few users for a few hours, or data for a huge number of users over many years. You can write the data to files, or pipe it out to Apache Kafka.

You can use the fake data for product development, correctness testing, demos, performance testing, training, or in any other place where a stream of real-looking data is useful. You probably shouldn't use this data to research machine learning algorithms, and definitely shouldn't use it to understand how real people behave.

Maryalice answered 20/10, 2016 at 7:11 Comment(0)

You can make use of Kafka Connect to generate random test data. Check out this custom source connector: https://github.com/xushiyan/kafka-connect-datagen

It allows you to define settings such as a message template and randomizable fields to generate test data. Also check out this post for a detailed demonstration.

Robins answered 12/7, 2018 at 8:27 Comment(2)
Not to be confused with the other kafka-connect-datagen that supports Avro data: github.com/confluentinc/kafka-connect-datagen – Skite
Any option that can be used with SSL? (Heroku) – Hasan

© 2022 - 2024 — McMap. All rights reserved.