If you don't want to buy into the hassle around the "original" Kafka scripts, there's also kafkacat.
The basic idea is to
- consume the last message of each partition and
- add up the offsets (correcting for zero-based offsets).
Let's develop this.
kafkacat -C -b <broker> -t <topic> -o -1 -f '%p\t%o\n'
This will output something like this (plus "reached end of partition" notices on stderr):
0 77
1 75
2 78
Now, kafkacat doesn't terminate but keeps waiting for new messages. We can circumvent this by adding a timeout (choose a value large enough so you get all partitions in your given environment):
timeout --preserve-status 1 kafkacat <snip>
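To see what the timeout wrapper does without needing a broker, here is a minimal sketch that assumes GNU coreutils `timeout`. The `sh -c '...'` command is a hypothetical stand-in for kafkacat: it prints two partition/offset lines and then blocks, the way a consumer waiting for new messages would.

```shell
# Hypothetical stand-in for kafkacat: print two lines, then block "forever".
out=$(timeout --preserve-status 1 sh -c 'printf "0\t77\n1\t75\n"; exec sleep 10')
status=$?

# Everything printed before the timeout fired is still captured.
printf '%s\n' "$out"

# With --preserve-status, timeout reports the status of the killed command
# (128 + SIGTERM = 143 with GNU timeout) instead of its own 124.
echo "exit status: $status"
```

The duration is in seconds by default, so `1` here means the stand-in is stopped after one second.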
Now we could go ahead and add up the second column (+1 each) -- but if there are new messages during that timeout interval, we might get something like this:
0 77
1 75
2 78
1 76
So we have to account for this, which is easy enough to do with a little awk:
timeout --preserve-status 1 kafkacat <snip> 2> /dev/null \
| awk '{lastOffsets[$1] = $2} END {count = 0; for (i in lastOffsets) { count += lastOffsets[i] + 1 }; print count}'
Note how we use a (hash)map to remember the last seen offset for each partition until the timeout triggers, and then loop over the array to compute the sum.
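As a quick sanity check, the awk step can be tried on the sample output from above without a broker. This is only a sketch: the `sample`, `naive` and `total` variable names are made up for illustration, and the result equals the message count only under the assumption that each partition's offsets start at 0 (nothing removed by retention or compaction).

```shell
# Sample consumer output from above: partition 1 appears twice because a
# new message arrived during the timeout window.
sample=$(printf '0\t77\n1\t75\n2\t78\n1\t76\n')

# Naive approach: add (offset + 1) for every line, counting partition 1 twice.
naive=$(printf '%s\n' "$sample" | awk '{ count += $2 + 1 } END { print count }')

# Keep only the last offset seen per partition, then sum (offset + 1).
total=$(printf '%s\n' "$sample" \
  | awk '{ lastOffsets[$1] = $2 } END { count = 0; for (p in lastOffsets) count += lastOffsets[p] + 1; print count }')

echo "naive: $naive"   # 310 = 78 + 76 + 79 + 77
echo "total: $total"   # 234 = 78 + 77 + 79
```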