Confluent.Kafka.KafkaException: Broker: Specified group generation id is not valid

Asked 30/10, 2020 at 21:45 Answered 26/6, 2024 at 20:9

Environment

- 3-node Kafka cluster
  - Amazon MSK
  - v2.3
- 1 topic
  - 6 partitions
- 1 consumer group with 2 consumers
  - Running in Kubernetes
  - Confluent .NET SDK 1.2.2
  - All default settings except bootstrap.servers and group.id

Problem

First, one of my consumers encounters the following exception.

Confluent.Kafka.KafkaException: Broker: Specified group generation id is not valid
   at Confluent.Kafka.Impl.SafeKafkaHandle.Commit(IEnumerable`1 offsets)
   at Confluent.Kafka.Consumer`2.Commit(IEnumerable`1 offsets)

The exception is trapped and the consumer is supposed to retry, but instead the app sits idle. The container is still up and running, but not consuming any more messages.

What's weirder is that the broker never reassigns that consumer's partitions, so the consumer lag on those partitions begins to grow. The consumer seems to be both alive (the broker does not reassign its partitions) and dead (it can neither commit its offset nor consume more messages). If we intervene and manually restart the consumers, the partitions are reassigned and the situation returns to normal.

I'm not entirely sure what to make of the exception above, and Google doesn't offer much. The most relevant lead I have is this issue on GitHub, which involves a broker restarting. To my knowledge, that is not happening in my situation. Any assistance would be greatly appreciated.

Rapture asked 30/10, 2020 at 21:45

Did you ever figure this out? I'm running into a similar situation. – Jellyfish 27/1, 2021 at 19:19

I'm getting the same error now. Do you have a solution for it? – Germin 19/4, 2021 at 7:39

For future reference, in case anyone runs into the same problem:

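When a consumer does not send heartbeats in time, it is kicked out of the group and has to rejoin. With librdkafka-based clients such as the Confluent .NET client, heartbeats are sent from a background thread, but the consumer is also removed from the group if the application does not call Consume again within max.poll.interval.ms (300000 ms by default). So if message processing takes a long time on the same thread the consumer uses, the consumer can lose its group membership before processing finishes.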

When auto commit is off and the consumer commits only after processing the message (on that same thread), the consumer may already have been removed from the group by the time it commits. The commit then carries the group's old generation id, which is no longer valid, and the broker rejects it with exactly this error.
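To make the generation-id part concrete, here is a deliberately simplified Python model. `ToyGroupCoordinator`, `IllegalGenerationError` and their methods are hypothetical illustrations, not the Confluent.Kafka or broker API. Each rebalance bumps the group's generation; a commit carries the generation the consumer last joined with and is rejected once the group has moved on. The 300 000 ms interval mirrors librdkafka's default `max.poll.interval.ms`. The real coordinator distinguishes more cases than this (for example it can answer UNKNOWN_MEMBER_ID or REBALANCE_IN_PROGRESS instead), so treat it as a sketch of the idea only.

```python
from __future__ import annotations


class IllegalGenerationError(Exception):
    """Toy stand-in for the broker's ILLEGAL_GENERATION error code."""


class ToyGroupCoordinator:
    """Deliberately simplified model of a group coordinator (not a real API)."""

    def __init__(self, members: list[str], max_poll_interval_ms: int) -> None:
        self.max_poll_interval_ms = max_poll_interval_ms
        self.generation = 1  # the group has already completed one rebalance
        self.last_poll_ms = {member: 0 for member in members}
        self.committed: dict[int, int] = {}

    def poll(self, member: str, now_ms: int) -> None:
        self.last_poll_ms[member] = now_ms

    def expire_members(self, now_ms: int) -> None:
        # Members silent for longer than the interval are dropped; any change
        # in membership means a rebalance, which starts a new generation.
        stale = [m for m, t in self.last_poll_ms.items()
                 if now_ms - t > self.max_poll_interval_ms]
        for member in stale:
            del self.last_poll_ms[member]
        if stale:
            self.generation += 1

    def commit(self, generation: int, partition: int, offset: int) -> None:
        # Every commit carries the generation the consumer last joined with.
        if generation != self.generation:
            raise IllegalGenerationError("Specified group generation id is not valid")
        self.committed[partition] = offset


coordinator = ToyGroupCoordinator(["consumer-a", "consumer-b"], max_poll_interval_ms=300_000)
generation_seen_by_a = coordinator.generation  # both members joined generation 1

# consumer-b keeps polling; consumer-a spends 6 minutes on a single message.
coordinator.poll("consumer-b", now_ms=200_000)
coordinator.expire_members(now_ms=360_000)  # evicts consumer-a, starts generation 2

try:
    coordinator.commit(generation_seen_by_a, partition=0, offset=42)
    outcome = "committed"
except IllegalGenerationError as err:
    outcome = f"rejected: {err}"

print(outcome)  # rejected: Specified group generation id is not valid

# After the rebalance consumer-b holds generation 2, so its commits are accepted.
coordinator.commit(coordinator.generation, partition=3, offset=7)
```

The slow consumer's commit fails not because its offset is wrong but because it is stamped with a generation the group has already left behind.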

That was the cause in my case. Enabling auto commit made the problem go away, because the message is then acknowledged as soon as it is consumed.

However, if the logic requires a message to be acknowledged only after it has been processed, relying on auto commit is not acceptable. In my case I solved the problem by processing on a separate thread and using a polling mechanism that tells the consumer to consume the next message once processing is finished.
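The separate-thread approach can be sketched as a toy in Python. This is not the answerer's code: `ToyConsumer`, `process`, `run_inline` and `run_decoupled` are hypothetical, and the consumer only models one rule, namely that it is evicted when polls are too far apart. The slow work runs on a worker thread while the consumer thread pauses fetching, keeps polling until the work is done, then resumes and commits. In Confluent.Kafka the rough equivalent would be to `Pause` the current assignment, keep calling `Consume` while the work is in flight, then `Resume` and `Commit`; treat that mapping as an assumption to verify against your client version.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


class ToyConsumer:
    """Hypothetical consumer: evicted if polls are more than max_poll_interval_s apart."""

    def __init__(self, messages: list[tuple[int, str]], max_poll_interval_s: float) -> None:
        self._messages = list(messages)
        self.max_poll_interval_s = max_poll_interval_s
        self._last_poll = time.monotonic()
        self.paused = False
        self.in_group = True
        self.committed: list[int] = []

    def _check_liveness(self) -> None:
        # Once the gap is exceeded, the toy "broker" has dropped us for good.
        if time.monotonic() - self._last_poll > self.max_poll_interval_s:
            self.in_group = False

    def poll(self) -> tuple[int, str] | None:
        self._check_liveness()
        self._last_poll = time.monotonic()
        if self.paused or not self._messages:
            return None
        return self._messages.pop(0)

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def commit(self, offset: int) -> None:
        self._check_liveness()
        if not self.in_group:
            raise RuntimeError("Specified group generation id is not valid")
        self.committed.append(offset)


def process(value: str) -> str:
    time.sleep(0.4)  # slow work: longer than the 0.25 s poll interval used below
    return value.upper()


def run_inline(consumer: ToyConsumer) -> list[str]:
    # Processing blocks the only thread that polls, so the gap grows to 0.4 s.
    results = []
    while True:
        msg = consumer.poll()
        if msg is None:  # toy convention: None means the queue is drained
            return results
        offset, value = msg
        results.append(process(value))
        consumer.commit(offset + 1)  # by Kafka convention, commit the next offset to read


def run_decoupled(consumer: ToyConsumer, poll_period_s: float = 0.05) -> list[str]:
    results = []
    with ThreadPoolExecutor(max_workers=1) as worker:
        while True:
            msg = consumer.poll()
            if msg is None:
                return results
            offset, value = msg
            consumer.pause()  # stop fetching while this message is in flight
            future = worker.submit(process, value)
            while not future.done():
                consumer.poll()  # returns None while paused, but keeps us in the group
                time.sleep(poll_period_s)
            results.append(future.result())
            consumer.resume()
            consumer.commit(offset + 1)


decoupled = ToyConsumer([(0, "a"), (1, "b")], max_poll_interval_s=0.25)
decoupled_results = run_decoupled(decoupled)
print(decoupled_results, decoupled.committed)

inline = ToyConsumer([(0, "a")], max_poll_interval_s=0.25)
try:
    run_inline(inline)
except RuntimeError as err:
    print("inline processing failed:", err)
```

The design choice is that only the consumer thread ever touches the consumer; the worker thread just runs `process`, so the commit still happens strictly after processing without starving the poll loop.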

Burn answered 26/6, 2024 at 20:9


At least I have found a solution that works for me. In my code I committed manually and had set EnableAutoCommit = false.

Somehow it was possible for a commit to be executed twice for the same offset. I removed the manual commits from the consumer and set EnableAutoCommit = true.

After that it worked.

Germin answered 11/5, 2021 at 14:32

Auto commits are fundamentally at odds with producing transactionally. – Clara 22/12, 2023 at 9:6
