What is a Kafka consumer group ID?

Your thoughts?

|

When people talk about a "Kafka consumer" they can mean different things, and that leads to some confusion.

A Kafka consumer is technically a process that is part of a larger group. This collective group of consumers is called a "consumer group".

It is the collective responsibility of a consumer group to process messages from a given topic. Ideally, each consumer within the group reads from one partition. Kafka distributes the topic's partitions across the consumers currently available in the group.

The groupId associates a Kafka consumer with a consumer group.

If a topic has 3 partitions and you have 2 consumers operating within the same consumer group, one of the consumers will read from 2 partitions and the other will read from 1.

If a topic has 4 partitions and you have 2 consumers operating within the same group then both consumers will read from 2 partitions.

If a topic has 1 partition and you have 2 consumers in the same group, then one consumer reads from that partition and the other sits idle.
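The three cases above can be reproduced with a small model. This is only a sketch: `assign_partitions` is a hypothetical helper that hands out partitions in turn, not Kafka's actual assignment strategy, and the consumer names are made up. The partition counts per consumer match the examples, though.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition numbers over the consumers of ONE group as evenly as possible.

    Simplified model: real Kafka assignors differ in which partition goes
    to which consumer, but the per-consumer counts follow the same idea.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        # Hand out partitions in turn, so counts differ by at most one.
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


group = ["consumer-a", "consumer-b"]  # hypothetical members of the same group
for partitions in (3, 4, 1):
    print(partitions, assign_partitions(partitions, group))
```

With 3 partitions one consumer gets two and the other gets one, with 4 they get two each, and with 1 partition the second consumer ends up with an empty list.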

|

The group ID is very important to how different consumers "load balance" partitions. For example, if you have a topic with 10 partitions then two consumers with the same groupId will read from 5 partitions each.

If you have two consumers with different group ids, each consumer will read from all 10 partitions independently of the other.
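The 10-partition example can be modeled in a few lines. This is a rough sketch under simplifying assumptions: `assign_by_group` is a hypothetical helper, the consumer and group names are invented, and the in-turn split stands in for whatever assignor the cluster really uses.

```python
from collections import defaultdict


def assign_by_group(num_partitions, consumer_groups):
    """Model partition assignment when consumers may belong to different groups.

    consumer_groups maps a consumer name to its group id (hypothetical names).
    """
    members = defaultdict(list)
    for consumer, group_id in consumer_groups.items():
        members[group_id].append(consumer)

    assignment = {consumer: [] for consumer in consumer_groups}
    for group_members in members.values():
        # Every group sees the full set of partitions and splits it internally.
        for partition in range(num_partitions):
            owner = group_members[partition % len(group_members)]
            assignment[owner].append(partition)
    return assignment


same_group = assign_by_group(10, {"c1": "billing", "c2": "billing"})
different_groups = assign_by_group(10, {"c1": "billing", "c2": "audit"})

print({c: len(p) for c, p in same_group.items()})        # 5 partitions each
print({c: len(p) for c, p in different_groups.items()})  # 10 partitions each
```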

In this sense, the groupId is how you define a "consumer group" or group of consumers reading from a given topic/partitions.

|

Understanding what a consumer group is in Kafka requires first understanding what a partition is.

Topics store data in partitions. These partitions are key to Kafka's scalability and its resilience against data loss. A topic's partitions are spread across the brokers in the cluster, and each partition is replicated to as many brokers as the topic's replication factor specifies.

Consumers read from topics. Since topics are collections of partitions, consumers also exist in collections, so that the workload of reading data is distributed among all available worker "threads" or consumers in the "consumer group".

Put another way, consumers exist as groups to read from the groups of partitions. This is why it's so important to give a consumer a group id.


|

The group.id is how you distinguish different consumer groups. Remember that consumers work together in groups to read data from a particular topic.
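In practice, the group is chosen through client configuration. The sketch below only builds plain dictionaries keyed by Kafka's `bootstrap.servers` and `group.id` property names, the shape that dict-configured clients such as confluent-kafka accept. The `consumer_config` helper, the broker address, and the group names are assumptions for illustration.

```python
def consumer_config(group_id, bootstrap_servers="localhost:9092"):
    """Build a consumer configuration dict keyed by Kafka property names."""
    return {
        "bootstrap.servers": bootstrap_servers,  # assumed local broker address
        "group.id": group_id,                    # decides which group the consumer joins
    }


# Two workers that should share a topic's partitions use the same group.id.
worker_1 = consumer_config("order-processors")
worker_2 = consumer_config("order-processors")

# A consumer that needs its own full copy of the stream uses a different one.
auditor = consumer_config("order-auditors")

print(worker_1["group.id"] == worker_2["group.id"])  # True: same group
print(worker_1["group.id"] == auditor["group.id"])   # False: separate group
```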

Understanding group.id is fundamental to achieving maximum parallelism in Kafka. Remember that the partitions of a given topic are balanced across the available consumers in the group, so the number of partitions caps how many consumers in one group can do useful work at the same time.
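A quick way to reason about that limit is a back-of-the-envelope calculation rather than anything Kafka-specific: within one group a partition is read by at most one consumer, so the number of busy consumers cannot exceed the partition count. The `parallelism` helper below is hypothetical.

```python
def parallelism(num_partitions, num_consumers):
    """Return (active, idle) consumer counts for a single consumer group."""
    # Each partition goes to at most one consumer in the group.
    active = min(num_partitions, num_consumers)
    idle = max(0, num_consumers - num_partitions)
    return active, idle


print(parallelism(10, 2))  # both consumers are busy
print(parallelism(1, 2))   # one consumer is busy, one is idle
print(parallelism(4, 6))   # adding consumers beyond 4 does not help
```

If a group needs more parallelism than this allows, the topic needs more partitions; adding consumers alone will not help.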

|

Groups of consumer instances work together to read data from a given topic. For all of the consumers that share the same group.id, the topic's partitions are distributed among the members, so each message is consumed by only one member of the group.

If you wanted the consumer instances to all read the same data then they would need to NOT be in the same group.

|

The consumer group id is what allows different consumer instances to collectively consume from a single topic.