What does multiple KAFKA_ADVERTISED_LISTENERS mean when we have only one broker, vs when we have many?

Asked 8/10, 2020 at 4:42 Answered 4/2 at 10:39

I am learning Kafka and trying to use it with docker. I'm confused looking at the docker-compose files, so I wanted to ask my questions here.

In most examples, I see a config like this:

broker:
   image: confluentinc/cp-enterprise-kafka:5.3.1
   ...
   ports:
       - "29092:29092"
       - "9092:9092"
   environment:
        ...
        KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092

I wanted to understand a few things related to this:

Is there a particular convention for using 5-digit (ex: 29092) port vs 4-digit (ex: 9092) ports?
Is there a specific relation between the number of configured brokers to the number of listeners mentioned in KAFKA_ADVERTISED_LISTENERS? In some places, like confluent's own example, I see only one broker, but multiple KAFKA_ADVERTISED_LISTENERS. What does that mean?
Also, in the same confluent example mentioned above, there is a listener: PLAINTEXT://broker:29092 in KAFKA_ADVERTISED_LISTENERS, but we do not have any broker or zookeeper configured at port 29092, so what is that doing?

Sup answered 8/10, 2020 at 4:42 Comment(0)

KAFKA_ADVERTISED_LISTENERS is a list of addresses for one particular broker. When a client first contacts the broker acting as "bootstrap server", it gets back the addresses of the brokers responsible for each partition. Clients need different addresses depending on which network they come from. The "bootstrap server", knowing the "advertised listeners" of all the brokers, chooses the appropriate addresses to return based on which of its own listeners the client used to connect.

TL;DR

Source: https://rmoff.net/2018/08/02/kafka-listeners-explained/

Is there a particular convention for using 5-digit (ex: 29092) port vs 4-digit (ex: 9092) ports?

I believe the only convention is 9092 as the broker port. When multiple ports are needed people seem to use a variation of it. I've seen 9093, and now 29092.

Is there a specific relation between the number of configured brokers to the number of listeners mentioned in KAFKA_ADVERTISED_LISTENERS? In some places, like confluent's own example, I see only one broker, but multiple KAFKA_ADVERTISED_LISTENERS. What does that mean?

There is no relation. Each broker has its own KAFKA_ADVERTISED_LISTENERS list. It refers to that particular broker only, not the other brokers.

When a client connects to Kafka, it first talks to a broker acting as "bootstrap server". That broker then responds with metadata including the addresses of the particular brokers the client should reach for particular partitions. These addresses depend on which network the client is coming from, e.g. localhost:9092 for clients on the Docker host vs. broker:29092 for clients within the Docker network. That's why one broker can have multiple KAFKA_ADVERTISED_LISTENERS.
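To make that concrete, here is a toy Python model of the lookup. It is only a sketch of the idea, not Kafka's actual implementation; the broker id and the ADVERTISED table are made up for illustration, using the two addresses from the question.

```python
# Toy model of the metadata lookup; not Kafka's real code.
# The broker id (1) and this table are illustrative assumptions.
ADVERTISED = {
    1: {"PLAINTEXT": "broker:29092", "PLAINTEXT_HOST": "localhost:9092"},
}


def metadata_for(listener_name, advertised=ADVERTISED):
    """Return {broker_id: address} for the listener the client connected on."""
    return {
        broker_id: by_listener[listener_name]
        for broker_id, by_listener in advertised.items()
        if listener_name in by_listener
    }


# A client inside the Docker network bootstraps via the PLAINTEXT listener.
print(metadata_for("PLAINTEXT"))
# A client on the Docker host bootstraps via the PLAINTEXT_HOST listener.
print(metadata_for("PLAINTEXT_HOST"))
```

The same broker hands out a different address depending only on which listener the request arrived on.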

Also, in the same confluent example mentioned above, there is a listener: PLAINTEXT://broker:29092 in KAFKA_ADVERTISED_LISTENERS, but we do not have any broker or zookeeper configured at port 29092, so what is that doing?

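In the Confluent example, that KAFKA_ADVERTISED_LISTENERS line is what ends up making the broker listen on 29092: as far as I know, when KAFKA_LISTENERS is not set, the Confluent image derives the listeners from the advertised ones. All clients internal to Docker use this port to reach the broker.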

Btw, 29092 doesn't need to be in the ports: mapping, as it's only used by clients inside the Docker network (it doesn't need to be exposed on a host port). Only 9092 needs to be exposed outside of Docker in this example.
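If it helps to see the pieces of that string separately, here is a small Python sketch that splits a KAFKA_ADVERTISED_LISTENERS value into its parts. The helper name parse_listeners is hypothetical, and the "localhost means it must be published on the host" rule is a simplifying assumption for this example only.

```python
def parse_listeners(value):
    """Split 'NAME://host:port,...' into a list of (name, host, port) tuples."""
    parsed = []
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        parsed.append((name, host, int(port)))
    return parsed


advertised = "PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092"
for name, host, port in parse_listeners(advertised):
    # Assumption: only the address meant for the Docker host needs a
    # published port; "broker" resolves only inside the Docker network.
    needs_publishing = host == "localhost"
    print(f"{name}: {host}:{port} publish_to_host={needs_publishing}")
```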

Prentice answered 27/5, 2021 at 18:6 Comment(0)

Is there a particular convention for using 5-digit (ex: 29092) port vs 4-digit (ex: 9092) ports?

To avoid a port conflict when the application runs, you should configure its ports outside the server's ephemeral port range. Refer to this question and answer to see how you can find or modify the ephemeral port range. So the convention is: find a port outside the ephemeral port range and use it as the application port.
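A minimal Python sketch of that check. The range 32768-60999 used here is an assumption (a common Linux default); on a real machine you would read the actual value, for example from /proc/sys/net/ipv4/ip_local_port_range on Linux. The function names are hypothetical.

```python
def parse_port_range(text):
    """Parse the 'low high' pair as found in Linux's ip_local_port_range file."""
    low, high = text.split()
    return int(low), int(high)


def is_ephemeral(port, port_range):
    """True if the port falls inside the (low, high) ephemeral range."""
    low, high = port_range
    return low <= port <= high


# Assumed value; replace with the content of the real file on your server.
port_range = parse_port_range("32768\t60999\n")
for port in (9092, 29092, 40000):
    verdict = "ephemeral" if is_ephemeral(port, port_range) else "outside the range"
    print(port, verdict)
```

Under that assumed range, both 9092 and 29092 sit outside the ephemeral ports.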

Is there a specific relation between the number of configured brokers to the number of listeners mentioned in KAFKA_ADVERTISED_LISTENERS? In some places, like confluent's own example, I see only one broker, but multiple KAFKA_ADVERTISED_LISTENERS. What does that mean?

Kafka can be deployed as a cluster across multiple nodes. Each node has its own IP and port number in its config/server.properties file. The number of listeners and advertised listeners depends on the number of nodes in your Kafka deployment.

Also, in the same confluent example mentioned above, there is a listener: PLAINTEXT://broker:29092 in KAFKA_ADVERTISED_LISTENERS, but we do not have any broker or zookeeper configured at port 29092, so what is that doing?

Kafka needs a Zookeeper deployment for its cluster management, irrespective of whether there is only one Kafka node or several. But it can use any port, and there is no hard and fast rule to use port 29092.

Escorial answered 8/10, 2020 at 5:18 Comment(1)

What you answered is a bit vague, even though I think the questions were pretty to-the-point. – Sup 8/10, 2020 at 15:2

To clarify your questions:

Is there a particular convention for using 5-digit (ex: 29092) port vs 4-digit (ex: 9092) ports?

Common values seen in documentation are sequences like 9092, 9093, 9094 and 9092, 19092, 29092 etc. While technically speaking they're arbitrary, the sequence usually matches up with a broker id.

A possible example:
Broker 1: 19092
Broker 2: 19093
Broker 3: 19094
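
As a rough Python sketch, both kinds of sequence are just a base port plus a step per broker id. The function name and the default values are my own choices for illustration, not anything Kafka defines.

```python
def port_for_broker(broker_id, base=19092, step=1):
    """Port for a 1-based broker id: base, base + step, base + 2 * step, ..."""
    return base + (broker_id - 1) * step


# The sequence from the example above.
print([port_for_broker(i) for i in (1, 2, 3)])
# The other common pattern: 9092, 19092, 29092.
print([port_for_broker(i, base=9092, step=10000) for i in (1, 2, 3)])
```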

Since services like databases usually have a single socket endpoint which clients connect to, there is usually a highly recognizable port number as the default, which other services then typically avoid using.

When I installed Kafka recently I saw port conflicts with 9092 on one machine, so I chose a different sequence of numbers starting from 19092.

Is there a specific relation between the number of configured brokers to the number of listeners mentioned in KAFKA_ADVERTISED_LISTENERS? In some places, like confluent's own example, I see only one broker, but multiple KAFKA_ADVERTISED_LISTENERS. What does that mean?

You can have multiple addresses for every broker. A broker might have an ip based address and a dns based address. If dns isn't working for some reason, the broker may still be contactable using the ip address.

As far as I am aware, a broker shouldn't usually have the address for other brokers as part of its advertised.listeners string. If you see this, it's probably a bad setup. (but there might be some legitimate reason for it, maybe proxying ... ?)

Also, in the same confluent example mentioned above, there is a listener: PLAINTEXT://broker:29092 in KAFKA_ADVERTISED_LISTENERS, but we do not have any broker or zookeeper configured at port 29092, so what is that doing?

Some possible reasons for this:

It might be because Kafka is running in a Docker container which exposes port 29092, but where the Kafka process is actually running on a different (internal to docker) port. Clients are expected to connect to the address broker:29092 for whatever reason.
It also might be because clients are expected to connect via the internet, and there is a NAT router which forwards 29092->KAFKA_HOST:KAFKA_PORT.

I've seen this article cited all over the place, and personally I don't think it does anything other than confuse the issue.

For an alternative explanation see this and this blog post.

The key points of clarification are this:

listeners tells the Kafka process which port number and network interface to open a listening socket on. The network interface is determined from the IP address, which must be an address currently assigned to a network interface on the host.
advertised.listeners must be set if using the default value from listeners isn't going to work. This will be required if:
listeners is set to localhost
Clients are expected to connect from beyond a NAT router (e.g. via an address like kafka1.example.com:9093 instead of a private IP address)
You are running Kafka in a Docker container (how will a client on an external or local network know what kafka_docker_1 is?)
The port number might be different if going via NAT or a Docker container host. It depends how the NAT performs port-forwarding, or what ports are mapped by the Docker container host.
You may have more than one listener name defined, with more than one set of config mappings. Commonly used names include INTERNAL, EXTERNAL, PLAINTEXT, CONTROLLER, ZOOKEEPER. Technically these "names" are arbitrary, as long as each one is mapped to a security protocol in the security protocol map. A name that is not included in the map must itself be a security protocol name, which is why special names like PLAINTEXT do have meaning.
You would define more than one listener if you had, for example, a Kafka cluster of multiple brokers where internal traffic was expected to be routed via an internal private network, and where clients were expected to connect from an external network via a NAT-enabled router. Each listener needs to have a unique IP address and port number defined in listeners; from there, all the other config mappings follow.
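
The last two points can be sketched as a small Python sanity check over a listeners string and a security protocol map. This is not Kafka's own validation, just an illustration of the rules described above; the function name, the 10.0.0.5 address and the INTERNAL/EXTERNAL mappings are assumptions made up for the example.

```python
SECURITY_PROTOCOLS = {"PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"}


def check_listeners(listeners, protocol_map):
    """Return a list of problems found in a listeners string; empty if none."""
    mapped = dict(pair.split(":") for pair in protocol_map.split(",") if pair)
    problems, seen_names, seen_endpoints = [], set(), set()
    for entry in listeners.split(","):
        name, _, endpoint = entry.strip().partition("://")
        if name in seen_names:
            problems.append(f"duplicate listener name {name}")
        if endpoint in seen_endpoints:
            problems.append(f"duplicate endpoint {endpoint}")
        # An unmapped name is only acceptable if it is itself a protocol name.
        if name not in mapped and name not in SECURITY_PROTOCOLS:
            problems.append(f"{name} has no security protocol mapping")
        seen_names.add(name)
        seen_endpoints.add(endpoint)
    return problems


# Hypothetical two-listener setup: internal traffic and external clients.
print(check_listeners(
    "INTERNAL://10.0.0.5:9092,EXTERNAL://10.0.0.5:9093",
    "INTERNAL:PLAINTEXT,EXTERNAL:SSL",
))
# Same endpoint reused, and EXTERNAL missing from the map.
print(check_listeners(
    "INTERNAL://10.0.0.5:9092,EXTERNAL://10.0.0.5:9092",
    "INTERNAL:PLAINTEXT",
))
```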

Some background points which are useful to consider, which helps clarify some of your questions:

Kafka would usually run as a cluster of brokers, not a single broker system.
In a single broker system, the bootstrap.servers string is just going to contain a single entry. It has to contain the address of "how to contact" the single broker which is in this "cluster" ("cluster" containing a single broker).
This makes advertised.listeners seemingly pointless, because it's just going to contain the same address which the client application already has in its bootstrap servers string.
Now, as part of a multi-broker cluster, the advertised.listeners values from all brokers will be aggregated together and returned to the client when it connects. This means the client needs minimal knowledge of one of the brokers, and from this it can obtain the full list when it connects. For redundancy, you would usually specify 2 or more brokers in the bootstrap string, in case one of them is down when you attempt to connect.
Actually, even in a single broker setup, advertised.listeners may not be completely pointless. If listeners is set to localhost:9093 that is obviously no good from the client's point of view. advertised.listeners will default to the value of listeners if it isn't explicitly set. Assuming your value of listeners will work when sent back to the client in place of a missing advertised.listeners, then you're ok. Otherwise, you must set advertised.listeners explicitly.
listeners is just there to tell the Kafka process which network interfaces to open a listening port on.
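
The fallback and the aggregation described in these points can be modeled in a few lines of Python. This is a simplified sketch under my own assumptions: the three brokers, their hostnames and the helper names are invented for illustration, and real metadata responses carry much more than one address per broker.

```python
def effective_advertised(listeners, advertised_listeners=None):
    """advertised.listeners falls back to listeners when it is not set."""
    return advertised_listeners or listeners


def usable_remotely(address):
    """Rough check: a localhost address is of no use to a remote client."""
    host = address.partition("://")[2].rpartition(":")[0]
    return host not in ("localhost", "127.0.0.1")


# Hypothetical three-broker cluster, as (listeners, advertised.listeners).
cluster = {
    1: effective_advertised("PLAINTEXT://kafka1.example.com:9092"),
    2: effective_advertised("PLAINTEXT://localhost:9093"),
    3: effective_advertised("PLAINTEXT://localhost:9092",
                            "PLAINTEXT://kafka3.example.com:9092"),
}

# What a client would be handed back after bootstrapping.
for broker_id, address in cluster.items():
    print(broker_id, address, "ok" if usable_remotely(address) else "unusable remotely")
```

Broker 2 is the case where advertised.listeners must be set explicitly, since its fallback value points at localhost.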

Pembrook answered 4/2 at 10:39 Comment(0)
