To clarify your questions:
- Is there a particular convention for using 5-digit (ex: 29092) port vs 4-digit (ex: 9092) ports?
Common values seen in documentation are sequences like 9092, 9093, 9094 and 9092, 19092, 29092, etc. While technically speaking they're arbitrary, the sequence usually matches up with a broker id.
- A possible example:
  - Broker 1: 19092
  - Broker 2: 19093
  - Broker 3: 19094
Since services like databases usually have a single socket endpoint which clients connect to, there is usually a highly recognizable port number as the default, which other services then typically avoid using.
When I installed Kafka recently I saw port conflicts with 9092 on one machine, so I chose a different sequence of numbers starting from 19092.
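As a rough illustration of picking a port sequence per broker id and checking for conflicts first, here is a minimal Python sketch. The helper names `port_is_free` and `broker_ports` are made up for this example, and the check only covers the local machine at the moment it runs.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if we can bind host:port right now (i.e. nothing else holds it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def broker_ports(base: int, broker_count: int) -> dict[int, int]:
    """Map broker ids 1..broker_count to consecutive ports starting at base."""
    return {broker_id: base + broker_id - 1 for broker_id in range(1, broker_count + 1)}


if __name__ == "__main__":
    # Hypothetical choice: fall back to 19092 if the usual 9092 is already taken.
    base = 9092 if port_is_free(9092) else 19092
    print(broker_ports(base, 3))
```

With a base of 19092 and three brokers this gives the same 19092, 19093, 19094 sequence as the example above.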
- Is there a specific relation between the number of configured brokers to the number of listeners mentioned in KAFKA_ADVERTISED_LISTENERS? In some places, like Confluent's own example, I see only one broker, but multiple KAFKA_ADVERTISED_LISTENERS. What does that mean?
You can have multiple addresses for every broker. A broker might have an ip based address and a dns based address. If dns isn't working for some reason, the broker may still be contactable using the ip address.
As far as I am aware, a broker shouldn't usually have the addresses of other brokers as part of its advertised.listeners string. If you see this, it's probably a bad setup (though there might be some legitimate reason for it, perhaps proxying).
- Also, in the same Confluent example mentioned above, there is a listener PLAINTEXT://broker:29092 in KAFKA_ADVERTISED_LISTENERS, but we do not have any broker or zookeeper configured at port 29092, so what is that doing?
Some possible reasons for this:
- It might be because Kafka is running in a Docker container which exposes port 29092, but where the Kafka process is actually running on a different (internal to Docker) port. Clients are expected to connect to the address broker:29092 for whatever reason.
- It also might be because clients are expected to connect via the internet, and there is a NAT router which forwards 29092->KAFKA_HOST:KAFKA_PORT.
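To see what a multi-entry KAFKA_ADVERTISED_LISTENERS value actually contains, it can help to split it into its parts. The sketch below is only an illustration: the `Listener` type and `parse_listeners` helper are invented here, and the second entry in the sample value (PLAINTEXT_HOST://localhost:9092) is an assumed example of a second listener, not something taken from the question.

```python
from typing import NamedTuple


class Listener(NamedTuple):
    name: str   # listener name, e.g. PLAINTEXT
    host: str   # host the client is told to connect to
    port: int   # port the client is told to connect to


def parse_listeners(value: str) -> list[Listener]:
    """Split a comma-separated NAME://host:port string into its components."""
    listeners = []
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        listeners.append(Listener(name, host, int(port)))
    return listeners


if __name__ == "__main__":
    # Assumed sample value: one broker, two advertised addresses.
    sample = "PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092"
    for listener in parse_listeners(sample):
        print(listener)
```

Both entries belong to the same broker. They are two ways of reaching it, not two brokers.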
I've seen this article cited all over the place, and personally I don't think it does anything other than confuse the issue.
For an alternative explanation see this and this blog post.
The key points of clarification are these:
listeners tells the Kafka process which port number and network interface to open a listening socket on. The network interface is determined from the ip address, which must be one currently assigned to a network interface on the host.
advertised.listeners must be set if using the default value from listeners isn't going to work. This will be required if:
- listeners is set to localhost
- Clients are expected to connect from beyond a NAT router (eg via an address like kafka1.example.com:9093 instead of a private ip address)
- You are running Kafka in a Docker container (how will a client on an external or local network know what kafka_docker_1 is?)
- The port number might be different if going via NAT or a Docker container host. It depends how the NAT performs port-forwarding, or what ports are mapped by the Docker container host.
- You may have more than one listener name defined, with more than one set of config mappings. Commonly used names include INTERNAL, EXTERNAL, PLAINTEXT, CONTROLLER, ZOOKEEPER. Technically these names are arbitrary. A name only has meaning of its own if it is not included in the security protocol map, in which case it has to be a security protocol name such as PLAINTEXT.
- You would define more than one listener if you had, for example, a Kafka cluster of multiple brokers where internal traffic was expected to be routed via an internal private network, and where clients were expected to connect from an external network via a NAT enabled router. Each listener needs to have a unique ip address and port number defined in listeners. From there, all the other config mappings follow.
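The internal/external case from the last point can be sketched as a small Python function that prints the relevant broker properties. The function name, ip addresses, and hostnames are placeholders invented for this example, and both listeners are mapped to PLAINTEXT only to keep the sketch short; the property names themselves are standard Kafka broker settings.

```python
def broker_properties(
    broker_id: int,
    internal_ip: str,
    external_host: str,
    internal_port: int = 9092,
    external_port: int = 9093,
) -> str:
    """Build the listener-related lines of server.properties for one broker."""
    lines = [
        f"broker.id={broker_id}",
        # Sockets the broker opens: both bind to the private ip, on different ports.
        f"listeners=INTERNAL://{internal_ip}:{internal_port},"
        f"EXTERNAL://{internal_ip}:{external_port}",
        # Addresses handed to clients: external clients get the NAT-facing hostname.
        f"advertised.listeners=INTERNAL://{internal_ip}:{internal_port},"
        f"EXTERNAL://{external_host}:{external_port}",
        # Arbitrary listener names must be mapped to a security protocol.
        "listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT",
        # Broker-to-broker traffic stays on the private network.
        "inter.broker.listener.name=INTERNAL",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder addresses for a hypothetical first broker.
    print(broker_properties(1, "10.0.0.11", "kafka1.example.com"))
```

This assumes the NAT router forwards the external port unchanged. If it maps to a different port, the advertised EXTERNAL port would need to be the one clients actually connect to.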
Some background points which are useful to consider, and which help clarify some of your questions:
- Kafka would usually run as a cluster of brokers, not a single broker system.
- In a single broker system, the bootstrap.servers string is just going to contain a single entry. It has to contain the address of "how to contact" the single broker which is in this "cluster" ("cluster" containing a single broker)
- This makes advertised.listeners seemingly pointless, because it's just going to contain the same address which the client application already has in its bootstrap servers string
- Now, as part of a multi-broker cluster, the advertised.listeners values from all brokers will be aggregated together and returned to the client when it connects. This means the client needs minimal knowledge of one of the brokers, and from this it can obtain the full list when it connects. For redundancy, you would usually specify 2 or more brokers in the bootstrap string, in case one of them is down when you attempt to connect
- Actually, even in a single broker setup, advertised.listeners may not be completely pointless. If listeners is set to localhost:9093 that is obviously no good from the client's point of view. advertised.listeners will default to the value of listeners if it isn't explicitly set. Assuming your value of listeners will work when sent back to the client in place of a missing advertised.listeners, then you're ok. Otherwise, you must set advertised.listeners explicitly.
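The bootstrap behaviour described in the points above can be modelled with a toy Python sketch. It is not the Kafka client protocol, just the idea: the client needs one reachable bootstrap address, and that broker hands back the advertised addresses of the whole cluster. The `ADVERTISED` table, the hostnames, and the `discover_brokers` helper are all invented for illustration.

```python
# Hypothetical cluster: broker id -> the address that broker advertises.
ADVERTISED = {
    1: "kafka1.example.com:9093",
    2: "kafka2.example.com:9093",
    3: "kafka3.example.com:9093",
}


def discover_brokers(bootstrap_servers: list[str], down: set[str]) -> list[str]:
    """Try each bootstrap address in order; the first reachable broker
    returns the aggregated advertised addresses of every broker."""
    for address in bootstrap_servers:
        if address in ADVERTISED.values() and address not in down:
            return sorted(ADVERTISED.values())
    raise ConnectionError("no bootstrap server reachable")


if __name__ == "__main__":
    bootstrap = ["kafka1.example.com:9093", "kafka2.example.com:9093"]
    # kafka1 is down, but the second bootstrap entry still yields the full list.
    print(discover_brokers(bootstrap, down={"kafka1.example.com:9093"}))
```

This is why two or more entries in the bootstrap string are enough for redundancy: the rest of the cluster is learned after the first successful connection.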
listeners is just there to tell the Kafka process which network interfaces to open a listening port on.
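The defaulting rule can be written down in a few lines of Python. This is a sketch of the rule as described here, not Kafka's own code: both function names are made up, and the set of hosts treated as unusable for remote clients is a rough heuristic assumed for the example.

```python
from typing import Optional

# Assumption: hosts that a client on another machine cannot use to reach the broker.
UNROUTABLE_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}


def effective_advertised(listeners: str, advertised: Optional[str] = None) -> str:
    """advertised.listeners falls back to listeners when it is not set."""
    return advertised if advertised else listeners


def needs_explicit_advertised(listeners: str) -> bool:
    """True if falling back to listeners would hand clients an unusable host."""
    for entry in listeners.split(","):
        address = entry.strip().partition("://")[2]
        host = address.rpartition(":")[0]
        if host in UNROUTABLE_HOSTS:
            return True
    return False


if __name__ == "__main__":
    listeners = "PLAINTEXT://localhost:9093"
    print(effective_advertised(listeners))
    print(needs_explicit_advertised(listeners))
```

The heuristic only looks at the host. Cases like NAT or Docker port mapping, where the host is fine locally but wrong from the client's side, still need to be judged by hand.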