In Kubernetes services talk to each other via a service ip. With iptables or something similar each TCP connection is transparently routed to one of the pods that are available for the called service. If the calling service is not closing the TCP connection (e.g. using TCP keepalive or a connection pool) it will connect to one pod and not use the other pods that may be spawned.
What is the correct way to handle such a situation?
My own unsatisfying ideas:
Closing connection after each api call
Am I making every call slower only to be able to distribute requests to different pods? Doesn't feel right.
Minimum number of connections
I could force the caller to open multiple connections (assuming it would then distribute the requests across these connections) but how many should be open? The caller has (and probably should not have) no idea how many pods there are.
Disable bursting
I could limit the resources of the called services so it gets slow on multiple requests and the caller will open more connections (hopefully to other pods). Again I don't like the idea of arbitrarily slowing down the requests and this will only work on cpu bound services.