I recently did some research again, and stumbled upon this. Before crying about it to the OpenJDK team, I wanted to see if anyone else has observed this, or disagrees with my conclusions.
So, it's widely known that the JVM for a long time ignored memory limits applied to its cgroup. It's almost as widely known that it now takes them into account, starting with Java 8u131 (behind experimental flags) and properly with -XX:+UseContainerSupport in JDK 10, backported to 8u191. Unfortunately, the calculations done based on the cgroup limits are so useless that you still have to do everything by hand. See Google and the hundreds of articles on this.
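To illustrate the "by hand" part, here's a trivial program (MemCheck is just my name for it) that prints what the JVM thinks it got; the flags in the comments are the standard HotSpot ones I end up setting manually:

```java
// MemCheck.java - print what the JVM believes its memory situation is.
// Run it inside a container started with e.g.:
//   docker run -m 512m ... java MemCheck
public class MemCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:   " + rt.maxMemory() / (1024 * 1024) + " MiB");
        System.out.println("total heap: " + rt.totalMemory() / (1024 * 1024) + " MiB");
        // In practice I still size the heap explicitly, e.g.
        //   java -Xmx384m MemCheck
        // or, on newer JVMs, as a percentage of the container limit:
        //   java -XX:MaxRAMPercentage=75.0 MemCheck
    }
}
```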
What I only discovered a few days ago, and did not read in any of those articles, is how the JVM determines the processor count from cgroups. The processor count is used to decide the number of threads for various tasks, including garbage collection. So getting it correct is important.
In a cgroup (as far as I understand, and I'm no expert) there are three CPU-related knobs:

- a limit on the available cpu time (the --cpus Docker parameter). This limits time only, not parallelism;
- cpu shares (the --cpu-shares Docker parameter), a relative weight used to distribute cpu time under load. Docker sets a default of 1024, but it's a purely relative scale;
- cpu sets (--cpuset-cpus for Docker), which explicitly pin the cgroup, and thus the Docker container, to a subset of processors. This is independent of the other parameters and actually limits parallelism.
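To see where these flags end up, here's a small sketch that dumps the relevant files from inside a container (CgroupDump is my name for it, and the paths assume the cgroup v1 layout mounted under /sys/fs/cgroup, which is what I'm describing above):

```java
// CgroupDump.java - dump the cgroup v1 cpu settings from inside a container.
import java.nio.file.Files;
import java.nio.file.Paths;

public class CgroupDump {
    public static void main(String[] args) {
        dump("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");  // written by --cpus (together with the period)
        dump("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
        dump("/sys/fs/cgroup/cpu/cpu.shares");        // written by --cpu-shares (Docker default: 1024)
        dump("/sys/fs/cgroup/cpuset/cpuset.cpus");    // written by --cpuset-cpus
    }

    static void dump(String file) {
        try {
            String value = new String(Files.readAllBytes(Paths.get(file))).trim();
            System.out.println(file + " = " + value);
        } catch (Exception e) {
            System.out.println(file + " not readable: " + e);
        }
    }
}
```

Run it under docker run --cpus=1.5 --cpu-shares=512 --cpuset-cpus=0-3 and you should see each flag land in its own file.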
So, when it comes to checking how many threads my container can actually have running in parallel, as far as I can tell, only the cpu set is relevant. The JVM, though, ignores it, instead using the cpu limit if set, otherwise the cpu shares (treating the 1024 default as an absolute scale). This is IMHO already very wrong: it sizes thread pools from available cpu time.
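You can check what the JVM concludes with a one-liner (CpuCheck is a hypothetical name; the comments are my prediction from the behavior described above, so verify against your own JVM version):

```java
// CpuCheck.java - print the processor count the JVM will size its pools from.
// Try it with different Docker flags, e.g.:
//   docker run --cpuset-cpus=0-3 ... java CpuCheck   (parallelism really is limited, to 4)
//   docker run --cpus=1 ... java CpuCheck            (time is limited, parallelism is not)
//   docker run --cpu-shares=512 ... java CpuCheck    (a relative weight, yet treated as absolute)
public class CpuCheck {
    public static void main(String[] args) {
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
    }
}
```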
It gets worse in Kubernetes. It's AFAIK best practice to set no cpu limit, so that the cluster nodes get high utilization. You should also set a low cpu request for most apps, since they will be idle most of the time and you want to schedule many apps on one node. Kubernetes translates the request into cpu shares (1000m becomes 1024 shares), and the request is most likely below 1000m. The JVM then always assumes one processor, even if your node is some 64-core monster.
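If I read the HotSpot container code correctly, the shares-to-processors arithmetic amounts to something like the sketch below; PER_CPU_SHARES = 1024 and the round-up are my reading of the source, so take them as assumptions:

```java
// SharesMath.java - a sketch of how cpu shares may turn into a processor count,
// mirroring (as I understand it) the HotSpot container detection logic.
public class SharesMath {
    static final int PER_CPU_SHARES = 1024; // one full cpu worth of shares

    static int cpuCountFromShares(int shares) {
        // round up: anything below 1024 shares still yields one processor
        return (int) Math.ceil((double) shares / PER_CPU_SHARES);
    }

    public static void main(String[] args) {
        // Kubernetes maps a cpu request of 1000m to 1024 shares, so:
        System.out.println(cpuCountFromShares(256));  // 250m request  -> 1
        System.out.println(cpuCountFromShares(512));  // 500m request  -> 1
        System.out.println(cpuCountFromShares(2048)); // 2000m request -> 2
    }
}
```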
Has anyone ever observed this as well? Am I missing something here? Or did the JVM devs actually make things worse when implementing cgroup limits for the cpu?
For reference:
- https://bugs.openjdk.java.net/browse/JDK-8146115
- https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run
Run

cat /sys/fs/cgroup/cpu/cpu.shares

while inside a container, locally or on a cluster of your choice, to see the settings picked up on startup (the CgroupDump sketch above reads the same files, plus the quota and cpuset).