Kubernetes - Trigger a rebalancing of pods

I have a kubernetes cluster with a few nodes set up. I want to make sure that pods are distributed efficiently on the nodes.

I'll explain:

Let's assume that I have two nodes:

Node 1 - 2gb ram
Node 2 - 2gb ram

And I have these pods:

Pod 1 - 1gb ram, on Node 1
Pod 2 - 100mb ram, on Node 1
Pod 3 - 1gb ram, on Node 2
Pod 4 - 100mb ram, on Node 2

Ok, now the problem: let's say I want to add another pod with 1gb ram to the cluster. Currently there's no room on either node, so Kubernetes won't schedule it (unless I add another node). I wonder if there's a way for Kubernetes to see that it could, for example, move Pod 2 over to Node 2 so that Node 1 has a full 1gb free for the new pod?
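(For reference: when I say a pod is "1gb ram", I mean it declares a memory request like the sketch below, which is what the scheduler counts against each node's allocatable memory. The pod name and image are just placeholders.)

apiVersion: v1
kind: Pod
metadata:
  name: big-pod              # placeholder name
spec:
  containers:
  - name: app
    image: my-app:latest     # placeholder image
    resources:
      requests:
        memory: "1Gi"        # what the scheduler reserves on the node
      limits:
        memory: "1Gi"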

Help

Cray answered 31/8, 2018 at 0:1 Comment(0)

The Kubernetes descheduler incubator project is intended to eventually be integrated into Kubernetes to handle rebalancing. Rebalancing can be prompted by under- or over-utilization of node resources, as in your case, or by other changes such as node taints or affinities no longer being satisfied.

For your case, you could run the descheduler with the LowNodeUtilization strategy and carefully configured thresholds so that some pods are evicted and added back to the scheduling queue behind the new 1gb pod.
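In case it helps, the descheduler is typically run as a Job (or CronJob) that reads its policy from a ConfigMap, roughly along the lines of the example manifests in the descheduler repo. The sketch below assumes an image tag, namespace, and service account name that you would adapt to your cluster; the service account also needs RBAC permissions to list nodes and pods and to create evictions. The concrete policy I used is in the edit at the end of this answer.

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy
  namespace: kube-system
data:
  policy.yaml: |
    # the DeschedulerPolicy goes here (see the sample policy at the end of this answer)
---
apiVersion: batch/v1
kind: Job
metadata:
  name: descheduler-job
  namespace: kube-system
spec:
  template:
    spec:
      serviceAccountName: descheduler-sa      # placeholder; needs RBAC to evict pods
      restartPolicy: Never
      containers:
      - name: descheduler
        image: descheduler:v0.6.0             # placeholder image/tag
        command:
        - /bin/descheduler                    # binary path may vary by image
        - --policy-config-file=/policy-dir/policy.yaml
        volumeMounts:
        - name: policy-volume
          mountPath: /policy-dir
      volumes:
      - name: policy-volume
        configMap:
          name: descheduler-policy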

Another approach is to use pod priority classes so that a lower-priority pod is evicted to make room for the new incoming 1gb pod. Pod priority is enabled by default starting in version 1.11. Priorities aren't intended to be a rebalancing mechanism, but I mention them because they are a viable way to ensure a higher-priority incoming pod gets scheduled. Priority and preemption also deprecate the old rescheduler, which will be removed in 1.12.
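As a rough sketch of the priority approach (the class name and value here are made up; on 1.11 the API group is scheduling.k8s.io/v1beta1, which later went GA as scheduling.k8s.io/v1):

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority              # illustrative name
value: 1000000                     # higher values are scheduled (and preempt) first
globalDefault: false
description: "For pods that may preempt lower-priority pods when nodes are full."
---
apiVersion: v1
kind: Pod
metadata:
  name: new-1gb-pod                # the incoming pod from the question
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: my-app:latest           # placeholder image
    resources:
      requests:
        memory: "1Gi"

With this in place, the scheduler can preempt a lower-priority pod (for example one of the 100mb pods, if it has no priority class) to place the 1gb pod; the preempted pod goes back to the queue and still has to fit on another node, or it will stay pending.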

Edit - include sample policy

The policy I used to test this is below:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "memory": 50
        targetThresholds:
          "memory": 51
          "pods": 0
Rotter answered 31/8, 2018 at 0:57 Comment(7)
Thanks for the answer, man. The thing is that I don't want those pods evicted from my cluster - I just want them moved to another node to make room, keeping their state. My example above is very simplistic; there can be much more complicated cases with more nodes and pods. Can you provide an example of LowNodeUtilization that resolves this issue? – Cray
As long as the pods are deployed via a Deployment, new pods would be added to the scheduling queue. If all pods have the same priority, the order is first-in-first-out, so the new 1gb pod would be at the front of the queue and the evicted pods would be scheduled after it onto the node with available resources. However, if pods have different priorities, the pod with the highest priority in the queue is scheduled first even if it arrived after other pods (assuming there is a node able to schedule it, potentially evicting already-scheduled pods with a lower priority). – Rotter
Thanks, Logan. I understand the priorities, but what I don't understand is how to set up LowNodeUtilization for the problem I presented above. – Cray
I have made a gist (gist.github.com/lrakai/9dfa411a4434424446e8e93e0870131a) to confirm that the descheduler can achieve what you want on k8s 1.11.1. I'll add the policy to the answer; the relevant output is at the end of the file. Using the numbers you gave, you actually need to create an imbalance first, because the nodes are perfectly balanced. Of course, you could also manually delete one of the 100mb pods or use priorities. The descheduler will continue to improve and handle more of these cases over time. – Rotter
Thanks, man! Does it only work with 1.11.1 (I use EKS and it's 1.10 there)? Also, I can't delete pods - I need the small ones and the big ones. My problem is getting a balanced distribution of pods among nodes so I won't have to create new nodes (if the 2 nodes can satisfy an arrangement of the pods, why have 3 nodes?). I hope it's clear now - is your gist still relevant for this problem as well? – Cray
Also, if you can add an explanation to the answer of why that configuration solves the issue, that would be great. – Cray
The descheduler does NOT do anything, no matter what. Does it actually work for you people? It does absolutely nothing on my cluster. Why is this not built into Kubernetes? How do I debug exactly why the descheduler does nothing? I run the job and nothing happens. How do I proceed from here? – Cobble
