I don't understand why you would write a rule like this:

every device-id be smaller than 4

If you want to limit the number of GPUs a pod can use, you should be using limits and requests, which is nicely explained in Schedule GPUs.
So you can limit the resource to 4 GPUs like so:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 4 # requesting 4 GPUs
If you have different types of GPUs on different nodes, you can use node labels, as explained in Clusters containing different types of GPUs.
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
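With the nodes labeled, a pod can be pinned to a specific accelerator type with a nodeSelector. A minimal sketch along the lines of the Schedule GPUs docs (reusing the cuda-vector-add example image from above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80, matching the labels above
```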
If your nodes are running different versions of GPUs, then use Node Labels and Node Selectors to schedule pods to appropriate GPUs. Following is an illustration of this workflow:
As part of your Node bootstrapping, identify the GPU hardware type on your nodes and expose it as a node label.
NVIDIA_GPU_NAME=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0)
source /etc/default/kubelet
KUBELET_OPTS="$KUBELET_OPTS --node-labels='alpha.kubernetes.io/nvidia-gpu-name=$NVIDIA_GPU_NAME'"
echo "KUBELET_OPTS=$KUBELET_OPTS" > /etc/default/kubelet
Specify the GPU types a pod can use via Node Affinity rules.
kind: Pod
apiVersion: v1
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "nodeAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
              {
                "matchExpressions": [
                  {
                    "key": "alpha.kubernetes.io/nvidia-gpu-name",
                    "operator": "In",
                    "values": ["Tesla K80", "Tesla P100"]
                  }
                ]
              }
            ]
          }
        }
      }
spec:
  containers:
    - name: gpu-container-1
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 2
This will ensure that the pod will be scheduled to a node that has a Tesla K80 or a Tesla P100 Nvidia GPU.
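Note that scheduler.alpha.kubernetes.io/affinity is the old alpha annotation mechanism; on newer clusters the same constraint goes in the spec.affinity field instead. A sketch of the equivalent spec, assuming the same alpha.kubernetes.io/nvidia-gpu-name label as above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: alpha.kubernetes.io/nvidia-gpu-name
                operator: In
                values: ["Tesla K80", "Tesla P100"]
  containers:
    - name: gpu-container-1
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 2
```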
You can find other relevant information in the unofficial-kubernetes Scheduling GPUs page.
Comments:

You can set the NVIDIA_VISIBLE_DEVICES env instead of using a nvidia.com/gpu resource request for the pod; that's what the nvidia k8s device plugin does. – Fondue

What about putting "gpu/rule": "smaller than 4" in the pod spec and using a device plugin to parse that rule? You can either set the NVIDIA_VISIBLE_DEVICES env to 0,1,2 in the pod spec directly without using a device plugin, or modify the Allocate function: get the request rule "smaller than 4" and set m.allocateEnvvar: "0,1,2". – Fondue
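For reference, the first option mentioned in the comments, setting NVIDIA_VISIBLE_DEVICES directly in the pod spec, would look roughly like this. This is a sketch, not a recommendation: it only takes effect when the node runs the NVIDIA container runtime, and it bypasses the scheduler's GPU accounting entirely:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "0,1,2" # expose only these GPU device ids, as in the comment
```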