I'm setting up an InferenceService using Argo and KFServing on Amazon EKS (Kubernetes). It's important to know that our team has one EKS cluster per environment, which means there can be multiple applications within our cluster that we don't control.
Here is what we have set up so far:
- We push a Docker image to Amazon ECR that contains our logic to train and deploy our model.
- We use Argo to submit workflows that start the training in #1. When installing Argo into our Kubernetes cluster, we noticed that its components are sometimes assigned to the GPU nodes.
- We also use KFServing. KFServing allows us to auto-scale our GPU nodes up and down, specifically scaling to zero when they are not in use. The KFServing components also get assigned to GPU nodes when we apply them to our cluster.
The current setup we have for #2 and #3 (above) seems to prevent KFServing from scaling down to zero. Our concern is that having these components on the GPU nodes stops the autoscaler from removing those nodes.
Which pods need to be assigned to our GPU nodes?
(Option 1) Do we only need our Argo workflow pod to be assigned there, and can we repel the rest?
-- OR --
(Option 2) Are there other KFServing components that need to run on the GPU nodes for it to work correctly?
Option 1: How do we repel all pods other than the Argo workflow pod from our GPU nodes? As a reminder, we have other applications we can't control, so adding node affinities to every pod seems unrealistic.
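To make the question concrete, this is the kind of approach we are considering (a sketch only, not something we have validated on our cluster; the taint key/value and image are made up): taint the GPU nodes so that pods without a matching toleration are repelled, then add the toleration only to the Argo workflow template:

```yaml
# Taint each GPU node, e.g.:
#   kubectl taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule
# Pods from other applications (with no toleration) would then be repelled
# without any changes on their side.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: train-      # placeholder
spec:
  entrypoint: train
  templates:
    - name: train
      tolerations:          # matches the taint above, so this pod may schedule on GPU nodes
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
      container:
        image: <account>.dkr.ecr.<region>.amazonaws.com/trainer:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1
```

Would this taint-based approach break the KFServing components that currently land on the GPU nodes, or is it the right direction?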
Option 2: How do the GPU nodes scale to zero when they have KFServing components running on them? I was under the impression that a node can only be scaled down when there are no pods left on it.