I'm setting up an InferenceService using Argo and KFServing on Amazon EKS (Kubernetes). It's important to know that our team has one EKS cluster per environment, which means there can be multiple applications within our cluster that we don't control.
Here is what we have set up so far:
- We push a Docker image to Amazon ECR that contains our logic to train and deploy our model.
- We use Argo to submit workflows that start the training in #1. When installing Argo into our Kubernetes cluster, we noticed that its components are sometimes assigned to the GPU nodes.
- We also use KFServing. KFServing allows us to auto-scale our GPU nodes up and down, specifically scaling to zero when they are not in use. The KFServing components also get assigned to GPU nodes when we apply them to our cluster.
The current setup we have for #2 and #3 (above) seems to prevent KFServing from scaling down to zero. Our concern is that having these components on the GPU nodes stops the autoscaler from removing those nodes.
Which pods need to be assigned to our GPU nodes?
(Option 1) Do we only need our Argo workflow pod to be assigned there, and can we repel the rest?
-- OR --
(Option 2) Are there other KFServing components that need to run on the GPU nodes for it to work correctly?
Option 1: How do we repel all pods other than the Argo workflow pod from our GPU nodes? As a reminder, we have other applications we can't control, so adding node affinities to every pod seems unrealistic.
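To make the question concrete, this is the kind of approach we are considering (a sketch only, not something we have validated on our cluster; the taint key/value and image are made up): taint the GPU nodes so that pods without a matching toleration are repelled, then add the toleration only to the Argo workflow template:

```yaml
# Taint each GPU node, e.g.:
#   kubectl taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule
# Pods from other applications (with no toleration) would then be repelled
# without any changes on their side.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: train-      # placeholder
spec:
  entrypoint: train
  templates:
    - name: train
      tolerations:          # matches the taint above, so this pod may schedule on GPU nodes
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
      container:
        image: <account>.dkr.ecr.<region>.amazonaws.com/trainer:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1
```

Would this taint-based approach break the KFServing components that currently land on the GPU nodes, or is it the right direction?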
Option 2: How do the GPU nodes scale to zero when they have KFServing components running on them? I was under the impression that a node can only be scaled down when there are no pods left on it.