Google Kubernetes Engine vs Vertex AI (AI Platform Unified) for Serving Model Prediction

With Google recently releasing Vertex AI, which integrates all of its MLOps platforms, I wonder what the difference would be in serving a custom-trained PyTorch/TensorFlow model on GKE versus Vertex AI (or AI Platform Unified, since the rebranding just took place and AI Platform already provides the capability to serve model predictions).

I did a lot of research but found little information on this. I'm already hosting my ML model on GKE; is it worth migrating to Vertex AI?

Note: I'm not planning to do training or other data preprocessing in the cloud yet.

Sailor answered 11/6, 2021 at 3:24 Comment(1)
Hi @SakshiGatyan, thanks for your answer. The comments are very helpful, and I look forward to building more ML applications with Vertex AI!Sailor

It is worth considering Vertex AI because:

Vertex AI is a "managed" ML platform that helps practitioners accelerate experiments and deploy AI models. We don't need to manage infrastructure, servers, or their health while deploying, training, or serving predictions from ML models; Vertex AI takes care of that for us, along with scaling according to traffic.

Some key features that make Vertex AI worth considering:

  1. Vertex AI streamlines model development

Once the model is trained, we get detailed model evaluation metrics and feature attributions. (Feature attribution tells us which features signaled the model's predictions the most, which gives insight into how the model is performing under the hood.)

  2. Scalable deployment with endpoints

Once the model is trained, it can be deployed to an endpoint. Traffic between models can be split for testing, and the machine type can also be customised (see the deployment sketch after this list).

  3. Orchestrating workflows using Vertex Pipelines

Vertex Pipelines help counter model drift, which can happen when the environment around your model changes, by automating the retraining workflow (a minimal pipeline sketch also follows this list).

  4. Monitoring deployed models using Vertex AI

Vertex Model Monitoring can detect issues like drift and training-serving skew. Rather than manually checking that the model is still performing correctly, Vertex AI alerts us whenever something changes, which gives confidence in the model's reliability.
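As a concrete illustration of point 2, here is a minimal deployment sketch using the google-cloud-aiplatform Python SDK. The project, bucket, names, and container image are placeholders (not values from this thread), so treat it as a starting point rather than a definitive recipe:

```python
# Minimal sketch: upload a custom-trained model and deploy it to a
# Vertex AI endpoint. All IDs, URIs, and names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the trained model with a serving container (a prebuilt
# TensorFlow prediction image is used here for illustration).
model = aiplatform.Model.upload(
    display_name="my-model",
    artifact_uri="gs://my-bucket/model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest"
    ),
)

# Deploy to an endpoint; machine type and autoscaling bounds are
# customizable, and traffic can later be split across model versions
# (e.g. deploy a second model to the same endpoint with
# traffic_percentage=10 for an A/B test).
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,  # N1 prediction nodes cannot scale to zero
    max_replica_count=3,
    traffic_percentage=100,
)

# Online prediction against the deployed model.
response = endpoint.predict(instances=[[0.1, 0.2, 0.3]])
print(response.predictions)
```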
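And for point 3, a minimal retraining-pipeline sketch, assuming the KFP v2 SDK and the google-cloud-aiplatform client; the component body, project, and bucket are again placeholders:

```python
# Minimal sketch: define a one-step pipeline with KFP v2, compile it,
# and submit it to Vertex Pipelines. The training logic is a stub.
from kfp import compiler, dsl


@dsl.component
def retrain_model(data_uri: str) -> str:
    # Placeholder training step: retrain and return a model URI.
    return data_uri + "/model"


@dsl.pipeline(name="retraining-pipeline")
def retraining_pipeline(data_uri: str = "gs://my-bucket/data"):
    retrain_model(data_uri=data_uri)


compiler.Compiler().compile(
    pipeline_func=retraining_pipeline,
    package_path="retraining_pipeline.json",
)

# Submit the compiled pipeline; this could be triggered on a schedule
# or by a drift alert to automate retraining.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="retraining-pipeline",
    template_path="retraining_pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",
)
job.run()
```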

Greige answered 15/6, 2021 at 14:9 Comment(6)
Would it be accurate to say that Vertex AI deployments and pipelines run on GKE clusters, and that the platform serves as a wrapper to manage, monitor, and interact with ML models more easily? Also, if it does run on GKE, I assume Vertex AI's node autoscaling is essentially the same as that of GKE? The same applies to pod startup time and other metrics?Sailor
And do you know if the continuous evaluation feature from AI Platform (cloud.google.com/ai-platform/prediction/docs/…) is being kept in Vertex AI?Sailor
Yes, currently Vertex AI deployments do happen on GKE, and autoscaling is the same as that of GKE.Greige
The continuous evaluation feature is not supported in Vertex AI yet, but it is being worked on by the Vertex AI product team and will be supported soon.Greige
Hi Sakshi. We run text-to-art models on g4dn.xlarge instances on AWS but are having massive scaling issues. The models are dockerized and easily portable. Is this something Vertex could support? If so, are there payload constraints? The docs are not clear.Jaco
@SakshiGatyan One bottleneck I'm facing is the online prediction size limit of 1.5 MB. Any workarounds?Hottentot

I have also been exploring Vertex AI for machine learning. Some points I found useful when it comes to serving model predictions from custom containers:

  • Serving predictions from a custom container on Vertex AI, as opposed to a GKE cluster, frees you from managing the infrastructure. Cost-wise, GKE Autopilot clusters seem more expensive than Vertex AI, but the difference in the case of GKE Standard mode is less clear: the Compute Engine pricing underlying Standard mode is lower than that of comparable Vertex AI nodes, but Standard mode adds a cluster management fee.
  • I have not explored AI Platform much, but in Vertex AI the only machine types available for prediction are N1, which cannot be scaled down to 0; at least one node is always running. This is a significant cost issue, especially if you deploy multiple models, since each model gets its own associated nodes, independent of other models, and scaling also happens at the node level. There are workarounds to serve multiple models from a single node, but the more I move toward custom models and such workarounds, the more it seems that the only advantage of Vertex AI is not having to manage the prediction-serving infrastructure.
  • Typically, a lot of pre- and post-processing needs to happen with custom prediction containers, and all of that logic lives inside the container (a minimal sketch of such a container follows this list).
  • I am still reading the documentation, but from what I have seen so far, many Vertex AI features, such as model monitoring and Explainable AI, seem very straightforward with AutoML models, whereas custom models require some configuration. How straightforward that configuration is, I am yet to find out.
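To make the custom-container point concrete, here is a minimal sketch of a prediction server that follows Vertex AI's custom container contract. The AIP_* environment variables are injected by the platform; the Flask framework choice, the model, and the pre/post-processing are my own placeholders:

```python
# Minimal sketch of a custom prediction container for Vertex AI.
# Vertex AI injects AIP_HTTP_PORT, AIP_HEALTH_ROUTE and
# AIP_PREDICT_ROUTE into the container at deploy time.
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder: load the real model artifact at startup instead.
model = lambda features: sum(features)


@app.route(os.environ.get("AIP_HEALTH_ROUTE", "/health"))
def health():
    # Vertex AI polls this route to decide if the container is ready.
    return "", 200


@app.route(os.environ.get("AIP_PREDICT_ROUTE", "/predict"), methods=["POST"])
def predict():
    # Requests follow the {"instances": [...]} convention; any custom
    # pre- and post-processing lives here, inside the container.
    instances = request.get_json()["instances"]
    predictions = [model(instance) for instance in instances]
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", 8080)))
```

The image is then built, pushed to a registry, and referenced via serving_container_image_uri when uploading the model, as in the deployment sketch in the answer above.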
Heder answered 11/11, 2021 at 20:39 Comment(2)
Any update on how hard it is to use custom models?Jaco
Using custom models for prediction is straightforward enough, but from what I understood, other Vertex AI features, such as model monitoring, will not work with custom-model serving.Heder
