I am unclear about Vertex AI pricing for model predictions. In the documentation, under the heading "More about automatic scaling of prediction nodes", one of the points mentioned is:
"If you choose automatic scaling, the number of nodes scales automatically, and can scale down to zero for no-traffic durations"
The example provided later in the documentation also seems to suggest that during a period with no traffic, zero nodes are in use. However, when I create an Endpoint in Vertex AI, under the Autoscaling heading it says:
"Autoscaling: If you set a minimum and maximum, compute nodes will scale to meet traffic demand within those boundaries"
A value of 0 under "Minimum number of compute nodes" is not allowed, so you have to enter 1 or greater, and it is mentioned that:
"Default is 1. If set to 1 or more, then compute resources will continuously run even without traffic demand. This can increase cost but avoid dropped requests due to node initialization."
My question is: what happens when I select autoscaling by setting the minimum to 1 and the maximum to, say, 10? Does 1 node always run continuously, or does it scale down to 0 nodes under no-traffic conditions, as the documentation suggests?
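For reference, this is roughly the equivalent deployment with the Python SDK rather than the console; the project, region, model ID, and machine type below are placeholders, not my actual values:

```python
from google.cloud import aiplatform

# Placeholder project/region for illustration only.
aiplatform.init(project="my-project", location="us-central1")

# Placeholder ID of an already-uploaded model.
model = aiplatform.Model(model_name="1234567890")

# Autoscaling bounds: minimum 1 node, maximum 10 nodes.
# The console does not accept 0 for the minimum, so 1 is the floor here too.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=10,
)
```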
To test, I deployed an Endpoint with autoscaling (min and max both set to 1) and then sent a prediction request; the response was almost immediate, suggesting the node was already up. I repeated this after about an hour and the response was again immediate, suggesting that the node probably never shut down. Also, if scaling down to 0 nodes is indeed possible, is it even practical under strict latency requirements, i.e., what latency can we expect when starting up from 0 nodes?
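For context, here is a minimal sketch of how such a timing check can be done with the Python SDK; the endpoint ID and the instance payload are placeholders, and the exact payload shape depends on the deployed model:

```python
import time
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder endpoint ID.
endpoint = aiplatform.Endpoint(endpoint_name="9876543210")

# Time a single online prediction request end to end.
start = time.perf_counter()
response = endpoint.predict(instances=[[0.1, 0.2, 0.3]])  # payload depends on the model
print(f"Prediction latency: {time.perf_counter() - start:.2f} s")
```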