Is there a way to specify a minimum number of workers for Cloud Dataflow w/ autoscaling?

I'd like to specify a minimum number of workers for my job that autoscaling will not go below (akin to how it works for max_num_workers). Is this possible? My reason is that sometimes the worker startup takes long enough that the autoscaling decides to drop the number of workers to one, even though doing so is not optimal for my job. I'd still like to use autoscaling in case the job is larger than my estimated minimum.

Overspill answered 14/8, 2018 at 20:0 Comment(4)
Do you think it should not have downscaled to 1, or is the issue more that the subsequent upscaling pauses processing for a couple of minutes? – Moot
I'm not sure what you mean by pausing processing. If it hadn't downscaled, the work would have been parallelized earlier. For example, if it takes 5 minutes to set up the workers and nothing is dropped, then after 5 minutes everything is running with all the workers. Otherwise, at 5 minutes only one worker starts, realizes it needs more workers, and starts up another one; 5 minutes later it realizes it needs yet another one, and so on. – Overspill
Could you provide a job_id to help understand this better? The autoscaler monitors the metrics for a couple of minutes before deciding what to upscale to. E.g. if the load suddenly doubles, it should upscale to enough workers to handle the new load; it should not add one worker at a time. – Moot
I was misremembering how slowly it scaled up, so you can ignore that part. I still wish it didn't drop all but one worker at the beginning. Here is the job_id in case you still want to take a look: 2018-08-14_12_33_00-7916980579545141542 – Overspill

A minimum number of workers is not yet supported. Could you file a ticket with the job details so that support can take a look and understand why it downscales to too few workers?

Moot answered 14/8, 2018 at 21:20 Comment(0)

According to the Autoscaling documentation, you can specify the maximum number of workers with the --maxNumWorkers option and the initial number of workers with --numWorkers. You can find a description of these options in the pipeline options documentation.
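As a rough illustration of how these bounds interact (a sketch of the idea only; `clamp_worker_count` is a hypothetical helper, not Dataflow's actual autoscaler, and the values are placeholders):

```python
def clamp_worker_count(target, max_num_workers=None):
    """Clamp an autoscaler's suggested worker count to the configured bounds.

    With no minimum option available, the only lower bound is 1, which is
    why a job can be downscaled to a single worker during slow startup.
    --numWorkers only sets the *initial* size; it does not act as a floor.
    """
    lower = 1  # no minimum-workers option, so 1 is the effective floor
    if max_num_workers is not None:
        target = min(target, max_num_workers)
    return max(target, lower)

print(clamp_worker_count(1, max_num_workers=20))   # 1 (can still drop to one worker)
print(clamp_worker_count(50, max_num_workers=20))  # 20 (capped by --maxNumWorkers)
```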

Whodunit answered 14/8, 2018 at 20:46 Comment(0)

Try using the 'minNumWorkers' experiment flag to configure the minimum number of workers for Dataflow jobs with autoscaling:

--experiments=minNumWorkers=N

This flag is not yet GA, though the GCP product team is working on making it so.
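Conceptually, the experiment adds a lower bound to the autoscaler's target, alongside the existing upper bound (a sketch of the idea only; `autoscale_target` is a hypothetical helper, not Dataflow's implementation):

```python
def autoscale_target(suggested, min_num_workers=1, max_num_workers=None):
    """Bound a suggested worker count by the configured min/max.

    With a minimum in place, the job can no longer be downscaled to a
    single worker even if the autoscaler's estimate drops that low.
    """
    if max_num_workers is not None:
        suggested = min(suggested, max_num_workers)
    return max(suggested, min_num_workers)

print(autoscale_target(1, min_num_workers=5, max_num_workers=20))   # 5 (floor applied)
print(autoscale_target(50, min_num_workers=5, max_num_workers=20))  # 20 (cap applied)
```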

Berezina answered 2/3, 2023 at 9:16 Comment(0)

You can specify the initial number of workers using --numWorkers; this is the number of workers used when the job is deployed (note that it is not a minimum that autoscaling will respect).

You can specify the maximum number of workers using --maxNumWorkers; when Dataflow autoscales, this is the largest number of workers it can use.

Wireless answered 3/9, 2018 at 12:40 Comment(1)
Does this add anything to F10's Aug 14, 2018 answer? – Quickly

© 2022 - 2024 — McMap. All rights reserved.