Is there anyway I can use preemptible instance for dataflow jobs?

It's evident that preemptible instance are cheaper than non-preemptible instance. On daily basis 400-500 dataflow jobs are running in my organisational project. Out of which some jobs are time-sensitive and others are not. So is there any way I could use preemptible instance for non-time-constraint job, which will cost me less for overall pipeline execution. Currently I'm running dataflow jobs with below specified configuration.

        options.setTempLocation("gs://temp/");
        options.setRunner(DataflowRunner.class);
        options.setTemplateLocation("gs://temp-location/");
        options.setWorkerMachineType("n1-standard-4");
        options.setMaxNumWorkers(20);
        options.setWorkerCacheMb(2000);

I'm not able to find out any pipeline options with preemptible instance setting.

Yes, it is possible to do so with Flexible Resource Scheduling in Cloud Dataflow (docs). Note that there are some things to consider:

Delayed execution: jobs are scheduled and not executed right away (you can see a new QUEUED status for your Dataflow jobs). They are run opportunistically when resources are available within a six-hour window. This makes FlexRS suitable to reduce cost for non-time-critical workloads. Also, be sure to validate your code before sending the job.
Batch jobs: as of now it only accepts batch jobs and requires to enable autoscaling:

You cannot set autoscalingAlgorithm=NONE
Dataflow Shuffle: it needs to be enabled. When so, no data is stored on persistent disks attached to the VMs. This way, when a preemption happens and resources are claimed back there is no need to redistribute the data.
Regions: according to the previous item, only regions where Dataflow Shuffle is supported can be selected. List here, turn-up for new regions will be announced in the release notes. As of now, zone is automatically chosen within the region.
Machine types: FlexRS currently supports n1-standard-2 (default) and n1-highmem-16.
SDK: requires 2.12.0 or newer for Java or Python.
Quota: quota is reserved upfront (i.e. queued jobs also consume quota).

In order to run it, use --flexRSGoal=COST_OPTIMIZED and make sure to take into account that the rest of parameters conform to the FlexRS needs.

A uniform discount rate is applied to FlexRS jobs, you can compare pricing details in the following link.

Note that you might see a Beta disclaimer in the non-English documentation but, as clarified in the release notes, it's Generally Available.

Recommended topics

Hot tags