How to configure Google Cloud Composer cost-effectively

After some research and testing, we have decided to start using Google Cloud Composer. Since our current DAGs and tasks are relatively small and don't require the server to run continuously, I am looking at how to manage costs.

Two questions:

  1. The option to use preemptible VMs seems logical. This saves costs considerably, and I'm thinking of going for 3x n1-standard-4. I expect each task to be quite short, so I don't think preemption will have a significant impact on our workloads. Is it possible to use preemptible VMs with Composer?
  2. Schedule the Composer environment to turn on/off, as asked in this post. I can't find how to do this in the documentation, either by shutting the whole environment down or by shutting down the workers as proposed in the answer.

Help, anyone?

Cowpea answered 15/11, 2018 at 12:27 Comment(0)

This is an interesting question.

One roadblock you may encounter is the nature of Airflow itself. Generally, Airflow is not intended for ephemeral use. Instead, I'd suspect that the vast majority of Airflow use, Cloud Composer or otherwise, is persistent. Ephemerality brings cost benefits but also risks with the Airflow architecture. For example, what happens if the scheduled job that restarts your Airflow resources fails?

To answer your questions:

  1. Preemptibles are not supported in Composer. While PVMs have a ton of awesome benefits, they could leave tasks in a very weird state, especially if a worker gets preempted several times.
  2. There is no formal documentation for this process because it's generally informal and not recommended if you must depend on your environment. The basic approach, though, would be to:
    1. Create a very small GCE VM
    2. Set up the Cloud SDK (gcloud) to connect to your project
    3. Create a crontab that either does a fresh create/delete of an environment when you need it /or/ pauses the VMs in the Composer worker pool
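
To make the create/delete variant of step 3 a bit more concrete, here is a minimal sketch of a script that cron could call. It only assembles and runs `gcloud composer environments create` / `delete`; the environment name, the region, and the script layout are assumptions made up for illustration, and the node count is taken from the 3-worker sizing in the question. Treat it as a starting point, not a supported procedure.

```python
import subprocess
import sys

# Hypothetical values: replace with your own environment name and region.
ENV_NAME = "my-composer-env"
LOCATION = "europe-west1"


def build_command(action, env_name=ENV_NAME, location=LOCATION):
    """Return the gcloud argument list for creating or deleting an environment."""
    base = ["gcloud", "composer", "environments", action, env_name,
            "--location", location]
    if action == "create":
        # Three worker nodes, matching the sizing mentioned in the question.
        return base + ["--node-count", "3"]
    if action == "delete":
        # --quiet skips the interactive confirmation, which cron cannot answer.
        return base + ["--quiet"]
    raise ValueError(f"unsupported action: {action!r}")


def run(action, dry_run=False):
    """Run (or, with dry_run, just print) the gcloud command for `action`."""
    cmd = build_command(action)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # Return gcloud's exit code so cron can report a failed create/delete.
    return subprocess.run(cmd, check=False).returncode


# Only act when called with an argument, e.g. `create` or `delete --dry-run`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run(sys.argv[1], dry_run="--dry-run" in sys.argv[2:]))
```

Saved under a name of your choosing, the script would be called from two crontab entries, one passing `create` before the working window and one passing `delete` after it. Note that the non-zero exit code is the only failure signal here, which is exactly the "what if the scheduler fails" risk mentioned above, so you would want some alerting around it.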

In the long term, I think Composer will better support ephemeral use of worker resources. In the short term, another option is to run a lightweight Airflow environment on a small(ish) GCE VM and then suspend/resume that VM when you need to use Airflow. You don't get Composer that way, but you do benefit from the team's work improving and expanding GCP support in core Airflow.
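
For the single-VM alternative, the scheduling side could look roughly like the sketch below. It assumes a hypothetical instance called `airflow-vm` in a hypothetical zone, and it just wraps the `gcloud compute instances` subcommands for stopping, starting, suspending, and resuming an instance; nothing here is specific to Airflow itself.

```python
import subprocess

# Hypothetical values: replace with your own instance name and zone.
VM_NAME = "airflow-vm"
ZONE = "europe-west1-b"

# gcloud compute instances subcommands this sketch allows.
VM_ACTIONS = ("start", "stop", "suspend", "resume")


def vm_command(action, vm_name=VM_NAME, zone=ZONE):
    """Return the gcloud argument list that applies `action` to the VM."""
    if action not in VM_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["gcloud", "compute", "instances", action, vm_name, "--zone", zone]


def toggle_vm(action):
    """Run the command and return gcloud's exit code."""
    return subprocess.run(vm_command(action), check=False).returncode
```

A cron job (or any other scheduler) would call `toggle_vm("resume")` shortly before the first DAG run of the day and `toggle_vm("suspend")` once the last one has finished.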

Cyclopean answered 20/11, 2018 at 3:55 Comment(1)
Thanks for sharing your thoughts! - Cowpea
