What are the differences between Airflow and Kubeflow Pipelines?

2

14

Machine learning platforms are one of the current buzzwords in business, promising to speed up the development of ML and deep learning applications.

A common component of these platforms is a workflow orchestrator (or workflow scheduler) that helps users build DAGs and schedule and track experiments, jobs, and runs.

Many machine learning platforms include a workflow orchestrator, such as Kubeflow Pipelines, FBLearner Flow, and Flyte.

My question is: what are the main differences between Airflow and Kubeflow Pipelines, or other ML platform workflow orchestrators?

Also, Airflow supports APIs in different languages and has a large community. Can we use Airflow to build our ML workflow?

York answered 26/11, 2019 at 8:3 Comment(0)
7

You can definitely use Airflow to orchestrate Machine Learning tasks, but you probably want to execute ML tasks remotely with operators.

For example, Dailymotion uses the KubernetesPodOperator to scale Airflow for ML tasks.
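As a rough sketch of what that can look like (not Dailymotion's actual setup), the DAG below launches one training container through `KubernetesPodOperator`. The image name, namespace, `train.py` script, and `--epochs` flag are all hypothetical. The import path shown is the one used by recent versions of the `apache-airflow-providers-cncf-kubernetes` package, and the `schedule` argument assumes Airflow 2.4 or later; older releases differ on both.

```python
from datetime import datetime

# Hypothetical values for illustration; none of these come from the answer above.
TRAINING_IMAGE = "registry.example.com/ml/train:latest"
NAMESPACE = "ml-jobs"


def training_pod_kwargs(epochs: int) -> dict:
    """Build the arguments for a pod that runs one training container."""
    return {
        "task_id": "train_model",
        "name": "train-model",
        "namespace": NAMESPACE,
        "image": TRAINING_IMAGE,
        # Assumed entry point inside the (hypothetical) training image.
        "cmds": ["python", "train.py"],
        "arguments": ["--epochs", str(epochs)],
        # Stream the container's stdout into the Airflow task log.
        "get_logs": True,
    }


def build_dag():
    """Create a DAG whose only task runs the training image in its own pod."""
    # Imported lazily so the helper above can be used without Airflow installed.
    # The module path is an assumption that depends on the provider version.
    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )

    with DAG(
        dag_id="ml_training",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually; assumes Airflow 2.4+
        catchup=False,
    ) as dag:
        KubernetesPodOperator(**training_pod_kwargs(epochs=10))
    return dag
```

In a real DAG file you would assign the result at module level (for example `dag = build_dag()`) so the scheduler can discover it. The heavy work then happens in the pod, and the Airflow worker only submits and monitors it.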

If you don't have the resources to set up a Kubernetes cluster yourself, you can use an ML platform like Valohai that has an Airflow operator.

When doing ML in production, ideally you also want to version control your models to keep track of the data, code, parameters and metrics of each execution.

You can find more details in this article: Scaling Apache Airflow for Machine Learning Workflows.

Granitite answered 26/11, 2019 at 21:18 Comment(2)
Thanks for your reply. It seems that I need to upgrade my Medium account to read it. Is there any other link to this that I can reference? And how about building our ML pipeline on Hadoop? - York
Sorry for the link. I updated it to a Medium friend link so that you can access it now. Airflow is good for creating workflows, but the work itself can be done remotely. You can do that with different executors and operators that launch work on other platforms (Spark...) or infrastructure (Kubernetes cluster...). - Granitite
4

My question is: what are the main differences between Airflow and Kubeflow Pipelines, or other ML platform workflow orchestrators?

Airflow pipelines run on the Airflow server (with the risk of bringing it down if a task is too resource intensive), while Kubeflow pipelines run in dedicated Kubernetes pods. Also, Airflow pipelines are defined as a Python script, while Kubeflow tasks are defined as Docker containers.
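To make the "tasks are containers" side concrete, here is a minimal sketch assuming the Kubeflow Pipelines SDK v2 (`kfp`). The image name, dataset URI, `train.py` script, and `--dataset` flag are hypothetical; treat this as an illustration of the idea, not a reference pipeline.

```python
# Hypothetical values for illustration only.
TRAINING_IMAGE = "registry.example.com/ml/train:latest"
DEFAULT_DATASET_URI = "gs://example-bucket/data/train.csv"


def build_pipeline():
    """Return a pipeline function with a single container-based step."""
    # Imported lazily so this sketch can be loaded without kfp installed.
    from kfp import dsl

    @dsl.container_component
    def train_model(dataset_uri: str):
        # The step is just a container spec: image, command, and arguments.
        return dsl.ContainerSpec(
            image=TRAINING_IMAGE,
            command=["python", "train.py"],
            args=["--dataset", dataset_uri],
        )

    @dsl.pipeline(name="ml-training")
    def training_pipeline(dataset_uri: str = DEFAULT_DATASET_URI):
        # Each call to a component becomes one pod when the pipeline runs.
        train_model(dataset_uri=dataset_uri)

    return training_pipeline
```

With `kfp` installed, the returned function can be compiled to a pipeline spec, for example with `kfp.compiler.Compiler().compile(pipeline_func=build_pipeline(), package_path="pipeline.yaml")`, and then uploaded to a Kubeflow Pipelines deployment.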

Also, Airflow supports APIs in different languages and has a large community. Can we use Airflow to build our ML workflow?

Yes, you can. For example, you could use an Airflow DAG to launch a training job in a Kubernetes pod that runs a Docker container, emulating Kubeflow's behaviour. What you will miss are some ML-specific features of Kubeflow, such as model tracking and experimentation.

Offhand answered 31/5, 2021 at 22:23 Comment(1)
Kubeflow components can be specified in straight Python / Jupyter, not just as containers. That is to say, the user doesn't need to worry about Docker, which happens behind the scenes, though I suppose they could if they wanted to. - Reavis
