What is the difference between GCP Kubeflow and GCP Cloud Composer?

I am learning GCP and came across Kubeflow and Google Cloud Composer.
From what I have understood, both are used to orchestrate workflows, letting the user schedule and monitor pipelines on GCP.
The only difference I could figure out is that Kubeflow deploys and monitors Machine Learning models. Am I correct? In that case, since Machine Learning models are also objects, can't we orchestrate them using Cloud Composer? How is Kubeflow better than Cloud Composer when it comes to managing Machine Learning models?

Thanks

Millsap answered 17/3, 2020 at 8:7 Comment(0)
P
5
  • Kubeflow is a platform for developing and deploying machine learning (ML) systems. Its components are focused on creating workflows aimed at building ML systems.
  • Cloud Composer provides the infrastructure to run Apache Airflow workflows. Its building blocks are known as Airflow operators, and a workflow is a set of connected operators known as a DAG (directed acyclic graph).

Both services run on Kubernetes, but they are based on different programming frameworks. You are correct that Kubeflow deploys and monitors Machine Learning models. Answers to your questions are below:

  1. In that case, since Machine Learning models are also objects, can't we orchestrate them using Cloud Composer?

You would need to find an operator that meets your needs, or create a custom operator with the structure required to create a model (see this example). Although this can be done, it could be more difficult than using Kubeflow.
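To give a rough idea of what "create a custom operator" involves, here is a minimal sketch. It assumes nothing about your model: the `BaseOperator` class below is a stand-in I wrote so the snippet runs without Airflow installed (in a real DAG you would inherit from Airflow's own `BaseOperator`, whose subclasses implement `execute(self, context)`). `TrainModelOperator`, its parameters and the toy training loop are all hypothetical.

```python
class BaseOperator:
    """Stand-in for Airflow's BaseOperator so this sketch runs without Airflow."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self, context):
        raise NotImplementedError


class TrainModelOperator(BaseOperator):
    """Hypothetical custom operator wrapping a model-training step."""

    def __init__(self, task_id, learning_rate, epochs):
        super().__init__(task_id)
        self.learning_rate = learning_rate
        self.epochs = epochs

    def execute(self, context):
        # Toy "training": gradient descent on f(w) = (w - 3)^2.
        w = 0.0
        for _ in range(self.epochs):
            w -= self.learning_rate * 2 * (w - 3)
        # "ds" is the run's date string in a real Airflow context;
        # here it is just a key in a plain dict.
        return {"weight": w, "run_date": context["ds"]}


op = TrainModelOperator(task_id="train_model", learning_rate=0.1, epochs=100)
result = op.execute({"ds": "2020-03-17"})
print(result)
```

Everything ML-specific (where the data lives, how the model is stored, how metrics are tracked) would have to be written by hand inside `execute`, which is the extra work this answer refers to.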

  2. How does Kubeflow help in any way, better than Cloud Composer when it comes to managing Machine Learning models?

Kubeflow hides complexity because it is focused on Machine Learning models. Frameworks specialized in machine learning make those tasks easier than Cloud Composer does; in this context Cloud Composer can be considered a general-purpose tool (focused on linking existing services supported by the Airflow operators).

Plasmagel answered 17/3, 2020 at 20:32 Comment(0)
K
7

Kubeflow and Kubeflow Pipelines

Kubeflow is not exactly the same as Kubeflow Pipelines (KFP). The Kubeflow project mostly develops Kubernetes operators for distributed ML training (TFJob, PyTorchJob). The Pipelines project, on the other hand, develops a system for authoring and running pipelines on Kubernetes. KFP also has some sample components, but the main product is the pipeline authoring SDK and the pipeline execution engine.

Kubeflow Pipelines vs. Cloud Composer

The projects are pretty similar, but there are differences:

  • KFP uses Argo for execution and orchestration. Cloud Composer uses Apache Airflow.
  • KFP/Argo is designed for distributed execution on Kubernetes. Cloud Composer/Apache Airflow are more for single-machine execution.
  • KFP/Argo are language-agnostic: components can use any language (components describe containerized command-line programs). Cloud Composer/Apache Airflow use Python (Airflow operators are defined as Python classes).
  • KFP/Argo have a concept of data passing. Every component has inputs and outputs, and the pipeline connects them into a data passing graph. Cloud Composer/Apache Airflow do not really have data passing (Airflow has global variable storage and XCom, but that is not the same thing as explicit data passing), and the pipeline is a task dependency graph rather than mostly a data dependency graph (KFP can also have task dependencies, but usually they're not needed).
  • KFP supports an execution caching feature that skips execution of tasks that have already been executed before.
  • KFP records all artifacts produced by pipeline runs in the ML Metadata database.
  • KFP has an experimental adapter which allows using Airflow operators as components.
  • KFP has a large, fast-growing ecosystem of custom components.
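
The data passing and caching points are easier to see in code. This is a plain-Python sketch of the two ideas only; it does not use the KFP or Airflow APIs, and the `pipeline` spec format, the `run` helper and the three toy steps are all made up for illustration. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Task-dependency style (Airflow-like): edges are declared explicitly as
# "run train after preprocess". No values flow through the graph itself.
task_deps = {"preprocess": set(), "train": {"preprocess"}, "evaluate": {"train"}}
airflow_style_order = list(TopologicalSorter(task_deps).static_order())


# Data-passing style (KFP-like): each step names the upstream outputs it
# consumes, so the dependency edges fall out of the data wiring.
def preprocess(raw):
    return [x * 2 for x in raw]


def train(data):
    return sum(data) / len(data)


def evaluate(model, data):
    return max(abs(x - model) for x in data)


# Hypothetical spec: step -> (function, argument sources). A source is
# either ("const", value) or ("output", upstream_step_name).
pipeline = {
    "preprocess": (preprocess, [("const", [1, 2, 3])]),
    "train": (train, [("output", "preprocess")]),
    "evaluate": (evaluate, [("output", "train"), ("output", "preprocess")]),
}

cache = {}     # (step name, repr of inputs) -> output
executed = []  # steps that actually ran (cache misses)


def run(spec):
    # Derive the execution order from the data wiring alone.
    deps = {
        name: {src for kind, src in args if kind == "output"}
        for name, (_, args) in spec.items()
    }
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        func, args = spec[name]
        values = [outputs[src] if kind == "output" else src for kind, src in args]
        key = (name, repr(values))
        if key not in cache:  # same step, same inputs: skip re-execution
            cache[key] = func(*values)
            executed.append(name)
        outputs[name] = cache[key]
    return outputs


first = run(pipeline)
second = run(pipeline)  # every step is served from the cache
print(first, executed)
```

In the second half nobody writes "train runs after preprocess"; the order comes from which outputs feed which inputs, and the cache key is just the step plus its inputs. Real KFP caching is more involved than this, so treat it as an illustration of the idea only.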
Kerouac answered 20/6, 2020 at 1:57 Comment(0)
P
5

Taking this straight from kubeflow.org:

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

As you can see, it is a suite made up of many pieces of software that are useful in the life cycle of an ML model. It comes with TensorFlow, Jupyter, etc. The real selling point of Kubeflow is "easy deployment of an ML model at scale on a Kubernetes cluster".

However, on GCP you already have an ML suite in the cloud: Datalab, Cloud Build, etc. So I don't know how efficient spinning up a Kubernetes cluster will be if you don't need the "portability" factor.

Cloud Composer is the real deal when talking about orchestration of a workflow. It is a "managed" version of Apache Airflow, and it is ideal for any "simple" workflow that changes a lot, since you can change it via a visual UI and with Python.

It is also ideal for automating infrastructure operations.

Politico answered 17/3, 2020 at 8:28 Comment(0)
