How can Airflow pick up DAGs from a DAG folder in a git branch using git-sync?

My company uses git-sync to sync zipped DAGs to Airflow, and we deploy Airflow with the Airflow Helm chart. Is it possible to make Airflow pick up only the zipped DAGs in a specific folder of a git branch, such as dags-dev, rather than all the zipped DAGs?

Here are some references that might be useful.

The Airflow Helm chart values file: https://github.com/helm/charts/blob/master/stable/airflow/values.yaml

The dags section of our values file looks like this:

dags:
      doNotPickle: true
      git:
        url: <git url>
        ref: master
        gitSync:
          enabled: true
          image:
            repository: <some repo>
            tag: 1.0.7
          refreshTime: 60
      initContainer:
        enabled: true
        image:
          repository: <some repo>
          tag: 1.0.7

Our Airflow git-sync configuration looks like this:

AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: repo # must match AIRFLOW__KUBERNETES__GIT_SUBPATH
AIRFLOW__KUBERNETES__GIT_REPO: <git repo>
AIRFLOW__KUBERNETES__GIT_BRANCH: master
AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: /opt/airflow/dags
AIRFLOW__KUBERNETES__GIT_USER: <some user>
AIRFLOW__KUBERNETES__GIT_PASSWORD: <some password>
AIRFLOW__KUBERNETES__GIT_SYNC_CONTAINER_REPOSITORY: gitlab.beno.ai:4567/eng/external-images/k8s.gcr.io/git-sync
AIRFLOW__KUBERNETES__GIT_SYNC_CONTAINER_TAG: v3.1.1
Jilolo answered 27/8, 2020 at 4:28
Did you figure this out? – Farthingale

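You can define a list of folders/files to be ignored in an .airflowignore file placed in the DAGs folder. By default, each line is a regular expression, and files or folders whose paths match any of the patterns are skipped.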

https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#airflowignore
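
Note that .airflowignore is an exclusion list, so to load only dags-dev you list the other folders. A minimal Python sketch of regex-style matching, as a simplified model rather than Airflow's own code: each non-comment line is compiled as a regex and searched anywhere in the path relative to the folder holding .airflowignore. The folder names dags-prod and dags-staging are hypothetical.

```python
from __future__ import annotations

import re


def load_ignore_patterns(ignore_text: str) -> list[re.Pattern]:
    """Compile one regex per non-empty line; '#' starts a comment (simplified model)."""
    patterns = []
    for line in ignore_text.splitlines():
        line = re.sub(r"\s*#.*", "", line).strip()  # drop comments and whitespace
        if line:
            patterns.append(re.compile(line))
    return patterns


def is_ignored(rel_path: str, patterns: list[re.Pattern]) -> bool:
    """Ignored if any pattern matches anywhere in the relative path (re.search)."""
    return any(p.search(rel_path) for p in patterns)


# Hypothetical .airflowignore at the root of the synced repo:
# exclude every folder except dags-dev.
IGNORE_FILE = """
# folders that should not be loaded in the dev environment
dags-prod
dags-staging
"""

paths = ["dags-dev/etl.zip", "dags-prod/etl.zip", "dags-staging/report.zip"]

patterns = load_ignore_patterns(IGNORE_FILE)
kept = [p for p in paths if not is_ignored(p, patterns)]
print(kept)  # ['dags-dev/etl.zip']
```

Because patterns are unanchored regexes, dags-prod would also match a folder such as dags-production, so choose patterns that are specific enough for your layout.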

Pernik answered 10/5, 2021 at 10:54

It seems this chart implementation does not support a git subpath. Moreover, if you look at how a subpath is handled, it is still a full git clone followed by filtering of the directory. Git's newer sparse-checkout feature (git sparse-checkout), which checks out only part of a repository, is still experimental.

Hence, one solution is to set dags.path to point to the sub-directory in the repository:

###################################
# Airflow - DAGs Configs
###################################
dags:
  ## the airflow dags folder
  ##
  path: /opt/airflow/dags/repo/dir
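
To illustrate why this works, here is a simplified Python model of DAG discovery: only files under the configured folder are walked, so zips elsewhere in the checkout are never seen. It assumes, as the path above does, that git-sync checks the repository out under /opt/airflow/dags/repo; the helper list_zipped_dags and the dags-dev/dags-prod layout are hypothetical, and real Airflow discovery also inspects file contents and honours .airflowignore.

```python
from __future__ import annotations

import os
import tempfile


def list_zipped_dags(dag_folder: str) -> list[str]:
    """Return .zip files under dag_folder, relative to it (extension check only)."""
    found = []
    for root, _dirs, files in os.walk(dag_folder):
        for name in files:
            if name.endswith(".zip"):
                found.append(os.path.relpath(os.path.join(root, name), dag_folder))
    return sorted(found)


# Build a throwaway copy of a hypothetical git-sync checkout:
#   <tmp>/repo/dags-dev/etl.zip
#   <tmp>/repo/dags-prod/etl.zip
with tempfile.TemporaryDirectory() as tmp:
    repo = os.path.join(tmp, "repo")
    for sub in ("dags-dev", "dags-prod"):
        os.makedirs(os.path.join(repo, sub))
        open(os.path.join(repo, sub, "etl.zip"), "wb").close()  # empty placeholder

    whole_repo = list_zipped_dags(repo)  # DAG folder = repo root
    dev_only = list_zipped_dags(os.path.join(repo, "dags-dev"))  # DAG folder = sub-dir

print(whole_repo)  # ['dags-dev/etl.zip', 'dags-prod/etl.zip'] on POSIX
print(dev_only)    # ['etl.zip']
```

Under the same assumption about the checkout location, the question's layout would correspond to path: /opt/airflow/dags/repo/dags-dev.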

Note: for production workloads, I recommend moving from this chart to another, maintained Airflow chart, since the stable/airflow chart is now archived and will no longer be patched.

Here is a sample of the equivalent option in the bitnami/airflow chart, which lets you set the path to the DAG folder within the repository:

# bitnami airflow helm values.yaml reference
repositories:
  - repository: https://gitlab.com/repo.git
    ## Branch from repository to checkout
    ##
    branch: "master"
    ## A unique identifier for the repository, must be unique for each repository
    ##
    name: "dags"
    ## Path to a folder in the repository containing the dags
    ##
    path: ""
Valadez answered 1/3, 2023 at 8:10
