How to run Airflow on Windows

The usual instructions for running Airflow do not apply in a Windows environment:

# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

After installing, the airflow utility is not available on the command line, and I can't find it anywhere to add it manually. How can Airflow run on Windows?

Sandasandakan answered 3/9, 2015 at 14:30 Comment(0)

You can enable Bash on Windows (the Windows Subsystem for Linux) and follow the tutorial as is. I was able to get Airflow up and running successfully this way.

Once you are done installing, edit airflow.cfg to point all your configured paths to somewhere on your Windows file system rather than inside lxss (the Ubuntu file system), since there are bugs around Ubuntu not showing files written by Windows.
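A minimal sketch of the same idea, assuming a hypothetical Windows folder C:\airflow, which WSL exposes as /mnt/c/airflow. Setting AIRFLOW_HOME before initialising the database makes Airflow generate airflow.cfg (with its DAG and log folders) under that path; individual airflow.cfg entries can also be overridden through environment variables named AIRFLOW__{SECTION}__{KEY}, instead of editing the file by hand.

```shell
# Assumption: C:\airflow is the Windows folder you want to use; under WSL it is /mnt/c/airflow.
export AIRFLOW_HOME=/mnt/c/airflow

# Optional per-setting override using Airflow's AIRFLOW__{SECTION}__{KEY} convention.
# dags_folder lives in the [core] section of airflow.cfg.
export AIRFLOW__CORE__DAGS_FOLDER="$AIRFLOW_HOME/dags"
```

Putting these lines in ~/.bashrc would make every new WSL terminal pick them up before you run airflow initdb.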

Characteristically answered 29/10, 2017 at 1:59 Comment(2)
Hi Ayush, this is not working anymore in current versions of airflow. It always fails with ModuleNotFoundError: No module named 'pwd', which is a module not available for Windows for technical reasons. – Puccini
Running the commands with Git Bash gives me a ModuleNotFoundError: No module named 'termios'. – Condottiere

Three Basic Options

I went through a few iterations of this problem and documented them as I went along. The three things I tried were:

  1. Install Airflow directly into Windows 10 - This attempt failed.
  2. Install Airflow into Windows 10 WSL with Ubuntu - This worked great. Note that WSL is the Windows Subsystem for Linux, which is free; you can get Ubuntu for it from the Windows Store.
  3. Install Airflow into Windows 10 via Docker + Centos - This worked great as well.

Note that if you want to run Airflow as a Linux service, that is not possible with option 2. It is possible with option 3, but I didn't do it, as it requires activating privileged containers in Docker (which I wasn't aware of when I started). Also, running a service in Docker goes somewhat against the Docker paradigm, as each container should be a single process/unit of responsibility anyway.

Detailed Description of #2 - WSL Option

If you're going for option 2, the basic steps are:

  • Get WSL Ubuntu installed and opened up.
  • Verify it comes with Python 3.6.5 or so (python3 --version).
  • Assuming it still does, add these packages so that installing pip will work.
    • sudo apt-get install software-properties-common
    • sudo apt-add-repository universe
    • sudo apt-get update
  • Install pip with:
    • sudo apt-get install python-pip (or python3-pip for Python 3)
  • Run the following 2 commands to install airflow:
    • export SLUGIFY_USES_TEXT_UNIDECODE=yes
    • pip install apache-airflow (or pip3 for Python 3)
  • Open a new terminal (I was surprised, but this seemed to be required).
  • Init the airflow DB:
    • airflow initdb
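The steps above, collected into a Bash sketch for Ubuntu on WSL with Python 3. The function names install_airflow_wsl and airflow_init_cmd are hypothetical, and defining them runs nothing. The second helper picks the database-init command, since (as a later comment notes) Airflow 2.x replaced airflow initdb with airflow db init.

```shell
# Sketch of the WSL install steps (Ubuntu on WSL, Python 3).
# Hypothetical function names; nothing executes until you call them.

install_airflow_wsl() {
  sudo apt-get install -y software-properties-common
  sudo apt-add-repository universe
  sudo apt-get update
  sudo apt-get install -y python3-pip
  # Some Airflow 1.10 releases refuse to install until you pick an unidecode
  # backend; "yes" selects the non-GPL text-unidecode package.
  export SLUGIFY_USES_TEXT_UNIDECODE=yes
  pip3 install apache-airflow
}

# Print the database-initialisation command for a given Airflow version string:
# Airflow 1.x uses "airflow initdb", Airflow 2.x uses "airflow db init".
airflow_init_cmd() {
  major="${1%%.*}"
  if [ "$major" -ge 2 ]; then
    echo "airflow db init"
  else
    echo "airflow initdb"
  fi
}
```

After install_airflow_wsl finishes, open a new terminal as described above, source the sketch again, and run $(airflow_init_cmd "$(python3 -c 'import airflow; print(airflow.__version__)')"). Newer 2.x releases also deprecate airflow db init in favour of airflow db migrate, so check the documentation for your version.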

After this, you should be good to go! The blog has more detail on many of these steps, along with rough timelines for how long setting up WSL takes, so if you have a hard time, dive in there.

Flatulent answered 27/12, 2018 at 3:18 Comment(6)
Hi John, you can work around the C++ compilation issues you fought on the direct installation under Windows by downloading prepared wheels for these dependencies, cf. lfd.uci.edu/~gohlke/pythonlibs, and then doing pip install <wheel file> – Puccini
@Puccini Thanks for the good information! I had a suspicion it would eventually work; I just lost the will to push through in the end (given I'll be running on Linux in prod anyway). I'm happy you found a real solution for people though; I'll try to remember to link my blog to this comment so people find it :). – Flatulent
Hi John, unfortunately, although one can work around the compilation issues, there is a Python package missing under Windows (native, Cygwin and WSL1) which ultimately caused my efforts to fail (cf. my comment to the answer by @Ayush K Singh). I am now looking forward to Windows 10's WSL2, which should, in theory, since it is based on a real Linux kernel, be able to compile the pwd package. – Puccini
For option 2, I had to use https://mcmap.net/q/353985/-apache-airflow-airflow-initdb-results-in-quot-importerror-no-module-named-json-quot as the final step. – Rebatement
@JohnHumphreys-w00te Thanks for your answer; I have also gone through your blog post on installing Airflow. I have one thing to add: if Python 3 is installed in Ubuntu, it is recommended to install pip using sudo apt-get install python3-pip, otherwise pip is installed for Python 2.7 by default. – Highroad
Instead of "airflow initdb", from version 2.2.2 onwards you need to use "airflow db init". – Anglophile

I'm running Airflow on Windows 10 using Docker.

1) First, install Docker on Windows.

2) Run docker version from the command prompt; if you get output, Docker is installed successfully.

3) Pull the Airflow image with docker pull puckel/docker-airflow

4) Run the image with docker run -d -p 8080:8080 puckel/docker-airflow webserver

5) This starts Airflow, and you can access the web UI at localhost:8080

6) To copy DAGs into the container, use docker cp sample_dag.py containerName:/usr/local/airflow/dags

To access the airflow utility, you need to open a Bash shell in the container, which you can do with docker exec -it containerName bash. Once inside the shell, you can run the command-line utilities, e.g. **airflow list_dags**
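A sketch of an alternative to the docker cp step: bind-mount a local DAG folder when starting the container, so DAG files edited on Windows show up inside it without copying. It mounts the folder at /usr/local/airflow/dags, the same path used with docker cp above; the container name airflow and the ./dags folder are hypothetical choices. It assumes a POSIX shell such as WSL's; Git Bash's automatic path conversion may rewrite the -v argument.

```shell
# Hypothetical settings: container name and host DAG folder; adjust to taste.
AIRFLOW_CONTAINER="${AIRFLOW_CONTAINER:-airflow}"
DAGS_DIR="${DAGS_DIR:-$PWD/dags}"

# Start the webserver with the host DAG folder mounted where the image looks for DAGs.
start_airflow() {
  docker run -d --name "$AIRFLOW_CONTAINER" -p 8080:8080 \
    -v "$DAGS_DIR:/usr/local/airflow/dags" \
    puckel/docker-airflow webserver
}

# Run an Airflow CLI command inside the running container without opening a shell,
# e.g. "airflow_cli list_dags" (list_dags is the Airflow 1.x syntax this image uses).
airflow_cli() {
  docker exec "$AIRFLOW_CONTAINER" airflow "$@"
}
```

Usage: call start_airflow once, then airflow_cli list_dags. Because the folder is mounted rather than copied, new or edited DAG files are visible to the running container.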

Hope it helps

Overprize answered 17/12, 2019 at 5:24 Comment(3)
This is the best updated answer so far. You can also create a Dockerfile that COPYs the workspace folder into the image. – Dynamite
@Dynamite could you elaborate on your comment, please? I have struggled for days to try and get Airflow running, and this answer finally fixed it all for me. Now I'm wondering how to operationalise this, load all my DAGs every time, and keep the container running at all times. Any suggestions? – Spendthrift
I tried docker cp sample_dag.py containerName:/usr/local/airflow/dags, replacing containerName with my container's name, but why can't I copy the local DAG into the container's dags/ folder? (I'm using Windows) – Turne

Instead of installing Airflow via pip, download the zip from the Airflow project's GitHub, unzip it, and in its folder run python setup.py install on the command line. Errors like ERROR - 'module' object has no attribute 'SIGALRM' will appear (the SIGALRM signal does not exist on Windows), but so far they have had no impact on Airflow's functions.

Using this method, the airflow util will not be available as a command. As a workaround, use the [current folder]\build\scripts-2.7\airflow file, which is the python script for the airflow util.

Another solution is to create a batch file that runs airflow (airflow.bat) and add the folder containing it to the System PATH variable:

@echo off
python C:\path\to\airflow %*

From this point, the tutorial may be followed normally:

airflow initdb
airflow webserver -p 8080

I have not tested whether, or how well, Airflow's DAGs run on Windows.

Sandasandakan answered 3/9, 2015 at 14:30 Comment(1)
This doesn't work anymore due to the missing module "pwd", which is only available on UNIX systems. It is referenced indirectly by airflow\bin\cli.py, line 16, and directly by daemon\daemon.py, line 25. – Lindly

Unfortunately, the answer to this seems to be "No" as of Dec 2015; see https://github.com/airbnb/airflow/issues/709. This is because of the move to gunicorn, which does not run on Windows. gunicorn may get Windows support in R18.

Jobina answered 19/12, 2015 at 15:09 Comment(0)

You can do it using Cygwin, a Linux-like environment for Windows that provides a Bash shell. So you'll be able to run the commands:

# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install apache-airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

Note 1: If you're running Cygwin on a company-supplied computer, you may need to run the Cygwin application as an administrator. You can do so by following this tutorial from Microsoft.

Note 2: If, like me, you are behind a proxy (at work or wherever), you'll need to set two environment variables for pip to work on the command line, in this case Cygwin. You can follow this StackOverflow answer for more details. I set the following two environment variables on my Windows machine:

// Note this first entry has an S in HTTPS and the other entry is just regular HTTP. Don't forget that distinction in the key name and in the url of the value.
HTTPS_PROXY=https://myUsernameGoesHere:myPasswordGoesHere@yourProxyHostNameGoesHere:yourProxyPortNumberGoesHere

HTTP_PROXY=http://myUsernameGoesHere:myPasswordGoesHere@yourProxyHostNameGoesHere:yourProxyPortNumberGoesHere
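If you would rather not change the Windows-wide settings, a minimal sketch is to export the same two variables only for the current Cygwin session. The user name, password, host and port below are hypothetical placeholders; special characters in the password would need to be URL-encoded.

```shell
# Hypothetical placeholders: replace with your own proxy credentials and address.
PROXY_USER="myUsername"
PROXY_PASS="myPassword"
PROXY_HOSTPORT="proxy.example.com:8080"

# Same two variables as above, but set only for this shell session.
export HTTPS_PROXY="https://${PROXY_USER}:${PROXY_PASS}@${PROXY_HOSTPORT}"
export HTTP_PROXY="http://${PROXY_USER}:${PROXY_PASS}@${PROXY_HOSTPORT}"
```

pip also accepts a proxy directly, e.g. pip install --proxy "$HTTP_PROXY" apache-airflow. Many corporate proxies are plain HTTP proxies that expect an http:// URL in HTTPS_PROXY as well, so if pip fails to connect with the https:// form, that is worth trying.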

No Longer Works: Apparently all of the above work was in vain because Airflow won't work on Windows. Please see this StackOverflow post. The above steps will allow you to use Pip though.

Alternatively, and I know this may or may not count as running on Windows, you could install a virtual machine client such as Oracle's VirtualBox or VMware Workstation, set up whichever Linux distribution you want, such as Ubuntu Desktop, and then run Linux normally. If you need more detailed steps, you can follow this answer from AskUbuntu on the Stack Exchange network.

Alternatively (2), you could create an AWS account, set up a simple EC2 instance running Linux, SSH into that instance, and then run all your commands to your heart's content. AWS offers a free tier, so you should be able to do this for free. Plus, AWS is very well documented, so it shouldn't be too hard to get a simple Linux server up and running; I estimate a beginner could be done in about an hour.

Teeming answered 8/5, 2018 at 15:14 Comment(1)
I wonder how all of this changes with the recent updates to the Windows 10 Ubuntu Linux Subsystem. – Teeming