Bash Operator error: No such file or directory in airflow

6

15

I am a newbie to Airflow and struggling with the BashOperator. I want to run a shell script using the BashOperator in my dag.py.

I checked How to run bash script file in Airflow and BashOperator doen't run bash file apache airflow for how to access a shell script through the BashOperator.

This is what I did:

cmd = "./myfirstdag/dag/lib/script.sh "

t_1 = BashOperator(
    task_id='start',
    bash_command=cmd
)

On running my DAG and checking in Airflow, I got the error below:

[2018-11-01 10:44:05,078] {bash_operator.py:77} INFO - /tmp/airflowtmp7VmPci/startUDmFWW: line 1: ./myfirstdag/dag/lib/script.sh: No such file or directory
[2018-11-01 10:44:05,082] {bash_operator.py:80} INFO - Command exited with return code 127
[2018-11-01 10:44:05,083] {models.py:1361} ERROR - Bash command failed

Not sure why this is happening. Any help would be appreciated.

Thanks!

EDIT NOTE: I assume that it's searching in some Airflow tmp location rather than the path I provided. But how do I make it search the right path?
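For anyone puzzled by the same thing, the tmp behaviour can be reproduced outside Airflow. This is a minimal sketch of what BashOperator does under the hood (it writes the bash_command to a temp file and runs it from a temporary working directory), not Airflow's actual code:

```python
import os
import subprocess
import tempfile

# BashOperator executes the command from a temporary working directory,
# so a relative path like ./myfirstdag/dag/lib/script.sh is resolved
# against that tmp directory, not against the DAG folder.
with tempfile.TemporaryDirectory() as tmp:
    result = subprocess.run(
        ["bash", "-c", "pwd"], cwd=tmp, capture_output=True, text=True
    )
    tmp_used = tmp

# pwd prints the temporary directory, which is why the relative
# script path cannot be found there.
print(result.stdout.strip())
```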

Currish answered 1/11, 2018 at 10:54 Comment(6)
Is ./myfirstdag/dag/lib/script.sh relative to the $AIRFLOW_HOME/dags directory?Pitiful
@Pitiful no it's not. /myfirstdag/dag/lib/ is a different path while $AIRFLOW_HOME gives a different path when I tried.Currish
what is then the absolute path to script.sh?Pitiful
@Pitiful this /home/notebook/work/myfirstdag/dag/lib/ . I tried giving this too. It throws the same error.Currish
Apparently, it's searching in a tmp directory that it's creating. That's what I understood from the source code. github.com/apache/incubator-airflow/blob/… . Not sure how to make it search in the path I gave.Currish
@Currish you ever get a clear answer on this? None of the 4 answers below are accepted, or have many upvotesVladimir
10

Try this:

bash_operator = BashOperator(
    task_id = 'task',
    bash_command = '${AIRFLOW_HOME}/myfirstdag/dag/lib/script.sh ',
    dag = your_dag)
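The same idea can be expressed in plain Python before handing the command to the operator. A hedged sketch, assuming the script lives under AIRFLOW_HOME (~/airflow is Airflow's own default):

```python
import os

# Build an absolute path from AIRFLOW_HOME so the command no longer
# depends on BashOperator's temporary working directory.
airflow_home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
cmd = os.path.join(airflow_home, "myfirstdag/dag/lib/script.sh") + " "
# The trailing space keeps Airflow's Jinja loader from treating a
# command ending in ".sh" as a template file to render.
```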
Sustentacular answered 20/1, 2021 at 13:41 Comment(2)
Hi there and thanks for answering this question! Could you please provide some details how this code solves the problem? :)Authorship
This solved my problem! If I'd change the bash_command to incorporate the extra space in the end to accomodate the jinja error, I'd always get a script not found error. Using the env variable solved it for good, thanks!Demott
4

For those running a Docker version.

I had this same issue, and it took me a while to realise the problem: the behaviour can be different with Docker. When the DAG is run, Airflow moves the command into a tmp file. Without Docker this stays on the same machine; with my Docker version it moves to another container to run, which of course does not have the script file on it.

Check the task logs carefully and you should see this happen before the task is run. This may also depend on your airflow-docker setup.

Henriettehenriha answered 22/5, 2019 at 19:27 Comment(1)
i am using docker version only. How did you solve the problem?Pension
2

Try the following. It needs the full file path to your bash file.

cmd = "/home/notebook/work/myfirstdag/dag/lib/script.sh "

t_1 = BashOperator(
    task_id='start',
    bash_command=cmd
)
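Even with an absolute path there are two common follow-up failures: a missing execute bit on the file, and Airflow's Jinja loader treating a bash_command ending in ".sh" as a template to render. A sketch of a command that sidesteps both (the trailing space is deliberate):

```python
# Prefixing with "bash" avoids needing the execute bit on the file;
# the trailing space stops Jinja from looking up the ".sh" as a template.
script = "/home/notebook/work/myfirstdag/dag/lib/script.sh"  # path from the question
cmd = "bash " + script + " "
```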
Rivero answered 2/11, 2018 at 14:9 Comment(2)
have tried. Doesn't work. Please refer to the comments above.Currish
Can you, for the sake of debugging, try removing the .sh extension and then run with cmd=bash /home/notebook/work/myfirstdag/dag/lib/script and let me know if it works or not.Rivero
0

Are you sure of the path you defined?

cmd = "./myfirstdag/dag/lib/script.sh "

With the leading . the path is relative to the directory where your command is executed.

Could you try this?

cmd = "find . -type f"
Elviselvish answered 1/11, 2018 at 12:35 Comment(10)
I tried without the . too. It gives the same error.Currish
Try setting cmd="pwd" and gives us output of a find - ls in the corresponding directory?Epigenous
I tried that already. It gives a tmp directory which gets deleted after every run. But I am not sure how do I set the path to the one where my script is present.Currish
Ok, so put the find -ls as the cmd itself, and we will see where is located the myfirstdag/dag/lib/script.shEpigenous
Seems like it cannot locate that using the find command. It's because it cannot find the directory itself. So I did cd /home ; ls -lrt. All I got is just one folder as airflow/ where as I have two other folders in it named example/ and notebook/ which isn't showing when I am doing it through the bashOperator. All I get is this: Running command: cd / ; cd home/; ls Output: airflowCurrish
You misunderstood me, I've updated my answer consequently for the command you should try.Epigenous
This is what I got : Running command: find -ls Output: 22273487 4 drwx------ 2 root root 4096 Nov 5 08:46 . 22273488 4 -rw------- 1 root root 8 Nov 5 08:46 ./startRIKEG9Currish
Ok, so it's 100% sure, there is no myfirstdag/dag/lib/script.sh in a subdirectory of this temporary directory. So you can't either use the relative path ./myfirstdag/dag/lib/script.sh , or you priorly need to perform action to copy missing files.Epigenous
yep. That's what I am looking for. Either how to copy the file to this location or access the path I gave.Currish
Where is located your script.sh file on your computer? In addition, maybe you should add dag=dag definition in your bash operator, See airflow.apache.org/tutorial.html#it-s-a-dag-definition-fileEpigenous
0

Try running this:

path = "/home/notebook/work/myfirstdag/dag/lib/script.sh"
copy_script_cmd = 'cp ' + path + ' . && '
# trailing space so Jinja doesn't treat the ".sh" ending as a template
execute_cmd = './script.sh '

t_1 = BashOperator(
    task_id='start',
    bash_command=copy_script_cmd + execute_cmd
)
Nonconformist answered 9/10, 2019 at 0:11 Comment(0)
0

I'm just gonna comment here with a similar issue.

I was getting an error when using the BashOperator where it couldn't find a package that was definitely installed in requirements.txt:

INFO - /usr/bin/bash: line 1: my_package: command not found

My task was the below:-

test_task = BashOperator(
    task_id="test_task",
    env={"MYVAR": 'some string'},
    bash_command='my_package --version',
    dag=dag,
)

It turns out it was the use of env that was causing issues: when I took this out, MWAA could 'see' my_package again. No idea why it's set up like this, but I ended up scrapping the use of env vars and just using the provided Jinja templates to get what I need into the bash script.
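A likely mechanism, going by the BashOperator docs: when env is not None it is used instead of inheriting the current process environment, so PATH (and anything installed onto it) vanishes from the child shell. A hedged sketch of a workaround, not the poster's confirmed fix:

```python
import os

# When BashOperator's `env` is set, the child shell receives ONLY these
# variables, so PATH disappears. Merging the parent environment back in
# keeps PATH while still adding MYVAR.
child_env = {**os.environ, "MYVAR": "some string"}
```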

Hope this helps anyone else facing a similar strange issue.

Flawed answered 12/7 at 17:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.