How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command

echo 'hello world' > hello.txt

(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:

qsub echo 'hello world' > hello.txt

the resulting content of hello.txt would look like

Your job 123454321 ("echo") has been submitted

Similarly, if I used a pipe (qsub echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout of echo.
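This happens because the local shell handles > and | before qsub even starts: qsub only receives echo and hello world as arguments, and the redirect is applied to qsub's own output. A rough Python illustration follows. It uses the standard library's shlex tokenizer, which only approximates real shell parsing, to show where the command line gets split:

```python
import shlex

def split_at_redirect(command_line):
    """Roughly mimic how a POSIX shell splits `cmd args > file`.

    shlex is an approximation of shell parsing, used here only to show
    which words reach the program and which ones the shell keeps.
    """
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    tokens = list(lexer)
    if ">" in tokens:
        pos = tokens.index(">")
        target = tokens[pos + 1] if pos + 1 < len(tokens) else None
        return tokens[:pos], target
    return tokens, None

argv, target = split_at_redirect("qsub echo 'hello world' > hello.txt")
print("qsub receives:", argv[1:])            # only echo and its argument
print("shell sends qsub's stdout to:", target)
```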

I'm aware I could write small bash scripts that each contain the command with the pipe/redirect, and then run qsub ./myscript.sh. However, I'm trying to launch many parallelized jobs at once from a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution starts to feel very hackish. An example of such a script in Python:

import os

# files is a list of (infile1, infile2, outfile) tuples built elsewhere
for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)

    script = "job" + str(i) + ".sh"   # one throwaway script per job
    with open(script, "w") as f:
        f.write(command)
    os.chmod(script, 0o755)
    os.system("qsub -cwd ./%s" % script)

This is frustrating for a few reasons, among them that my program can't even clean up after itself by deleting the many jobXX.sh scripts, since I don't know how long each job will wait in the queue, and the script has to exist when the job starts.

Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?

Judiciary answered 19/8, 2013 at 20:12 Comment(6)
The redirections would work if those are interpreted by the shell, not Python. - Ard
@devnull: What do you mean? If I type qsub echo 'hello world' > hello.txt directly into the shell, never involving Python, I get the problem described above, where hello.txt contains the text "Your job ...". (I show Python code only incidentally, to demonstrate what a hassle it is to get around.) - Judiciary
Not sure I totally understand the question, but you can do echo sleep 300 | qsub -o /foo -e /bar to send the standard output to /foo and the standard error to /bar. - Collazo
@spuder: Running echo sleep 300 | qsub gives me the error qsub: command required for a binary job. As for sending the standard output or error to a file: that does work for redirecting to a file, but not for piping to another process. - Judiciary
You must be using a different version of qsub (OpenPBS?). That error does not show up in the source code of Torque's version of qsub: github.com/adaptivecomputing/torque - Collazo
@spuder: As mentioned in the question, my version of qsub is from SGE 8.1.3; I don't know if that answers your question. - Judiciary

You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:

 qsub bash -c "cmd <options> | cmd2 <options>"
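The same idea can be driven from Python, the language of the question's loop. This is a sketch assuming Python 3.8+ (for shlex.join). Because it passes an argument list to subprocess, no local shell is involved, so the | and > stay inside the single argument handed to bash -c. The explicit -b y (binary mode) is an assumption based on the SGE discussion in the comments. Whether the execution host keeps that argument intact depends on the SGE setup, and a comment below reports that > did not survive this approach. The bowtie file names are hypothetical.

```python
import shlex
import subprocess

def bash_c_argv(pipeline, qsub_opts=("-cwd", "-b", "y")):
    """Build argv for: qsub <opts> bash -c '<pipeline>'.

    The whole pipeline is one list element, so no local shell ever
    sees the | or > characters.
    """
    return ["qsub", *qsub_opts, "bash", "-c", pipeline]

def submit(argv):
    """Run qsub directly (no shell) and return its confirmation message."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.strip()

# Hypothetical file names, for illustration only.
argv = bash_c_argv("bowtie -S index reads.fq | samtools view -bS - > out.bam")
print(shlex.join(argv))  # the equivalent command line you would type by hand
```

Calling submit(argv) would actually queue the job; the print line only shows what would be sent.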

As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:

echo "cmd <options> | cmd2 <options>" | qsub

as well.
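The echo ... | qsub route can also be scripted from Python, which removes the need for the jobXX.sh files from the question: the script text is written to qsub's standard input and nothing is saved locally. This sketch assumes qsub reads the job script from stdin when no command is given, as both SGE and Torque document. The -b n flag is an SGE-specific assumption added to force script mode on a site where binary mode appears to be the default (the "command required for a binary job" error in the comments). Torque's -b option means something different, so drop it there.

```python
import subprocess

def stdin_job(commands, qsub_opts=("-cwd", "-b", "n")):
    """Return (argv, script_text) for submitting `commands` via qsub's stdin.

    No command operand is passed, so qsub takes the job script from stdin.
    Pipes and redirects inside `commands` are interpreted by the job's
    shell on the execution host, not on the submit host.
    """
    script = "".join(line.rstrip("\n") + "\n" for line in commands)
    return ["qsub", *qsub_opts], script

def submit(argv, script):
    """Send the script text to qsub; no temporary file is created."""
    done = subprocess.run(argv, input=script, capture_output=True,
                          text=True, check=True)
    return done.stdout.strip()  # e.g. the "Your job ... has been submitted" line

# Mirroring the question's loop (files is assumed to exist, as there):
# for infile1, infile2, outfile in files:
#     argv, script = stdin_job(["bowtie -S %s %s | samtools view -bS - > %s"
#                               % (infile1, infile2, outfile)])
#     print(submit(argv, script))
```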

Judiciary answered 9/9, 2013 at 16:26 Comment(3)
Apparently the redirect > does not work with bash -c. Is there another way to write to a file? - Tattle
@highBandWidth: For redirecting to a file, you could always pass -o outputfile.txt as an argument to qsub. - Judiciary
A few comments: 1. The bash -c method works for | but not for >. 2. The echo | qsub method works with both | and >. 3. The -o output.txt method also works but might not always be desirable (for example, for gzipped output). - Esquiline

Although my answer is a bit late, I am adding it for future readers. To use a pipe/redirect and submit that as a qsub job, you need to do a couple of things. But first, note that putting qsub at the end of a pipe, as you're doing, will only result in one job being sent to the queue (i.e., your code will run serially rather than in parallel).

  1. Run qsub in binary mode, since by default qsub expects a script file rather than a command line. For that, pass the "-b y" flag to qsub, and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
  2. echo each call to qsub and then pipe the result to the shell.

Suppose you have a file params-query.txt which holds several bowtie commands and piped calls to samtools of the following form:

bowtie -q query -1 param1 -2 param2 ... | samtools ...

To send each query as a separate job, first turn each line of the file into a qsub command line with xargs. Note that the quotes around the braces are important when you are submitting a command made of piped parts: that way your entire query is treated as a single unit.

cat params-query.txt | xargs -I {} echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
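The same fan-out can be sketched in Python without sh in the middle. As with the \"{}\" quoting, each line of the file stays a single argument to qsub. Unlike the one-liner, where every job shares job_name, output_log and error_log, this sketch gives each job its own name and log files; the job_prefix parameter and the .out/.err naming are hypothetical choices, not part of the answer.

```python
import shlex
import subprocess

def qsub_lines(lines, job_prefix="query"):
    """Turn each non-empty line (a full pipeline) into one qsub argv list.

    Similar in spirit to:
      xargs -I {} echo qsub -b y -o output_log -e error_log -N job_name "{}" | sh
    but with a distinct (hypothetical) name and log files per job.
    """
    jobs = []
    pipelines = (text.strip() for text in lines if text.strip())
    for n, pipeline in enumerate(pipelines):
        name = "%s%d" % (job_prefix, n)
        jobs.append(["qsub", "-b", "y",
                     "-o", name + ".out", "-e", name + ".err",
                     "-N", name, pipeline])
    return jobs

def submit_all(jobs, dry_run=True):
    """Print each command (dry run) or hand it straight to qsub."""
    for argv in jobs:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)

submit_all(qsub_lines(["bowtie -q query -1 param1 -2 param2 ... | samtools ..."]))
```

On the real file this would be used as: with open("params-query.txt") as f: submit_all(qsub_lines(f), dry_run=False).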

If that doesn't work as expected, you're probably better off writing bowtie's output to an intermediate file and then calling samtools on that file. You won't need to change the xargs/qsub call, but each line in params-query.txt should look like:

bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out

This page has interesting qsub tricks you might like

Briolette answered 30/8, 2014 at 16:7 Comment(0)
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}
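For readers doing this from Python, a rough equivalent of that pipeline is sketched below: find the *.job files containing http, de-duplicate and sort their names, then submit each one. It uses plain substring matching, which behaves like grep http here because the pattern has no regex metacharacters. (In the shell, grep -l http *.job prints only the matching file names and could replace the awk and sort -u steps.)

```python
import glob
import subprocess

def job_files_containing(text="http", pattern="*.job"):
    """Sorted, de-duplicated names of files matching `pattern` that contain `text`.

    Mirrors: grep http *.job | awk -F: '{print $1}' | sort -u
    """
    hits = set()
    for path in glob.glob(pattern):
        with open(path, errors="replace") as fh:
            if any(text in line for line in fh):
                hits.add(path)
    return sorted(hits)

def submit_each(paths):
    """Equivalent of: xargs -I {} qsub {}"""
    for path in paths:
        subprocess.run(["qsub", path], check=True)
```

Usage: submit_each(job_files_containing()).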
Denicedenie answered 10/11, 2016 at 22:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.