Snakemake --forceall --dag results in mysterious Error: <stdin>: syntax error in line 1 near 'File' from Graphviz

My attempts to construct a DAG or rulegraph from an RNA-seq pipeline using Snakemake result in an error message from Graphviz: "Error: <stdin>: syntax error in line 1 near 'File'".

The error goes away if I comment out two print commands that have no visible syntax errors. I have tried converting the scripts from UTF-8 to ASCII in Notepad++. Graphviz seems to object only to these two specific print statements, even though there are other print statements within the pipeline scripts. Although the error is easy to work around, it is still annoying: I would like colleagues to be able to generate these diagrams for their publications without hassle, and the print statements tell them what is happening in the workflow. My pipeline consists of a Snakefile, multiple rule files, and a config file. If the offending line is commented out in the Snakefile, Graphviz then complains about another line in a rule file.

#######Snakefile
#!/usr/bin/env python
import os
import glob
import re
from os.path import join
import argparse
from collections import defaultdict
import fastq2json
from itertools import chain, combinations
import shutil
from shutil import copyfile
#Testing for sequence file extension
directory = "."
MainDir = os.path.abspath(directory) + "/"
## build the dictionary with full path for each for sequence files
fastq=glob.glob(MainDir+'*/*'+'R[12]'+'**fastq.gz')
if len(fastq) > 0 :
    print('Sequence file extensions have fastq')
    os.system('scripts/Move.sh')
    fastq2json.fastq_json(MainDir)
else :
    print('File extensions are good')
######Rule File
if not config["GroupdFile"]:
    os.system('Rscript scripts/Table.R')
    print('No GroupdFile provided')

Running snakemake --forceall --rulegraph | dot -Tpdf > dag.pdf should produce a PDF showing the Snakemake workflow, but unless the two print lines are commented out it fails with "Error: <stdin>: syntax error in line 1 near 'File'".

Morbid answered 16/8, 2019 at 20:45 Comment(0)

To understand what is going on, take a close look at the command you use to generate dag.pdf.

Try out the first part of your command:

snakemake --forceall --rulegraph

What does that do? It prints the DAG to standard output in text form (Graphviz's DOT language).

With the | symbol you 'pipe' (pass along) that output to the next part of your command:

dot -Tpdf > dag.pdf

This part builds the actual PDF from the piped text and stores it in dag.pdf. The problem is that when your Snakefile calls print, that output also goes to standard output and is piped to dot along with the graph. dot then tries to parse your message as part of the graph description, which is why it reports a syntax error in line 1 near the first word of the message ('File').
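To make that concrete, here is a minimal sketch that does not need Snakemake or Graphviz. The function load_workflow is a hypothetical stand-in for parsing a Snakefile, and the DOT line is made up for illustration. It captures standard output the way a pipe would and shows that the stray message arrives ahead of the graph text, while a message sent to standard error never enters the captured stream.

```python
import contextlib
import io
import sys


def load_workflow():
    """Hypothetical stand-in for Snakemake parsing a Snakefile and printing the graph."""
    print("File extensions are good")            # stray message on stdout
    print("digraph snakemake_dag { a -> b; }")   # illustrative DOT text, also on stdout
    print("note for the user", file=sys.stderr)  # stderr is not part of the pipe


# Capture stdout only, which is what `| dot -Tpdf` receives.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    load_workflow()

piped = buffer.getvalue()
first_line = piped.splitlines()[0]

# The first thing the consumer sees is the message, not the graph.
print(repr(first_line))
```

A parser that expects the input to start with a graph definition will stumble over that first line, which matches the "syntax error in line 1" you are seeing.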

The way I solved this, so that I can still print messages and also generate the DAG, is to use Snakemake's own logging functionality. It is not a documented approach and is somewhat of a hack, but it works really well for me:

from snakemake.logging import logger

logger.info("your print statement here!")
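If you would rather not depend on an undocumented Snakemake import, the same idea can be sketched with the standard library alone, as one of the comments on this answer suggests: send the messages to standard error instead of standard output. The helper name log is my own choice for illustration, not something Snakemake provides.

```python
import sys


def log(*args, **kwargs):
    """Drop-in replacement for print() that writes to stderr by default.

    Keeping messages off stdout leaves it clean for the DOT text that
    gets piped into `dot`.
    """
    kwargs.setdefault("file", sys.stderr)
    print(*args, **kwargs)


# Usage in a Snakefile or rule file, replacing the original print calls:
log("File extensions are good")
```

The message still shows up in the terminal, because standard error is not redirected by the | in your command. A shell echo started through os.system would most likely inherit the same standard output as the Snakefile, so I would expect it to cause the same problem as print.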
Orchardman answered 16/8, 2019 at 21:11 Comment(4)
Thank you for that trick. I was thinking about trying os.system('echo ...'); do you think that would have the same effect? - Morbid
I don't know. Why don't you go ahead and try it? :) - Orchardman
Thanks. So alternatively, printing to stderr works as well. - Luzern
The error related to this is really cryptic! Thanks for the hint to remove print statements! - Ascensive