I'm using a Python DSL called Snakemake that looks like this:
import pandas as pd
from bx.intervals.cluster import ClusterTree
from epipp.config import system_prefix, include_prefix, config, expression_matrix

config["name"] = "correlate_chip_regions_and_rna_seq"

bin_sizes = {"H3K4me3": 1000, "PolII": 200, "H3K27me3": 200}

rule all:
    input:
        expand("data/{bin_size}_{modification}.bed", zip,
               bin_size=bin_sizes.values(), modification=bin_sizes.keys())

rule get_gene_expression:
    input:
        expression_matrix
    output:
        "data/expression/series.csv"
    run:
        expression_matrix = pd.read_table(input[0])
        expression_series = expression_matrix.sum(1).sort_values(ascending=False)
        expression_series.to_csv(output[0], sep=" ")

I'd like to run yapf on the code inside the run: blocks.

Is it possible to get yapf to ignore the parts that are not valid Python, such as the rule keywords, and format only specific portions of the file?

yapf has a --lines option. Maybe what you want could be achieved by first processing your Snakefile to determine which lines are to be skipped and which are to be processed? – Methylnaphthalene

I could replace rule bla: with for bla in bla, run yapf, then switch back. Rather hackish though. Perhaps I should ask the yapf developers for pointers. – Panto
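
For what it's worth, the preprocessing idea from the comments can be sketched roughly as follows. This is only a minimal sketch, not a tested tool. It assumes the Snakefile is indented with spaces and that each run: block ends at the first non-blank line indented no deeper than its run: keyword. The helper names (find_run_blocks, format_run_blocks, yapf_formatter) are made up for illustration. Rather than passing --lines to yapf, which as far as I can tell would still need to parse the whole file, it cuts out each run: body, dedents it, formats it as ordinary Python, and pastes it back at the original indentation.

```python
import textwrap


def find_run_blocks(lines):
    """Return (start, end) index pairs, end exclusive, for run: block bodies."""
    blocks = []
    i = 0
    while i < len(lines):
        if lines[i].strip() != "run:":
            i += 1
            continue
        indent = len(lines[i]) - len(lines[i].lstrip())
        start = end = i + 1
        for j in range(start, len(lines)):
            line = lines[j]
            if not line.strip():
                continue  # blank lines neither end nor extend the block
            if len(line) - len(line.lstrip()) <= indent:
                break  # back at rule level: the block is over
            end = j + 1
        if end > start:
            blocks.append((start, end))
        i = end
    return blocks


def format_run_blocks(text, formatter):
    """Apply formatter (str -> str) to each run: body, leaving the rest as-is."""
    lines = text.splitlines()
    out = []
    pos = 0
    for start, end in find_run_blocks(lines):
        out.extend(lines[pos:start])
        body = lines[start:end]
        first = next(l for l in body if l.strip())
        indent = first[:len(first) - len(first.lstrip())]
        code = textwrap.dedent("\n".join(body)) + "\n"
        formatted = formatter(code)
        out.extend(textwrap.indent(formatted, indent).splitlines())
        pos = end
    out.extend(lines[pos:])
    return "\n".join(out) + "\n"


def yapf_formatter(code):
    """Format a code string with yapf (assumes a yapf release in which
    FormatCode returns a (formatted_code, changed) tuple)."""
    # Imported lazily so the helpers above work without yapf installed.
    from yapf.yapflib.yapf_api import FormatCode
    formatted, _changed = FormatCode(code)
    return formatted
```

Something like format_run_blocks(open("Snakefile").read(), yapf_formatter) would then return the reformatted text. Note that yapf sees the dedented code, so its column limit does not account for the indentation that gets added back; lowering the limit in the yapf style settings would be one way to compensate.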