"{sample}.txt"
Used in the input and output clauses of rule definitions:

    rule calculate_gc:
        input:
            "data/{sample}.fasta"
        output:
            "results/{sample}_gc.txt"
        shell:
            "python scripts/gc_content.py {input} > {output}"
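{sample} is a wildcard. It is declared simply by appearing in the rule's file patterns, and Snakemake fills in its value by matching the output pattern against the file that is requested.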
    rule all:
        input:
            "results/sampleA_gc.txt"

    rule calculate_gc:
        input:
            "data/{sample}.fasta"
        output:
            "results/{sample}_gc.txt"
        shell:
            "python scripts/gc_content.py {input} > {output}"
Snakemake will run:

    python scripts/gc_content.py data/sampleA.fasta > \
        results/sampleA_gc.txt
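The resolution step above can be sketched in plain Python. This is a simplified illustration, not Snakemake's actual implementation: the helper name `match_wildcards` is hypothetical, each wildcard is assumed to match one or more arbitrary characters, and each wildcard name is assumed to occur only once in the pattern.

```python
import re

def match_wildcards(pattern, target):
    """Return a dict of wildcard values if target matches pattern, else None.

    Hypothetical helper for illustration only; Snakemake's own matching
    handles more cases (constraints, repeated wildcards, ambiguity rules).
    """
    # re.split with a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text, matched exactly
        else:
            regex += f"(?P<{part}>.+)"     # wildcard: one or more characters
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

# The target requested by rule all is matched against the output pattern ...
wildcards = match_wildcards("results/{sample}_gc.txt", "results/sampleA_gc.txt")

# ... and the resulting value is substituted into the input and output patterns.
input_file = "data/{sample}.fasta".format(**wildcards)
output_file = "results/{sample}_gc.txt".format(**wildcards)
command = f"python scripts/gc_content.py {input_file} > {output_file}"
print(command)
```

The point of the sketch is the direction of inference: the wildcard value comes from the requested output file, and only then is it filled into the input pattern.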
    rule all:
        input:
            "results/sampleA_gc.txt",
            "results/sampleB_gc.txt",
            "results/sampleC_gc.txt"

    rule calculate_gc:
        input:
            "data/{sample}.fasta"
        output:
            "results/{sample}_gc.txt"
        shell:
            "python scripts/gc_content.py {input} > {output}"
Snakemake will run the rule three times, once for each sample listed in the input of rule all.
The expand() function

Snakemake provides the expand() function, which can be used to generate a list of files based on a pattern.

expand() example:

    samples = ["sampleA", "sampleB", "sampleC"]
    expand("results/{sample}_gc.txt", sample=samples)

produces a list of strings:

    ["results/sampleA_gc.txt",
     "results/sampleB_gc.txt",
     "results/sampleC_gc.txt"]
    samples = ["sampleA", "sampleB", "sampleC"]

    rule all:
        input:
            expand("results/{sample}_gc.txt", sample=samples)

    rule calculate_gc:
        input:
            "data/{sample}.fasta"
        output:
            "results/{sample}_gc.txt"
        shell:
            "python scripts/gc_content.py {input} > {output}"
$ ls
data scripts Snakefile
$ snakemake -j 3 # ← run all three jobs concurrently
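To make the expand() call in the Snakefile less of a black box, here is a rough plain-Python sketch of its default behaviour. The name `expand_sketch` is hypothetical and this is not Snakemake's code. It assumes every wildcard is given either a single value or a list of values, and that all combinations are wanted. The second wildcard, `metric`, is invented purely for illustration.

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Fill a pattern with every combination of the given wildcard values.

    Hypothetical stand-in for illustration; the real expand() supports
    further options that are not modelled here.
    """
    names = list(wildcards)
    # Treat a single value as a one-element list so product() can handle it.
    value_lists = [
        v if isinstance(v, (list, tuple)) else [v]
        for v in (wildcards[n] for n in names)
    ]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]

samples = ["sampleA", "sampleB", "sampleC"]
targets = expand_sketch("results/{sample}_gc.txt", sample=samples)
print(targets)

# With two wildcards (the second one is an assumption for this example),
# every combination of values is generated.
combos = expand_sketch(
    "results/{sample}_{metric}.txt",
    sample=["sampleA", "sampleB"],
    metric=["gc", "len"],
)
print(combos)
```

Because the result is an ordinary list of strings, using it as the input of rule all is equivalent to writing the three target files out by hand.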