這是上一個關于使用字典生成檔案串列以作為單個步驟的輸入的問題的后續問題Python。在這種情況下,我對合并單個樣本的 BAM 檔案感興趣,該樣本是通過映射來自多次運行的 FASTQ 檔案生成的。
我的規則中combine_bams僅針對單個樣本遇到錯誤:
InputFunctionException in line 116 of /oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile:
Error:
WildcardError:
No values given for wildcard 'samp'.
Wildcards:
samp=10561-7352-culture_S24
mapper=bwa
ref=H37Rv
Traceback:
File "/oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile", line 118, in <lambda>
似乎samp在通配符串列中定義正確,所以我不確定為什么會呼叫錯誤。任何建議都會很棒,我的snakemake檔案如下。謝謝!
# Define samples:
RUNS, SAMPLES = glob_wildcards(config['fastq_dir'] "{run}/{samp}_L001_R1_001.fastq.gz")
# Create sample dictionary so that each sample (key) has list of runs (values) associated with it.
sample_dict = {}
for key, val in zip(SAMPLES,RUNS):
sample_dict.setdefault(key, []).append(val)
#print(sample_dict)
# Constrain mapper and filter wildcards.
wildcard_constraints:
mapper="[a-zA-Z2] ",
filter="[a-zA-Z2] ",
run = '|'.join([re.escape(x) for x in RUNS]),
samp = '|'.join([re.escape(x) for x in SAMPLES]),
ref = '|'.join([re.escape(x) for x in config['ref']])
# Define a rule for running the complete pipeline.
rule all:
input:
trim = expand(['results/{samp}/{run}/trim/{samp}_trim_1.fq.gz'], zip, run = RUNS, samp = SAMPLES),
kraken=expand('results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz', zip, run = RUNS, samp = SAMPLES),
bams=expand('results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam', zip, run = RUNS, samp = SAMPLES, ref = config['ref']*len(RUNS), mapper = config['mapper']*len(RUNS)), # When using zip, need to use vectors of equal lengths for all wildcards.
per_samp_run_stats = expand('results/{samp}/{run}/stats/{samp}_{mapper}_{ref}_combined_stats.csv', zip, run = RUNS, samp = SAMPLES, ref = config['ref']*len(RUNS), mapper = config['mapper']*len(RUNS)),
combined_bams=expand('results/{samp}/bams/{samp}_{mapper}_{ref}.merged.rmdup.bam', samp = np.unique(SAMPLES),ref=config['ref'], mapper=config['mapper'])
# Trim reads for quality.
rule trim_reads:
input:
p1='/labs/jandr/walter/tb/data/Stanford/{run}/{samp}_L001_R1_001.fastq.gz',
p2='/labs/jandr/walter/tb/data/Stanford/{run}/{samp}_L001_R2_001.fastq.gz'
output:
trim1='results/{samp}/{run}/trim/{samp}_trim_1.fq.gz',
trim2='results/{samp}/{run}/trim/{samp}_trim_2.fq.gz'
log:
'results/{samp}/{run}/trim/{samp}_trim_reads.log'
shell:
'workflow/scripts/trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'
# # Filter reads taxonomically with Kraken.
rule taxonomic_filter:
input:
trim1='results/{samp}/{run}/trim/{samp}_trim_1.fq.gz',
trim2='results/{samp}/{run}/trim/{samp}_trim_2.fq.gz'
output:
kr1='results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz',
kr2='results/{samp}/{run}/kraken/{samp}_trim_kr_2.fq.gz',
kraken_report='results/{samp}/{run}/kraken/{samp}_kraken.report',
kraken_stats = 'results/{samp}/{run}/kraken/{samp}_kraken_stats.csv'
log:
'results/{samp}/{run}/kraken/{samp}_kraken.log'
threads: 8
shell:
'workflow/scripts/run_kraken.sh {input.trim1} {input.trim2} {output.kr1} {output.kr2} {output.kraken_report} &>> {log}'
# Map reads.
rule map_reads:
input:
ref_path='/labs/jandr/walter/tb/data/refs/{ref}.fa',
kr1='results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz',
kr2='results/{samp}/{run}/kraken/{samp}_trim_kr_2.fq.gz'
output:
bam='results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam'
params:
mapper='{mapper}'
log:
'results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_map.log'
threads: 8
shell:
"workflow/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"
# Combine reads and remove duplicates (per sample).
rule combine_bams:
input:
bams = lambda wildcards: expand('results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam', run=sample_dict[wildcards.samp])
output:
combined_bam = 'results/{samp}/bams/{samp}_{mapper}_{ref}.merged.rmdup.bam'
log:
'results/{samp}/bams/{samp}_{mapper}_{ref}_merge_bams.log'
threads: 8
shell:
"sambamba markdup -r -p -t {threads} {input.bams} {output.combined_bam}"
uj5u.com熱心網友回復:
在 rule 中combine_bams,使用lambda運算式時,您需要提供所有{}通配符的值。目前只run提供資訊。解決此問題的一種方法是將 kwarg 包含allow_missing=True到expand:
bams = lambda wildcards: expand(
"results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
run=sample_dict[wildcards.samp],
allow_missing=True,
)
這將與以下內容相同:
bams = lambda wildcards: expand(
"results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
run=sample_dict[wildcards.samp],
samp="{samp}",
mapper="{mapper}",
ref="{ref}",
)
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/404583.html
標籤:
下一篇:更新字典中的值串列(累積更新)
