I have a JSON file like this:
{
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
            "C1": {"name": "C1", "path": "/path/to/C1"},
            "D1": {"name": "D1", "path": "/path/to/D1"}
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
            "B2": {"name": "B2", "path": "/path/to/B2"},
            "C2": {"name": "C2", "path": "/path/to/C2"},
            "D2": {"name": "D2", "path": "/path/to/D2"}
        }
    }
}
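I'm trying to run my snakemake pipeline separately on the samples in the sample sets 'bar1' and 'bar2', putting the results into their own folders. When I expand my wildcards, I don't want every combination of sample sets and samples; I only want each sample within its own group, like so: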
tmp/bar1/A1.bam
tmp/bar1/B1.bam
tmp/bar1/C1.bam
tmp/bar1/D1.bam
tmp/bar2/A2.bam
tmp/bar2/B2.bam
tmp/bar2/C2.bam
tmp/bar2/D2.bam
Hopefully my Snakefile helps explain. I tried writing my Snakefile like this:
sample_sets = [ i for i in config['foo'] ]
samples_dict = config['foo'] #cleans it up

def get_samples(wildcards):
    return list(samples_dict[wildcards.sample_set].keys())

rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = get_samples), sample_set = sample_sets),
This doesn't work: my file names end with "<function get_samples at 0x7f6e00544320>"! I also tried:
rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = list(samples_dict["{{sample_set}}"].keys())), sample_set = sample_sets),
But that raises a KeyError. I also tried this:
rule all:
    input:
        [ ["tmp/{{sample_set}}/{sample}.aligned_bam.core.bam".format( sample = sample ) for sample in list(samples_dict[sample_set].keys())] for sample_set in sample_sets ]
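That one fails with a "Wildcards in input files cannot be determined from output files: 'sample_set'" error.
I feel like there must be a simple way to do this, and maybe I'm just being an idiot.
Any help would be greatly appreciated! Let me know if I've missed any details.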
Reply from a uj5u.com user:
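It is possible to use a custom combinatoric function in expand. Most often that function is zip; in your case, however, the nested dictionary shape would require designing a custom function. A simpler solution is to use plain Python to build the list of desired files.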
d = {
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
            "C1": {"name": "C1", "path": "/path/to/C1"},
            "D1": {"name": "D1", "path": "/path/to/D1"},
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
            "B2": {"name": "B2", "path": "/path/to/B2"},
            "C2": {"name": "C2", "path": "/path/to/C2"},
            "D2": {"name": "D2", "path": "/path/to/D2"},
        },
    }
}

list_files = []
for key in d["foo"]:
    for nested_key in d["foo"][key]:
        _tmp = f"tmp/{key}/{nested_key}.bam"
        list_files.append(_tmp)

print(*list_files, sep="\n")
#tmp/bar1/A1.bam
#tmp/bar1/B1.bam
#tmp/bar1/C1.bam
#tmp/bar1/D1.bam
#tmp/bar2/A2.bam
#tmp/bar2/B2.bam
#tmp/bar2/C2.bam
#tmp/bar2/D2.bam
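To wire this into the Snakefile, the same loop can be wrapped in a small helper whose return value is handed to rule all. This is only a minimal sketch: the helper name target_bams and its prefix parameter are hypothetical, it assumes the JSON is loaded as config as in the question, and the shortened inline dict merely stands in for config["foo"] so the snippet runs on its own.

```python
def target_bams(sample_sets, prefix="tmp"):
    """Return one BAM path per (sample set, sample) pair, grouped by set."""
    paths = []
    for set_name, samples in sample_sets.items():
        for sample_name in samples:
            paths.append(f"{prefix}/{set_name}/{sample_name}.bam")
    return paths


# Shortened stand-in for config["foo"]; in the Snakefile this would come
# from the JSON config instead.
example_sets = {
    "bar1": {"A1": {}, "B1": {}},
    "bar2": {"A2": {}, "B2": {}},
}

targets = target_bams(example_sets)
print(*targets, sep="\n")

# In the Snakefile (assumption: config is loaded from the JSON above):
# rule all:
#     input:
#         target_bams(config["foo"])
```

Because the helper returns plain strings, rule all contains no wildcards itself; a downstream rule with an output like tmp/{sample_set}/{sample}.bam can then match its wildcards against these paths.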
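Reply from a uj5u.com user:
@SultanOrazbayev has it right, but here are a couple of alternatives.
If you like loops, the pythonic way to write this is a list comprehension. With a huge list of files, you might notice a performance improvement.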
list_files = [
    f"tmp/{key}/{nested_key}.bam"
    for key in d["foo"]
    for nested_key in d["foo"][key]
]
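The only way I can think of to use expand is to build essentially the same list. I pass the pairs as dictionaries so the wildcard names are kept as well, although tuples would be more efficient. expand has an advantage if you keep the filename pattern in a config variable and can't easily format it, want to keep meaningful wildcard names, or use allow_missing for other wildcards: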
wcs = [{'sample_set': sample_set, 'sample': sample}
    for sample_set in d["foo"]
    for sample in d["foo"][sample_set]
]
list_files = expand("tmp/{sample_set}/{sample}.bam", zip,
    sample_set=[wc['sample_set'] for wc in wcs],
    sample=[wc['sample'] for wc in wcs],
)
Sometimes the snakemake way isn't pythonic!
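The difference between expand's default combination (a Cartesian product) and the zip variant can be illustrated in plain Python. The sketch below only imitates the combination step with itertools.product and the built-in zip; it is not Snakemake's implementation, the helper name format_all is hypothetical, and the pairs are a shortened stand-in for d["foo"]. It also keeps the pairs as tuples, the more efficient form mentioned above.

```python
from itertools import product

pattern = "tmp/{sample_set}/{sample}.bam"

# (sample_set, sample) pairs as tuples; a shortened stand-in for d["foo"].
pairs = [("bar1", "A1"), ("bar1", "B1"), ("bar2", "A2"), ("bar2", "B2")]

# Unzip the pairs into two parallel sequences.
set_names, sample_names = zip(*pairs)


def format_all(combos):
    """Fill the pattern once per (sample_set, sample) combination."""
    return [pattern.format(sample_set=s, sample=n) for s, n in combos]


# Pairing element by element keeps each sample inside its own set.
zipped = format_all(zip(set_names, sample_names))

# A Cartesian product over the unique set names and all samples also
# yields unwanted paths such as tmp/bar1/A2.bam.
crossed = format_all(product(sorted(set(set_names)), sample_names))

print(len(zipped), len(crossed))
```

This is why the nested expand calls in the question cannot restrict samples to their own set: every sample name is combined with every sample set unless the two lists are paired up first.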
