I'm working with GWAS data and need some help.
My data looks like this:
IID,rs098083,kgp794789,rs09848309,kgp8300747,.....
63,CC,AG,GA,AA,.....
54,AT,CT,TT,AG,.....
12,TT,GA,AG,AA,.....
.
.
.
As shown above, I have 512 rows and 2 million columns in total.
Desired output:
SNP,Genotyping
rs098083,{
"CC" : [ 1, 63, 6, 18, 33, ...],
"CT" : [ 2, 54, 6, 7, 8, ...],
"TT" : [ 4, 9, 12, 13, ...],
"AA" : [86, 124, 4, 19, ...],
"AT" : [8, 98, 34, 74, ....],
.
.
.
}
kgp794789,{
"CC" : [ 1, 63, 6, 18, 33, ...],
"CT" : [ 2, 5, 6, 7, 8, ...],
"TT" : [ 4, 9, 12, 13, ...],
"AA" : [86, 124, 4, 19, ...],
"AT" : [8, 98, 34, 74, ....],
.
.
.
}
rs09848309,{
"CC" : [ 1, 63, 6, 18, 3, ...],
"CT" : [ 2, 5, 6, 7, 8, ...],
"TT" : [ 4, 9, 24 13, ...],
"AA" : [86, 134, 4, 19, ...],
"AT" : [8, 48, 34, 44, ....],
.
.
.
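As shown above, after pivoting I should end up with a file of 2 million rows and 2 columns. The SNP column contains the ID of the SNP. The Genotyping column contains a JSON blob, which is a set of key-value pairs: each key is a particular genotype (e.g. CC, CT, TT, ...) and each value is the list of IIDs whose genotype matches that key.
The output format is "CSV with embedded JSON".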
Answer from a uj5u.com user:
Here is an approach using stedolan/jq:
jq -Rrn '
[ inputs / "," ] | transpose | .[0][1:] as $h | .[1:][]
| .[1:] |= [reduce ([.,$h] | transpose[]) as $t ({}; .[$t[0]] += [$t[1]]) | @text]
| join(", ")
'
rs098083, {"CC":["63"],"AT":["54"],"TT":["12"]}
kgp794789, {"AG":["63"],"CT":["54"],"GA":["12"]}
rs09848309, {"GA":["63"],"TT":["54"],"AG":["12"]}
kgp8300747, {"AA":["63","12"],"AG":["54"]}
Add tonumber if the IDs should be encoded as JSON numbers:
jq -Rrn '
[ inputs / "," ] | transpose | (.[0][1:] | map(tonumber)) as $h | .[1:][]
| .[1:] |= [reduce ([.,$h] | transpose[]) as $t ({}; .[$t[0]] += [$t[1]]) | @text]
| join(", ")
'
rs098083, {"CC":[63],"AT":[54],"TT":[12]}
kgp794789, {"AG":[63],"CT":[54],"GA":[12]}
rs09848309, {"GA":[63],"TT":[54],"AG":[12]}
kgp8300747, {"AA":[63,12],"AG":[54]}
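For readers who want to trace the logic outside jq, the same steps (split each line on commas, walk the SNP columns, collect the IIDs per genotype) can be sketched in Python. This is only an illustration, assuming the whole file fits in memory and contains no quoted fields; pivot_genotypes is a hypothetical helper name, not part of any library.

```python
import json
from collections import defaultdict

def pivot_genotypes(lines):
    """Yield (snp_id, {genotype: [iid, ...]}) pairs from comma-separated lines."""
    rows = [line.rstrip("\n").split(",") for line in lines if line.strip()]
    header, samples = rows[0], rows[1:]
    iids = [int(row[0]) for row in samples]  # IIDs as numbers
    for col, snp in enumerate(header[1:], start=1):
        groups = defaultdict(list)
        for iid, row in zip(iids, samples):
            # Append so that IIDs sharing a genotype accumulate in one list.
            groups[row[col]].append(iid)
        yield snp, dict(groups)

sample = [
    "IID,rs098083,kgp794789,rs09848309,kgp8300747",
    "63,CC,AG,GA,AA",
    "54,AT,CT,TT,AG",
    "12,TT,GA,AG,AA",
]
for snp, groups in pivot_genotypes(sample):
    print(f"{snp}, {json.dumps(groups, separators=(',', ':'))}")
```

On the three sample rows this groups IIDs 63 and 12 under "AA" for kgp8300747, which is the accumulation behaviour the expected output relies on.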
If your end goal is a JSON representation anyway, drop the raw-output formatting; something like this might do:
jq -Rn '
[ inputs / "," ] | transpose | .[0][1:] as $h | reduce .[1:][] as $t (
{}; .[$t[0]] = reduce ([$t[1:],$h] | transpose[]) as $i (
{}; .[$i[0]] += [$i[1]]
)
)
'
{
"rs098083": { "CC": ["63"], "AT": ["54"], "TT": ["12"] },
"kgp794789": { "AG": ["63"], "CT": ["54"], "GA": ["12"] },
"rs09848309": { "GA": ["63"], "TT": ["54"], "AG": ["12"] },
"kgp8300747": { "AA": ["63", "12"], "AG": ["54"] }
}
(The output above was manually formatted for easier comparison with the previous solutions.)
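Answer from a uj5u.com user:
This is a request for clarification.
What is the desired output?
I ask because what you provided is an invalid CSV containing invalid JSON, so it is hard to believe you really want something like that.
Problems with the JSON:
- This is invalid: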
{ CC : [1,2] }
- Keys need double quotes:
{ "CC" : [1,2] }
Problems with the CSV:
- This is invalid:
rs0993,{
"CC": [1,2],
"CT": [3]
}
- The second column contains newlines and/or commas and/or double quotes, so it should be escaped according to CSV rules:
rs0993,"{
""CC"": [1,2],
""CT"": [3]
}"
Reasonable desired outputs:
- CSV with embedded JSON:
SNP,Genotyping
rs098083,"{""CC"": [63], ""AT"": [54], ""TT"": [12]}"
kgp794789,"{""AG"": [63], ""CT"": [54], ""GA"": [12]}"
- JSON array of arrays:
[
["SNP", "Genotyping"],
["rs098083", {"CC": [63], "AT": [54], "TT": [12]}],
["kgp794789", {"AG": [63], "CT": [54], "GA": [12]}]
]
- JSON array of objects:
[
{"SNP": "rs098083", "Genotyping": {"CC": [63], "AT": [54], "TT": [12]}},
{"SNP": "kgp794789", "Genotyping": {"AG": [63], "CT": [54], "GA": [12]}}
]
With this in mind, please make sure the input and output you provide are really what you have and want; if they are not, edit your question.
