I have two files, both with the same number of rows (N in the millions).
f1.txt:
1 J100079
2 J100180
3 J100228
4 J100291
5 J100333
6 J100537
7 J100549
8 J100757
9 J100953
10 J101030
and f2.txt:
1 1 117656 0.494925
2 1 117656 0.0021814
2 2 117656 0.496289
3 1 117656 -0.00205095
3 2 117656 0.0024429
3 3 117656 0.495278
4 1 117656 -0.000898346
4 2 117656 -0.00520983
4 3 117656 -0.00694337
4 4 117656 0.495535
I want to create a file f_final.txt in which the numeric IDs in columns 1 and 2 of f2.txt are paired with the matching character IDs from f1.txt.

I started with the following procedure:

1. Create f3.txt as the join of f1.txt and f2.txt:

join f1.txt f2.txt > f3.txt
cat f3.txt
1 J100079 1 117656 0.494925
2 J100180 1 117656 0.0021814
2 J100180 2 117656 0.496289
3 J100228 1 117656 -0.00205095
3 J100228 2 117656 0.0024429
3 J100228 3 117656 0.495278
4 J100291 1 117656 -0.000898346
4 J100291 2 117656 -0.00520983
4 J100291 3 117656 -0.00694337
4 J100291 4 117656 0.495535

2. Split f3.txt into two new files, f4.txt and f5.txt, using cut (note that after the join the delimiter is now a single space ' '):

cut -d$' ' -f 1,2 f3.txt > f4.txt
cut -d$' ' -f 3,5 f3.txt > f5.txt
cat f4.txt
1 J100079
2 J100180
2 J100180
3 J100228
3 J100228
3 J100228
4 J100291
4 J100291
4 J100291
4 J100291
cat f5.txt
1 0.494925
1 0.0021814
2 0.496289
1 -0.00205095
2 0.0024429
3 0.495278
1 -0.000898346
2 -0.00520983
3 -0.00694337
4 0.495535

f4.txt is good (no more changes to it).

3. Join f5.txt with f1.txt so that the numerical IDs in f5.txt get the character IDs from f1.txt. I do not want to change the order of the rows, so there is no sorting on f5.txt:

join f1.txt f5.txt > f6.txt
join: f5.txt:7: is not sorted: 1 -0.000898346
join: f1.txt:10: is not sorted: 10 J101030

Step 3 fails with an error. The last step would have been to column-bind f4.txt and f6.txt without changing the order of the rows:
paste -d" " f4.txt f6.txt > f_final.txt
The final output should look like this:
1 J100079 1 J100079 0.494925
2 J100180 1 J100079 0.0021814
2 J100180 2 J100180 0.496289
3 J100228 1 J100079 -0.00205095
3 J100228 2 J100180 0.0024429
3 J100228 3 J100228 0.495278
4 J100291 1 J100079 -0.000898346
4 J100291 2 J100180 -0.00520983
4 J100291 3 J100228 -0.00694337
4 J100291 4 J100291 0.495535
Any suggestions are greatly appreciated.
Reply from a uj5u.com user:
This matches your example exactly:
join f1.txt f2.txt |
sort -k '3,3' |
join -o '2.1,2.2,2.3,1.2,2.5' -1 1 -2 3 f1.txt - |
sort -k 1,1 > final.txt
Whether you need the final sort is up to you.
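If the original row order of f2.txt matters, one option is to number the rows first, do both joins on lexically sorted input, and then restore the order with a numeric sort on the row number. The following is only a sketch and not part of the answer above. It assumes the toy data from the question, LC_ALL=C collation so that sort and join agree on ordering, and that every ID in f2.txt also appears in f1.txt. The names workdir and f1.sorted are made up for the illustration.

```shell
#!/bin/sh
set -eu
export LC_ALL=C   # assumption: byte-wise collation, so sort and join agree on order

# Scratch directory (hypothetical name) holding the toy data from the question.
workdir=$(mktemp -d)
cd "$workdir"

cat > f1.txt <<'EOF'
1 J100079
2 J100180
3 J100228
4 J100291
5 J100333
6 J100537
7 J100549
8 J100757
9 J100953
10 J101030
EOF

cat > f2.txt <<'EOF'
1 1 117656 0.494925
2 1 117656 0.0021814
2 2 117656 0.496289
3 1 117656 -0.00205095
3 2 117656 0.0024429
3 3 117656 0.495278
4 1 117656 -0.000898346
4 2 117656 -0.00520983
4 3 117656 -0.00694337
4 4 117656 0.495535
EOF

# join expects lexical order on the join field, so "10" has to sort before "2".
sort -k1,1 f1.txt > f1.sorted

# 1. Prefix each f2 row with its line number: lineno row col 117656 value
# 2. Sort on the row ID (field 2) and attach its character ID.
#    Result: lineno row rowname col value
# 3. Sort on the column ID (field 4) and attach its character ID.
#    Result: lineno row rowname col colname value
# 4. Restore the original order numerically, then drop the line number.
awk '{ print NR, $0 }' f2.txt |
  sort -k2,2 |
  join -1 1 -2 2 -o '2.1,2.2,1.2,2.3,2.5' f1.sorted - |
  sort -k4,4 |
  join -1 1 -2 4 -o '2.1,2.2,2.3,2.4,1.2,2.5' f1.sorted - |
  sort -k1,1n |
  cut -d' ' -f2- > f_final.txt

cat f_final.txt
```

The "is not sorted" errors in the question come from join comparing keys as strings, which is why f1.txt is re-sorted here even though it is already in numeric order.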
Reply from a uj5u.com user:
Here is my solution using join and sort:
join f1.txt f2.txt > f3.txt
cat f3.txt
join -1 1 -2 3 -o'1.1,1.2,2.1,2.2,2.5' <(sort -k1 f1.txt) <(sort -k3 f3.txt) > f7.txt
cat f7.txt
sort -k1 -k3 < f7.txt > f8.txt
cat f8.txt
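If you have a shorter way that saves computation time on millions of rows, please post it. I would certainly leave out the cat calls on the real data, since I have already tested that this works with the toy example.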
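Since a shorter way was asked for: one alternative that avoids sorting entirely is to load f1.txt into an awk lookup table and stream f2.txt through it once, which keeps the rows in their original order. This is a sketch under assumptions, not something tested on the full data. It assumes f1.txt fits in memory and that every ID in f2.txt appears in f1.txt (a missing ID would produce an empty name). The array name and the scratch directory are illustrative.

```shell
#!/bin/sh
set -eu

# Scratch directory (hypothetical name) with a cut-down copy of the toy data.
workdir=$(mktemp -d)
cd "$workdir"

cat > f1.txt <<'EOF'
1 J100079
2 J100180
3 J100228
4 J100291
EOF

cat > f2.txt <<'EOF'
1 1 117656 0.494925
2 1 117656 0.0021814
2 2 117656 0.496289
3 1 117656 -0.00205095
3 2 117656 0.0024429
3 3 117656 0.495278
4 1 117656 -0.000898346
4 2 117656 -0.00520983
4 3 117656 -0.00694337
4 4 117656 0.495535
EOF

# First file (NR == FNR): remember the character ID for each numeric ID.
# Second file: print row ID, row name, column ID, column name, value.
awk '
  NR == FNR { name[$1] = $2; next }
  { print $1, name[$1], $2, name[$2], $4 }
' f1.txt f2.txt > f_final.txt

cat f_final.txt
```

Because nothing is sorted, the "is not sorted" complaint from join does not arise, and the output lines follow f2.txt line by line.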
