Using awk to delete duplicate lines, keeping the line closest to another file

I have two files.

$ cat file1.txt
0105   20   20   95     50
0106   20   20   95     50
0110   20   20   88     60
0110   20   20   88     65
0115   20   20   82     70
0115   20   20   82     70
0115   20   20   82     75

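If you look at file1.txt, column 1 contains duplicated values: 0110 and 0115.

So for each duplicated value I want to keep only one row, chosen by column 5: the row whose column-5 value is closest to the corresponding value in the reference file (file2.txt). Here "closest" means equal to, or nearest to, the value in file2.txt. I do not want to change any values in file1.txt, only select one row.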

$ cat file2.txt
0105   20   20   95     50
0106   20   20   95     50
0107   20   20   95     52
0110   20   20   88     65  34
0112   20   20   82     80  23
0113   20   20   82     85  32
0114   20   20   82     70  23
0115   20   20   82     72
0118   20   20   87     79
0120   20   20   83     79

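So, comparing the two files, we must keep 0110 20 20 88 65, because its column-5 entry in file1.txt (65) is closest to the one in the reference file (65 in file2.txt), and delete the other duplicate row. Likewise, we must keep 0115 20 20 82 70, because 70 is closer to 72 than 75 is (|70 - 72| = 2 versus |75 - 72| = 3), and delete the other two rows starting with 0115.

Expected output: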

0105   20   20   95     50
0106   20   20   95     50
0110   20   20   88     65
0115   20   20   82     70

I am trying the following script, but it does not give the result I want.

awk 'FNR==NR { a[$5]; next } $5 in a ' file1.txt file2.txt > test.txt
awk '{a[NR]=$1""$2} a[NR]!=a[NR-1]{print}' test.txt

The algorithm of my Fortran program is:

# a = rows of file1.txt, b = rows of file2.txt; a[i,j] is column j of row i
# check each entry in column 1 of file1.txt against the neighbouring rows
for each row i in file1.txt do
    if a[i,1] != a[i-1,1] and a[i,1] != a[i+1,1] then
        print the whole row as it is            # the key is not duplicated
    else
        # find the row k in file2.txt with b[k,1] == a[i,1]
        # compare b[k,5] with a[i,5] for every row of file1.txt starting with a[i,1]
        # and take the absolute differences to find the closest one,
        # e.g. if 3 rows start with the same entry, select the row i (i = 1,2,3)
        # for which |b[k,5] - a[i,5]| is minimum
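
The distance used by this selection rule can be printed for each candidate row. The following is only a minimal sketch: it assumes trimmed copies of the two sample files (just the duplicated keys) written to a scratch directory, and the names `dir` and `distances` are illustrative, not part of the original post.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Assumption: only the duplicated keys from the question are reproduced here.
cat > file1.txt <<'EOF'
0110   20   20   88     60
0110   20   20   88     65
0115   20   20   82     70
0115   20   20   82     70
0115   20   20   82     75
EOF

cat > file2.txt <<'EOF'
0110   20   20   88     65  34
0115   20   20   82     72
EOF

# While reading file2.txt (first argument): remember the reference column-5 value per key.
# While reading file1.txt (second argument): print key, column 5, and the absolute difference.
distances=$(awk '
    FNR == NR { ref[$1] = $5; next }
    ($1 in ref) {
        d = $5 - ref[$1]
        if (d < 0) d = -d
        print $1, $5, d
    }
' file2.txt file1.txt)

printf '%s\n' "$distances"
```

For each key, the row with the smallest value in the third output column is the one to keep; for 0115 that is a row with 70 (distance 2) rather than the row with 75 (distance 3).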

Reply from a uj5u.com user:

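awk 'BEGIN {
    # Load the reference file, indexed by its first column.
    while ((getline line < "file2.txt")>0) {
        split(line, f);
        file2[f[1]] = line;
    }
}
{
    # The first row seen for a key becomes the initial candidate.
    if (!($1 in result)) result[$1] = $0;
    split(result[$1], a);    # a[5]: column 5 of the current candidate
    split(file2[$1], f);     # f[5]: column 5 of the reference row
    # Replace the candidate only if this row is strictly closer to the reference.
    if (abs(f[5]-$5) < abs(f[5]-a[5])) result[$1] = $0;
}
END {
    # for-in order is unspecified, so the output is piped through sort.
    for (i in result) print result[i];
}
function abs(n) {
    return (n < 0 ? -n : n);
}' file1.txt | sort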
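
A variant of the same idea is sketched below, not as a replacement for the answer above but to show one way of keeping the rows in the order they appear in file1.txt without piping through `sort`. It assumes both files are passed as arguments (reference file first), and that a key missing from file2.txt simply keeps its first row; the array names `ref`, `best`, `dist`, and `order` are illustrative.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
dir=$(mktemp -d) && cd "$dir" || exit 1

cat > file1.txt <<'EOF'
0105   20   20   95     50
0106   20   20   95     50
0110   20   20   88     60
0110   20   20   88     65
0115   20   20   82     70
0115   20   20   82     70
0115   20   20   82     75
EOF

cat > file2.txt <<'EOF'
0105   20   20   95     50
0106   20   20   95     50
0107   20   20   95     52
0110   20   20   88     65  34
0112   20   20   82     80  23
0113   20   20   82     85  32
0114   20   20   82     70  23
0115   20   20   82     72
0118   20   20   87     79
0120   20   20   83     79
EOF

result=$(awk '
    function abs(x) { return (x < 0 ? -x : x) }

    # First argument (file2.txt): store the reference column-5 value per key.
    FNR == NR { ref[$1] = $5; next }

    # Second argument (file1.txt): keep the closest row per key.
    {
        key = $1
        # Assumption: with no reference row, every candidate counts as distance 0,
        # so the first row seen for that key is kept.
        d = (key in ref) ? abs($5 - ref[key]) : 0
        if (!(key in best)) {
            order[++count] = key    # remember the first-seen order of keys
            best[key] = $0
            dist[key] = d
        } else if (d < dist[key]) { # strictly closer: ties keep the earlier row
            best[key] = $0
            dist[key] = d
        }
    }

    END { for (i = 1; i <= count; i++) print best[order[i]] }
' file2.txt file1.txt)

printf '%s\n' "$result"
```

On the sample data this prints the four rows listed under the expected output, in the same order as file1.txt.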

