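I'm trying to write a script that sorts on column 2 to find the highest value, prints that value, and then prints column 3 for every row that matches it. Here is a sample of the unsorted CSV: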
Argentina,4.6,2016,some data
Argentina,4.2,2018,some data
Argentina,4.6,1998,some data
Argentina,4.5,2001,some data
The expected output is:
4.6
2016
1998
Here is what I have so far, but I'm not sure I'm going about it correctly:
grep "$2*" "$1"> new.csv
sort -t, -k2,2nr new.csv > new2.csv
cut -f3 -d"," new2.csv
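Here $2 is the country name found in the first column and $1 is the file name. This sorts the values in the second column correctly, but I only want to show the years from the rows that have the maximum value in column 2. As written, it prints the year from every row. I understand why that happens, but I'm not sure what the best way is to get from there to the expected result. How can I solve this? Thanks in advance.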
A uj5u.com user replied:
You can do it like this:
declare maxvalue_found=no
declare maxvalue=''
while read -r line; do
    IFS=',' read -r country value year data <<< "$line"
    if [[ "${maxvalue_found}" == no ]]; then
        echo "$value"
        maxvalue="${value}"
        maxvalue_found=yes
    fi
    if [[ "${value}" == "${maxvalue}" ]]; then
        echo "$year"
    fi
done < new2.csv
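new2.csv is your sorted file. We simply read it line by line, then split each line on "," (see word splitting: https://www.gnu.org/software/bash/manual/bash.html#Word-Splitting):
- Because the file is sorted, the first value is the highest.
- Each subsequent value must be tested, because you only want the rows that match.
- The years are printed in the same order as they appear in new2.csv.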
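The sort step and the read loop above can also be chained without the intermediate new.csv and new2.csv files. The following is a minimal sketch, not the answerer's code: the function name `print_max_and_years` and its argument order are made up for illustration, and it assumes a `sort` that supports `-s` (stable sort, available in GNU and BSD sort) so that tied rows keep their input order.

```shell
# Sketch: print the highest column-2 value for one country, then the
# year of every row that has that value. Names here are illustrative.
print_max_and_years() {
    # $1 = CSV file, $2 = country name (exact match on column 1)
    LC_ALL=C sort -s -t, -k2,2nr "$1" |
    {
        max=
        while IFS=, read -r country value year rest; do
            [ "$country" = "$2" ] || continue   # skip rows for other countries
            if [ -z "$max" ]; then              # first matching row holds the maximum
                max=$value
                printf '%s\n' "$max"
            fi
            if [ "$value" = "$max" ]; then      # string comparison, as in the loop above
                printf '%s\n' "$year"
            fi
        done
    }
}
```

Called as `print_max_and_years file.csv Argentina`. Because the comparison is textual, values written differently (for example 4.6 and 4.60) would not be treated as equal.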
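A uj5u.com user replied:
Assumptions:
- Commas appear only as field delimiters (i.e., no comma is part of the data)
- No ordering requirement is defined for the final result
One awk idea that makes two passes over the unsorted file: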
awk -F, ' # set input field delimiter as comma
FNR==NR { max=($2>max ? $2 : max); next} # 1st pass of file (all rows): keep track of max value from field #2
FNR==1 { print max } # 2nd pass of file (1st row ): print max
$2==max { print $3 } # 2nd pass of file (all rows): if field #2 matches "max" then print field #3
' unsorted.csv unsorted.csv
This produces:
4.6
2016
1998
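The question also restricts the rows to one country (its grep step), which the two-pass command above does not do. A minimal sketch of how that filter might be folded into the same two-pass idea follows. The function name `max_years_for_country` and the awk variable `country` are assumptions for illustration, and the comparison is forced to be numeric with `+ 0`.

```shell
# Sketch: two passes over the same file, limited to one country.
max_years_for_country() {
    # $1 = CSV file, $2 = country name (exact match on field 1)
    awk -F, -v country="$2" '
        $1 != country { next }                    # ignore other countries in both passes
        FNR == NR {                               # 1st pass: track the maximum of field 2
            if (!seen++ || $2 + 0 > max + 0) max = $2
            next
        }
        !printed++        { print max }           # 2nd pass: print the maximum once
        $2 + 0 == max + 0 { print $3 }            # 2nd pass: print years of matching rows
    ' "$1" "$1"
}
```

Seeding `max` from the first matching row (the `!seen++` test) avoids depending on the initial empty value of `max`, so the sketch should also behave with negative values.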
Another idea, using GNU awk, that makes a single pass over the unsorted file:
awk -F, ' # set input field delimiter as comma
{ arr[$2][$3] # save fields #2 and #3 as indices in array "arr[]"
max = ( $2 > max ? $2 : max) # keep track of max value from field #2
}
END { print max # after file has been processed ... print max and then ...
for (i in arr[max]) # loop through indices of 2nd dimension where 1st dimension == max
print i # print 2nd dimension index (ie, field #3)
}
' unsorted.csv
This produces:
4.6
1998
2016
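Notes:
- GNU awk is required for arrays of arrays (i.e., multidimensional arrays)
- Although field #3 appears to be sorted here, that is not guaranteed unless we modify the code to explicitly sort the second dimension of the array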
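If a guaranteed year order is wanted, one option that leaves the awk code untouched is to post-process its output. This is a small sketch under that assumption: the helper name `sort_years` is hypothetical, and it expects the first input line to be the maximum and every following line to be a year.

```shell
# Sketch: pass the first line (the maximum) through unchanged,
# then sort the remaining lines (the years) numerically.
sort_years() {
    IFS= read -r max || return 0    # empty input: nothing to print
    printf '%s\n' "$max"
    sort -n                         # reads whatever is left on stdin
}
```

It would be used at the end of a pipeline, for example `awk ... unsorted.csv | sort_years`.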
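A uj5u.com user replied:
How about a single-pass awk instead of a multi-pass one? I generated a synthetic version of this file, randomizing some of the data, to create a 6.24-million-row version:
Input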
out9: 177MiB 0:00:01 [ 105MiB/s] [ 105MiB/s] [ <=> ]
rows = 6243584. | UTF8 chars = 186289540. | bytes = 186289540.
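Code
The default is initialized to a huge negative value, -2^512 (or, more elegantly, -4^4^4), to ensure that it always takes the value from row 1. If you really want to play it safe, make it very close to negative infinity, e.g. -(3 4 1)^341, -16^255, -256^127, or -1024^102.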
{m,g}awk '
BEGIN {
1 _= -(_^= __= _ = _^= FS= OFS = ",")^_^_
1 ___= split("",____)
}
# Rule(s)
6243584 _ <= $__ { # 2992
2992 __= $(NF = __)
2992 if (( _)< $--NF) {
7 _= $NF
7 ___= split("",____)
}
2992 ____[ ___]=__
2992 __=NF
}
END {
1 print _
2984 for (__^=_<_; __<=___; __ ) {
2984 print ____[__]
}
}
OUTPUT (column 3 is printed exactly in input row order)
53.6 1834 1999 1866 1938 1886 1973 1968 1921 1984 1957 1891 1864 1992
1998 1853 1950 1985 1962 2018 1897 1979 2020 1954 1995 1980 1900 1997
1856 1975 1851 1853 1988 1897 1973 1875 1917 1861 1912 1912 1954 1871
1952 1877 2003 1886 1863 1899 1897 1853 2013 1956 1965 1854 1873 1915
1983 1961 1965 1979 1919 1970 1946 1843 1856 1954 1965 1831 1926 1964
1994 1969 1831 1945 1942 1971 1988 1879 1998 1986 1844 1846 1994 1894
2008 1851 1877 1979 1970 1852 1942 1889 1986 2013 1905 1932 2021 1944
1866 1892 1940 1989 1907 1982 2016 1966 1975 1831 1851 2003 1980 1963
1869 1983 1972 2013 1972 1948 1843 1928 1959 1911 1844 1920 1943 1864
1985 1978 1855 1986 1975 1880 2001 1914 1877 1900 1964 1995 1992 1968
1868 1974 2012 1827 1849 1849 1992 1942 1884 1876 2021 1866 1977 1857
1866 1937 1920 1983 1915 1887 1890 1852 1871 1972 1903 1944 1943 1957
1844 1932 1854 1890 1891 1866 1923 1924 1941 1845 1907 2019
(further rows truncated for readability)
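For readers who find the variable names above hard to follow, here is a plain sketch of the same single-pass idea with a large negative starting value. It is a rewrite under assumptions, not the answerer's program: the function name `single_pass_max` and all variable names are made up, and it assumes the value is in field 2 and the year in field 3, as in the question's sample.

```shell
# Sketch: one pass, keeping the current maximum and the years seen for it.
single_pass_max() {
    # $1 = CSV file
    awk -F, '
        BEGIN { max = -(2 ^ 512) }                 # start far below any realistic value
        $2 + 0 > max  { max = $2 + 0; raw = $2; n = 0 }   # new maximum: restart the list
        $2 + 0 == max { year[++n] = $3 }           # remember years in input order
        END {
            if (n) {                               # print nothing for an empty file
                print raw
                for (i = 1; i <= n; i++) print year[i]
            }
        }
    ' "$1"
}
```

The list is restarted by resetting the counter `n` rather than clearing the array, so entries left over from an earlier, smaller maximum are simply never printed.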
A uj5u.com user replied:
Single-pass awk:
$ awk -F, '{
if($2>=m||m=="") {
b= ($2==m?b:$2) ORS $3 # b is the record buffer
m=$2 # m holds the maximum of $2 so far
}
}
END {
print b
}' file
Output:
4.6
2016
1998
