Display CSV row/column data where the maximum value is in another column of the same row (bash)

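I am trying to write a script that sorts on column 2 to find the highest value, prints that value, and prints column 3 for every row that matches it. Here is a sample of the unsorted CSV: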

Argentina,4.6,2016,some data
Argentina,4.2,2018,some data
Argentina,4.6,1998,some data
Argentina,4.5,2001,some data

The desired output is:

4.6
2016
1998

This is what I have so far, but I am not sure I am going about it correctly:

grep  "$2*" "$1"> new.csv

sort -t, -k2,2nr new.csv > new2.csv

cut -f3 -d"," new2.csv

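Here $2 is the country name from the first column and $1 is the file name. This sorts the values in column 2 correctly, but I only want to show the years of the rows that have the maximum value in column 2. As written, it prints the year of every row. I understand why that happens, but I am not sure of the best course from there to the desired result. How can I solve this? Thanks in advance.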

A uj5u.com user replied:

You could do it like this:

declare maxvalue_found=no
declare maxvalue=''
while read -r line; do
  IFS=',' read -r <<< "$line" country value year data

  if [[ "${maxvalue_found}" == no ]]; then
    echo "$value"
    maxvalue="${value}"
    maxvalue_found=yes
  fi
  if [[ "${value}" == "${maxvalue}" ]]; then
    echo "$year"
  fi
done < new2.csv

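new2.csv is your sorted file: we simply read it line by line, then split each line on "," by setting IFS for read (https://www.gnu.org/software/bash/manual/bash.html#Word-Splitting):

Because of the sort, the first value is the highest.
Each subsequent value must be tested, because you only want the rows that match.
The years are printed in the same order as in new2.csv.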
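The same steps can also be chained without the intermediate files. The following is only a sketch under a few assumptions: it swaps the bash loop for a short awk filter, `top_years` and `sample.csv` are made-up names, and the country name is passed to grep as a regular expression anchored to the first column.

```shell
# Hypothetical sample file; the rows come from the question.
printf '%s\n' \
  'Argentina,4.6,2016,some data' \
  'Argentina,4.2,2018,some data' \
  'Argentina,4.6,1998,some data' \
  'Argentina,4.5,2001,some data' > sample.csv

# top_years FILE COUNTRY  (hypothetical helper name)
top_years() {
  grep "^$2," "$1" |      # keep rows whose first column is exactly COUNTRY
    sort -t, -k2,2nr |    # highest column-2 value first
    awk -F, '
      NR == 1   { max = $2; print max }   # first sorted row holds the maximum
      $2 == max { print $3 }              # years of every row tied with it
    '
}

top_years sample.csv Argentina
```

The order of the tied years follows whatever order sort gives to rows with equal keys; with GNU sort, adding -s (stable) should keep them in input order.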

A uj5u.com user replied:

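Assumptions:

commas appear only as field delimiters (i.e., a comma is never part of the data)
no sort order is required for the final result

One awk idea that makes two passes over the unsorted file: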

awk -F, '                                     # set input field delimiter as comma
FNR==NR { max=($2>max ? $2 : max); next}      # 1st pass of file (all rows): keep track of max value from field #2
FNR==1  { print max }                         # 2nd pass of file (1st row ): print max
$2==max { print $3 }                          # 2nd pass of file (all rows): if field #2 matches "max" then print field #3
' unsorted.csv unsorted.csv

This generates:

4.6
2016
1998
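One caveat, as far as I can tell: max starts out uninitialized, which awk compares as 0, so a file whose values are all negative would never update it. A minimal sketch that seeds max from the first row instead (the function name `two_pass_max`, the file `neg.csv`, and its rows are made up for illustration):

```shell
# Hypothetical all-negative sample data.
printf '%s\n' \
  'X,-3.5,2001,some data' \
  'X,-1.2,2005,some data' \
  'X,-1.2,1999,some data' > neg.csv

two_pass_max() {
  awk -F, '
    FNR == NR { if (NR == 1 || $2 + 0 > max + 0) max = $2; next }  # 1st pass: seed from row 1, then track max
    FNR == 1  { print max }                                        # 2nd pass, 1st row: print max
    $2 + 0 == max + 0 { print $3 }                                 # 2nd pass: print field #3 of matching rows
  ' "$1" "$1"
}

two_pass_max neg.csv
# prints:
# -1.2
# 2005
# 1999
```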

Another idea, using GNU awk, that needs only a single pass over the unsorted file:

awk -F, '                                     # set input field delimiter as comma
    { arr[$2][$3]                             # save fields #2 and #3 as indices in array "arr[]"
      max = ( $2 > max ? $2 : max)            # keep track of max value from field #2
    }
END { print max                               # after file has been processed ... print max and then ...
      for (i in arr[max])                     # loop through indices of 2nd dimension where 1st dimension == max
          print i                             # print 2nd dimension index (ie, field #3)
    }
' unsorted.csv

This generates:

4.6
1998
2016

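Notes:

GNU awk is required for arrays of arrays (i.e., multidimensional arrays)
although field #3 appears to be sorted, this is not guaranteed unless we modify the code to explicitly sort the second dimension of the array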
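Where GNU awk is not available, both notes can be worked around. The sketch below is an assumption-laden variant, not the answer's own code: it emulates the two-dimensional array with SUBSEP-joined keys (which any POSIX awk should accept) and orders the years explicitly with sort -n. The names `max_years_sorted` and `unsorted.csv` are illustrative.

```shell
# Hypothetical sample file mirroring the rows in the question.
printf '%s\n' \
  'Argentina,4.6,2016,some data' \
  'Argentina,4.2,2018,some data' \
  'Argentina,4.6,1998,some data' \
  'Argentina,4.5,2001,some data' > unsorted.csv

max_years_sorted() {
  awk -F, '
    {
      seen[$2, $3]                                      # emulate arr[$2][$3] with a SUBSEP-joined key
      if (NR == 1 || $2 + 0 > max + 0) max = $2         # keep track of max value from field #2
    }
    END {
      print max
      for (key in seen) {
        split(key, part, SUBSEP)                        # part[1] = field #2, part[2] = field #3
        if (part[1] + 0 == max + 0) print part[2]
      }
    }
  ' "$1" | {
    IFS= read -r max && printf '%s\n' "$max"            # pass the max line through unchanged
    sort -n                                             # then sort the years explicitly
  }
}

max_years_sorted unsorted.csv
# prints:
# 4.6
# 1998
# 2016
```

As with the GNU awk version, a year that appears twice for the same value is stored only once, because it is used as an array index.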

A uj5u.com user replied:

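How about a single-pass awk instead of a multi-pass one? I generated a synthetic version of this file, with some of the data randomized, to create a 6.24-million-row version of it: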

Input

out9:  177MiB 0:00:01 [ 105MiB/s] [ 105MiB/s] [ <=> ]

rows = 6243584. | UTF8 chars = 186289540. | bytes = 186289540.

Code

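The default is initialized to a huge negative value, -2^512, or more elegantly -4^4^4, to ensure that it always takes the value from row 1.

If you really want to play it safe, make it very close to negative infinity:
```
 e.g. -(3+4+1)^341, -16^255, -256^127, or -1024^102
```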

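   {m,g}awk '
   # the leading numbers are execution counts from the gawk profiler

   BEGIN {
     1     _= -(_^= __= _+= _^= FS= OFS = ",")^_^_   # FS = OFS = ","; __ = 2 (column index); _ = -4^4^4
     1   ___= split("",____)                         # empty the list of years; ___ = 0
    }

    # Rule(s)

6243584   _ <= +$__ { # 2992                         # rows whose field #2 is >= the current max

  2992              __= $(NF = ++__)                 # __ = field #3 (the year)
  2992      if ((+_)< +$--NF) {                      # is field #2 strictly above the max?
     7               _=  $NF                         # new max
     7             ___= split("",____)               # discard the years kept so far
            }
  2992      ____[++___]=__                           # append the year
  2992               __=NF                           # restore __ = 2
    }
    END {
     1      print _
  2984      for (__^=_<_; __<=___; __++) {           # __ runs from 1 to ___
  2984          print ____[__]
            }
    }
   '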

OUTPUT (column 3 printed exactly in input row order)

53.6  1834  1999  1866  1938  1886  1973  1968  1921  1984  1957  1891  1864  1992
1998  1853  1950  1985  1962  2018  1897  1979  2020  1954  1995  1980  1900  1997
1856  1975  1851  1853  1988  1897  1973  1875  1917  1861  1912  1912  1954  1871
1952  1877  2003  1886  1863  1899  1897  1853  2013  1956  1965  1854  1873  1915
1983  1961  1965  1979  1919  1970  1946  1843  1856  1954  1965  1831  1926  1964

1994  1969  1831  1945  1942  1971  1988  1879  1998  1986  1844  1846  1994  1894
2008  1851  1877  1979  1970  1852  1942  1889  1986  2013  1905  1932  2021  1944
1866  1892  1940  1989  1907  1982  2016  1966  1975  1831  1851  2003  1980  1963
1869  1983  1972  2013  1972  1948  1843  1928  1959  1911  1844  1920  1943  1864
1985  1978  1855  1986  1975  1880  2001  1914  1877  1900  1964  1995  1992  1968

1868  1974  2012  1827  1849  1849  1992  1942  1884  1876  2021  1866  1977  1857
1866  1937  1920  1983  1915  1887  1890  1852  1871  1972  1903  1944  1943  1957
1844  1932  1854  1890  1891  1866  1923  1924  1941  1845  1907  2019  

(further rows truncated for readability)
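For readers who find the underscore-only variable names hard to follow, here is a readable sketch of what the program above appears to do. It is an interpretation rather than the answerer's code: the names `max_years_onepass`, `max`, `n`, and `year` are mine, and the huge negative sentinel is replaced by an NR == 1 check so that the first row always seeds the maximum.

```shell
# Hypothetical sample file; the rows come from the question.
printf '%s\n' \
  'Argentina,4.6,2016,some data' \
  'Argentina,4.2,2018,some data' \
  'Argentina,4.6,1998,some data' \
  'Argentina,4.5,2001,some data' > sample.csv

max_years_onepass() {
  awk -F, '
    NR == 1 || $2 + 0 >= max + 0 {
      if (NR == 1 || $2 + 0 > max + 0) {  # strictly larger: new maximum
        max = $2                          # keep the original text for printing
        n = 0                             # forget the years stored for the old maximum
      }
      year[++n] = $3                      # append in input order
    }
    END {
      print max
      for (i = 1; i <= n; i++) print year[i]
    }
  ' "$1"
}

max_years_onepass sample.csv
# prints:
# 4.6
# 2016
# 1998
```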

A uj5u.com user replied:

Single-pass awk:

$ awk -F, '{                    
    if($2>=m||m=="") {          
        b= ($2==m?b:$2) ORS $3   # b is the record buffer 
        m=$2                     # m holds the maximum of $2 so far
    } 
}
END {
    print b
}' file

Output:

4.6
2016
1998
