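This is my first time using bash for an exercise, and it is taking a lot of time...
I am trying to create a script that, given 2 arguments (height, weight), returns from sports.csv the number of matching rows and the predominant nationality among them. As if that were not enough, if 2 countries are tied for predominance, it should echo the one with the lowest id.
I also cannot use awk, grep, sed or csvkit.
This is the CSV header, followed by a few sample rows: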
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
736041664,A Jesus Garcia,ESP,male,1969-10-17,1.72,64,athletics,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,fencing,0,0,0,
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
33922579,Aaron Gate,NZL,male,1990-11-26,1.81,71,cycling,0,0,0,
173071782,Aaron Royle,AUS,male,1990-01-26,1.80,67,triathlon,0,0,0,
266237702,Aaron Russell,USA,male,1993-06-04,2.05,98,volleyball,0,0,1,
What I have so far:
count=0
while IFS=, read -a id _ nation _ _ height weight _ _ _ _; do
if (( $height == "$2" )) && (( "$weight" == $3 )) ; then
((count++))
fi
done < athletes.csv
echo "$count"
I have seen a similar question, but could not find a way to return the most common nationality (a string).
I am looking for something like:
Count, Predominant_nationality 1.85 130
8460, BRA
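Should I try using arrays instead of trying to do the whole exercise with loops? Maybe I could do indexing, but it looks like arrays are one-dimensional here?
Any help is a blessing.
A uj5u.com user replied:
This is a sort-and-count problem that can be solved with the standard Linux text utilities.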
csv='athletes.csv'
crit='1\.85,90'
echo "Count Predominant_nationality $crit"
# Get fields from csv and sort on filtered fields 2,3
cut -d ',' -f 1,3,6,7 "$csv" | grep "$crit" | sort -t ',' -k2,3 | tr ',' ' ' | \
# Count unique skipping first field, get first
uniq -f 1 -c | sort -n -k1,1nr -k2n | head -n1 | tr -s ' ' | \
# print result
cut -d ' ' -f 2,4 --output-delimiter=' '
Result:
Count Predominant_nationality 1.85,90
2 BRA
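To see what the uniq -f 1 -c step contributes, here is a minimal sketch on three hand-written lines. The ids and nations are made up for illustration, and the lines are assumed to be already sorted by nation, as they would be after the sort step in the pipeline above.

```bash
#!/usr/bin/env bash
# Three made-up lines in the shape the pipeline has after `tr ',' ' '`:
# id nation height weight, already sorted by nation.
sample=$'27 CAD 1.85 130\n127 CAD 1.85 130\n134 USA 1.85 130'

# -f 1 ignores the first field (the id) when comparing adjacent lines;
# -c prefixes each group with its size and keeps the first line of the group.
grouped=$(printf '%s\n' "${sample}" | uniq -f 1 -c)

# Squeeze the padding that uniq puts in front of the count.
grouped=$(printf '%s\n' "${grouped}" | tr -s ' ')
printf '%s\n' "${grouped}"
```

The CAD group is reported with a count of 2 and the USA group with a count of 1. The id that survives is simply the first line of each group, so it is the lowest id only if the lines happened to be ordered that way beforehand.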
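A uj5u.com user replied:
Some issues with the current code:
- read -a reads the values into an array, but what you really want is to read the values into individual variables; read -r is typical in this scenario (-r stops backslashes from being treated as escapes)
- the if (( ... )) construct is typically used for integer comparisons, and since the heights are non-integers (e.g. 1.85) it is better to stick with string comparisons (especially since we are only interested in equality matches)
Setup: instead of downloading the linked data file, I will add 4 fake rows to OP's sample input, making sure all 4 rows match OP's sample search arguments (1.85 and 130):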
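As a minimal sketch of those two points, the loop below reads each line into named variables with read -r and compares height and weight as strings. The names want_height and want_weight and the three inline rows are assumptions made for illustration; OP's script would take the values from its positional parameters and read the real file.

```bash
#!/usr/bin/env bash
# Hypothetical search values; OP's script would take these from its arguments.
want_height="1.85"
want_weight="130"

count=0
# IFS=, splits each line on commas; -r keeps backslashes literal.
# The trailing "_" soaks up all remaining fields.
while IFS=, read -r id _ nation _ _ height weight _
do
    # String comparison: "1.85" is not an integer, so (( ... )) is the wrong tool.
    if [[ "${height}" == "${want_height}" && "${weight}" == "${want_weight}" ]]
    then
        count=$(( count + 1 ))
    fi
done <<'EOF'
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
134,Aaron XX1,USA,male,1993-06-04,1.85,130,volleyball,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
27,Aaron XX4,CAD,male,1993-06-04,1.85,130,volleyball,0,0,1,
EOF

echo "${count}"
```

With the three inline rows, two of them match, so the script prints 2. The header line needs no special handling here because its height field is the literal text "height", which never equals the search value.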
$ cat athletes.csv
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
736041664,A Jesus Garcia,ESP,male,1969-10-17,1.72,64,athletics,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,fencing,0,0,0,
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
33922579,Aaron Gate,NZL,male,1990-11-26,1.81,71,cycling,0,0,0,
173071782,Aaron Royle,AUS,male,1990-01-26,1.80,67,triathlon,0,0,0,
266237702,Aaron Russell,USA,male,1993-06-04,2.05,98,volleyball,0,0,1,
134,Aaron XX1,USA,male,1993-06-04,1.85,130,volleyball,0,0,1,
127,Aaron XX2,CAD,male,1993-06-04,1.85,130,volleyball,0,0,1,
34,Aaron XX3,USA,male,1993-06-04,1.85,130,volleyball,0,0,1,
27,Aaron XX4,CAD,male,1993-06-04,1.85,130,volleyball,0,0,1,
One bash idea:
arg1="1.85"
arg2="130"
maxid=99999999999
unset counts ids maxcount
declare -A counts ids
maxcount=0
while IFS=, read -r id _ nation _ _ height weight _
do
if [[ "${height}" == "${arg1}" && "${weight}" == "${arg2}" ]]
then
(( counts[${nation}]++ ))
# keep track of overall max count
[[ "${counts[${nation}]}" -gt "${maxcount}" ]] && maxcount="${counts[${nation}]}"
# keep track of min(id) for each nation
[[ "${id}" -lt "${ids[${nation}]:-${maxid}}" ]] && ids[${nation}]="${id}"
fi
done < athletes.csv
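Alternatively, since it looks like our search values sit next to each other and can only appear at one position in a line, we could use grep to filter out just the matching rows: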
$ grep ",${arg1},${arg2}," athletes.csv
134,Aaron XX1,USA,male,1993-06-04,1.85,130,volleyball,0,0,1,
127,Aaron XX2,CAD,male,1993-06-04,1.85,130,volleyball,0,0,1,
34,Aaron XX3,USA,male,1993-06-04,1.85,130,volleyball,0,0,1,
27,Aaron XX4,CAD,male,1993-06-04,1.85,130,volleyball,0,0,1,
We can then feed this result to the while/read loop and eliminate the need to test the height/weight variables, e.g.:
while IFS=, read -r id _ nation _
do
(( counts[${nation}]++ ))
[[ "${counts[${nation}]}" -gt "${maxcount}" ]] && maxcount="${counts[${nation}]}"
[[ "${id}" -lt "${ids[${nation}]:-${maxid}}" ]] && ids[${nation}]="${id}"
done < <(grep ",${arg1},${arg2}," athletes.csv)
At this point, both while/read loops produce:
$ typeset -p counts ids maxcount
declare -A counts=([USA]="2" [CAD]="2" )
declare -A ids=([USA]="34" [CAD]="27" )
declare -- maxcount="2"
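From here OP can loop through the list of nations ("${!counts[@]}") looking for counts that equal maxcount, and when one is found apply an additional check to see whether that nation has the lowest id (ids[]) seen so far in the loop. By the end of the loop OP should have the nation that a) has a count equal to maxcount and b) has the lowest id.
A uj5u.com user replied:
You could try rq (https://github.com/fuyuncat/rquery/releases)
counta(;1) does the counting, mina(;id) returns the minimum id, f height=@h and weight=@w filters the records matching the given arguments, and e @2=@3 trim @1, @4, @h, @w matches the minimum id and displays the result.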
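The final selection step described above could look roughly like the sketch below. The arrays are hard-coded to the state shown by typeset -p so that the snippet runs on its own, and the names winner and winner_id are hypothetical; they are not part of the answer's code.

```bash
#!/usr/bin/env bash
# State as reported by `typeset -p`; hard-coded so the sketch is self-contained.
declare -A counts=([USA]="2" [CAD]="2")
declare -A ids=([USA]="34" [CAD]="27")
maxcount="2"

winner=""       # hypothetical name: nation currently in the lead
winner_id=""    # hypothetical name: lowest id among the leading nations

for nation in "${!counts[@]}"
do
    # Only nations tied for the highest count are candidates.
    if [[ "${counts[${nation}]}" -eq "${maxcount}" ]]
    then
        # The first candidate leads by default; later ones need a lower id.
        if [[ -z "${winner}" || "${ids[${nation}]}" -lt "${winner_id}" ]]
        then
            winner="${nation}"
            winner_id="${ids[${nation}]}"
        fi
    fi
done

echo "${maxcount}, ${winner}"
```

For the sample state this selects CAD, since both nations have a count of 2 and CAD holds the lower id (27 versus 34). The line printed is the winning nation's own count; if the Count column is meant to be the total number of matching rows, the per-nation counts would need to be summed instead.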
[ rquery]$ ./rq -n -v "h:1.85;w:130" -q "p d/,/\"\"/ | s counta(;1) ,mina(;id),id, nationality | f height=@h and weight=@w | e @2=@3 trim @1, @4, @h, @w " samples/athletes.csv
2 HON 1.85 130
