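I am trying to figure out how to write an .awk script that takes a .csv file as input and outputs it without the commas and with the columns aligned. So far I have tried: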
{ printf "%-10s %s\n", $1, $2, $3 ,$4 }
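But this only outputs the data from the first two fields, aligned. It does a fine job of removing the comma delimiters, but the fourth column has commas inside double quotes, and I wonder whether that will cause a problem. Any guidance is much appreciated; I am very new to awk.
Sample input looks like this: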
Name,Last Name,Gender,Pet
Kit,Rattenberie,Male,"Crake, african black"
Cliff,Lakes,Male,"Red phalarope"
Tirrell,Stables,Male,"Rhea, greater"
Cherry,William,Female,"Crow, house"
The desired output would be something like:
Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"
for a .csv file of 10 rows. Thanks in advance.
uj5u.com user reply:
With Miller, we can use command-line options to convert the input data from CSV to "pretty print" format:
mlr --c2p cat ./input
Name    Last Name   Gender Pet
Kit     Rattenberie Male   Crake, african black
Cliff   Lakes       Male   Red phalarope
Tirrell Stables     Male   Rhea, greater
Cherry  William     Female Crow, house
It drops the quotes, though. The --barred option is also nice:
mlr --c2p --barred cat ./input
+---------+-------------+--------+----------------------+
| Name    | Last Name   | Gender | Pet                  |
+---------+-------------+--------+----------------------+
| Kit     | Rattenberie | Male   | Crake, african black |
| Cliff   | Lakes       | Male   | Red phalarope        |
| Tirrell | Stables     | Male   | Rhea, greater        |
| Cherry  | William     | Female | Crow, house          |
+---------+-------------+--------+----------------------+
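A more programmatic awk technique: track the maximum width of each column while reading the input file, then use those widths at the end to print the data. This essentially reimplements column -t (GNU awk is needed for FPAT and arrays of arrays):
awk -v FPAT='"[^"]*"|[^,]+' '
{
    # remember every field and the widest value seen in each column
    for (i=1; i<=NF; i++) {
        data[NR][i] = $i
        if (length($i) > maxw[i]) maxw[i] = length($i)
    }
}
END {
    # print each saved row, left-justified to the column width
    for (i=1; i<=NR; i++) {
        for (j=1; j<=length(data[i]); j++) {
            printf "%-*s ", maxw[j], data[i][j]
        }
        printf "\n"
    }
}
' ./input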
Name    Last Name   Gender Pet
Kit     Rattenberie Male   "Crake, african black"
Cliff   Lakes       Male   "Red phalarope"
Tirrell Stables     Male   "Rhea, greater"
Cherry  William     Female "Crow, house"
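The technique above depends on GNU awk for FPAT and arrays of arrays. As a rough sketch only, the same idea can be approximated in a POSIX awk by finding fields with match() in a loop and storing them under data[row, col] keys. The names align_csv and split_csv are made up for this illustration, and the field pattern carries the same limitations as the FPAT used above (empty fields are skipped, escaped quotes are not handled). The format string is built by concatenation because not every awk accepts "%-*s".

```shell
# align_csv is a hypothetical helper name used only for this sketch.
# Reads CSV from the named files (or stdin) and prints aligned columns.
align_csv() {
    awk '
    # Split one line into out[1..n] using the same pattern as the FPAT above.
    # Empty fields are skipped and escaped quotes are not handled.
    function split_csv(line, out,    n) {
        n = 0
        while (match(line, /"[^"]*"|[^,]+/)) {
            out[++n] = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        return n
    }
    {
        nf[NR] = split_csv($0, f)
        for (i = 1; i <= nf[NR]; i++) {
            data[NR, i] = f[i]                      # POSIX-style 2-D key
            if (length(f[i]) > maxw[i]) maxw[i] = length(f[i])
        }
    }
    END {
        for (r = 1; r <= NR; r++) {
            for (i = 1; i < nf[r]; i++)
                printf("%-" maxw[i] "s ", data[r, i])   # width built into the format
            print data[r, nf[r]]                        # last field, no padding
        }
    }
    ' "$@"
}
```

It can be called as align_csv ./input, or fed from a pipe. Like the original, it keeps the whole file in memory.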
uj5u.com user reply:
With gnu-awk, you can use:
awk -v FPAT='"[^"]*"|[^,]+' '{
for (i=1; i<=NF; ++i) $i = sprintf("%-12s", $i)} 1' file
Name         Last Name    Gender       Pet
Kit          Rattenberie  Male         "Crake, african black"
Cliff        Lakes        Male         "Red phalarope"
Tirrell      Stables      Male         "Rhea, greater"
Cherry       William      Female       "Crow, house"
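For comparison, the attempt in the question printed only two fields because its format string had two conversions for four arguments. Here is a minimal sketch that needs no GNU extensions, assuming (as in the sample data) that only the last column can contain quoted commas. The name fixed_width is hypothetical.

```shell
# fixed_width is a hypothetical helper name used only for this sketch.
# Assumes only the LAST column may contain commas, so everything from
# field 4 onward is rejoined with commas after splitting on -F,
fixed_width() {
    awk -F, '{
        pet = $4
        for (i = 5; i <= NF; i++) pet = pet "," $i   # restore commas split off by -F,
        printf "%-12s %-12s %-12s %s\n", $1, $2, $3, pet
    }' "$@"
}
```

Usage would be fixed_width file. If a quoted comma can appear in any other column, this shortcut no longer holds and a content-based split such as FPAT is the safer choice.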
Or, if the widths are entirely unpredictable, use this awk + column solution:
awk -v FPAT='"[^"]*"|[^,]+' -v OFS=';' '{$1=$1} 1' file |
column -s';' -t
Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"
If you want to create an awk script, use:
cat col.awk
BEGIN {
    FPAT="\"[^\"]*\"|[^,]+"
    OFS=";"
}
{$1 = $1}
1
Use it as:
awk -f col.awk file.csv | column -s';' -t
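The awk and column pipeline implicitly assumes that the intermediate separator (here ';') never occurs in the data. If it does, column splits the field there. A small pre-check along these lines could catch that. The helper name contains_sep is an assumption made for this sketch, not something from the answer.

```shell
# contains_sep is a hypothetical helper name used only for this sketch.
# Usage: contains_sep SEP [FILE...]
# Exits 0 if SEP occurs anywhere in the input, 1 otherwise.
contains_sep() {
    sep=$1
    shift
    awk -v sep="$sep" '
        index($0, sep) { found = 1; exit }   # stop at the first hit
        END { exit !found }                  # 0 when found, 1 when not
    ' "$@"
}
```

For example, contains_sep ';' file.csv && echo "pick another separator" would warn before running the pipeline.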
uj5u.com user reply:
One awk idea using a *.awk script (per the OP's comment), with awk determining the maximum width of each column:
$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,]+" }                        # instead of parsing on field delimiter (via FS) ... parse on field format (via FPAT)
      { for (i=1;i<=NF;i++)
            w[i]= length($i) > w[i] ? length($i) : w[i]  # keep track of max width of each column
        lines[FNR]=$0                                    # save entire line
      }
END   { for (i=1;i<=FNR;i++) {                           # loop through each saved line
            n=patsplit(lines[i],a)                       # reparse based on FPAT, storing fields in array a[]
            for (j=1;j<n;j++)                            # loop through array entries ...
                printf "%-*s%s", w[j], a[j], OFS         # printing to stdout
            print a[n]                                   # print last field plus "\n"
        }
      }
Or use a multidimensional array to store the input, which eliminates the second parse of the input data (via patsplit()):
$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,]+" }
      { for (i=1;i<=NF;i++) {
            w[i]= length($i) > w[i] ? length($i) : w[i]
            fields[FNR][i]=$i
        }
      }
END   { for (i=1;i<=FNR;i++) {
            for (j=1;j<NF;j++)
                printf "%-*s%s", w[j], fields[i][j], OFS
            print fields[i][NF]
        }
      }
Notes:
- assumes the entire file fits in memory (via the awk lines[] or fields[][] arrays)
- requires GNU awk for FPAT and multidimensional array support
Both of these generate:
$ awk -f script.awk file
Name    Last Name   Gender Pet
Kit     Rattenberie Male   "Crake, african black"
Cliff   Lakes       Male   "Red phalarope"
Tirrell Stables     Male   "Rhea, greater"
Cherry  William     Female "Crow, house"
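If the in-memory assumption from the notes is a concern, one possible workaround is to read the file twice: a first pass that only records column widths and a second pass that prints. The sketch below uses portable awk rather than FPAT, so it also sidesteps the GNU awk requirement. The names align_csv_two_pass and split_csv are invented for this illustration, and the field pattern has the same limits as the FPAT above (empty fields skipped, no escaped quotes).

```shell
# align_csv_two_pass is a hypothetical helper name used only for this sketch.
# It needs a regular file (not a pipe) because the file is read twice.
align_csv_two_pass() {
    awk '
    # Split one line into out[1..n] with the same pattern as the FPAT above.
    function split_csv(line, out,    n) {
        n = 0
        while (match(line, /"[^"]*"|[^,]+/)) {
            out[++n] = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        return n
    }
    NR == FNR {                         # first pass: only record column widths
        n = split_csv($0, f)
        for (i = 1; i <= n; i++)
            if (length(f[i]) > w[i]) w[i] = length(f[i])
        next
    }
    {                                   # second pass: print with the final widths
        n = split_csv($0, f)
        for (i = 1; i < n; i++)
            printf("%-" w[i] "s ", f[i])
        print f[n]
    }
    ' "$1" "$1"
}
```

Called as align_csv_two_pass file, it keeps only the width array in memory, at the cost of parsing every line twice.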
