Given the following script and dataset.
Script:
while IFS=","
read v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13;
do if [ -z "$v12" ];
then echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,'unknown',$v13";
else echo "$v1, $v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,$v12,$v13";
fi;
done
>train3.csv
Dataset:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
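I want to export the result as a CSV file named "train3.csv", but the way I am doing it does not work: it neither shows the changes nor saves a CSV file.
How can I fix this?
The expected result is: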
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,'unknown',S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,'unknown',S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,'unknown',S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,'unknown',Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,'unknown',S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,'unknown',S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,'unknown',C
The solution should also create the new CSV file.
Thanks.
A uj5u.com user replied:
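Don't use Bash for this. Your input CSV contains quoted strings, and you probably have no guarantee that each quoted string contains exactly one comma. If one contains fewer or more commas, your code will break.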
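To make the failure concrete, here is a minimal sketch that splits single lines the same way the loop in the question does and prints what lands in v12. The helper name show_v12 and the second and third rows are made up for illustration; only the first row comes from the dataset.

```shell
# Split one CSV line the way the question's loop does and report
# what ends up in v12, the variable the script treats as Cabin.
show_v12() {
    printf '%s\n' "$1" | {
        IFS=, read -r v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13
        printf 'v12=[%s]\n' "$v12"
    }
}

# Row taken from the dataset: exactly one comma inside the quoted name.
show_v12 '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'
# Hypothetical rows: no comma, then two commas, inside the quoted name.
show_v12 '1,0,3,"Owen Harris Braund",male,22,1,0,A/5 21171,7.25,,S'
show_v12 '1,0,3,"Braund, Mr. Owen, Harris",male,22,1,0,A/5 21171,7.25,,S'
```

Only the first call reports an empty v12. With no comma in the name v12 holds Embarked (S), and with two commas it holds Fare (7.25), so the emptiness test would look at the wrong column.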
Instead, use a dedicated tool that handles quoted strings correctly. The easiest one to use is Perl with the DBD::CSV module. The following command installs it on Debian.
sudo apt-get install libdbd-csv-perl
Now you can use SQL to fix your CSV file.
#! /usr/bin/perl
use DBI;
$dbh = DBI->connect ("dbi:CSV:")
or die "Cannot connect: $DBI::errstr";
my $sth = $dbh->prepare ("UPDATE train3.csv SET cabin = ? WHERE cabin is null");
$sth->execute ("'unknown'");
$sth->finish;
$dbh->disconnect;
If you don't want to learn Perl, you can use the script from the command line as a ready-made program. Save it as csv.pl and make it executable:
#! /usr/bin/perl
use DBI;
$dbh = DBI->connect ("dbi:CSV:")
or die "Cannot connect: $DBI::errstr";
my $sth = $dbh->prepare (shift);
$sth->execute (@ARGV);
$sth->finish;
$dbh->disconnect;
Next, you can simply pass the query and its arguments:
./csv.pl 'UPDATE train3.csv SET cabin = ? WHERE cabin is null' \'unknown\'
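Pay close attention to the quoting: the backslash-escaped single quotes make the shell pass 'unknown' with its quotes intact.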
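If the quoting in that command looks odd, a small sketch like the following shows what the shell actually hands to a program. The helper show_args is hypothetical and only prints its arguments.

```shell
# Print each argument on its own line, wrapped in <>, to show what the
# shell passes to a program after it has processed the quoting.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Same quoting as the csv.pl call: the query is one argument, and the
# escaped quotes survive as part of the second argument.
show_args 'UPDATE train3.csv SET cabin = ? WHERE cabin is null' \'unknown\'

# For comparison: plain quotes are consumed by the shell, nested ones are kept.
show_args unknown 'unknown' "'unknown'"
```

The first call prints two lines, the second of which is <'unknown'>. The comparison call prints <unknown>, <unknown> and <'unknown'>.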
A uj5u.com user replied:
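Your script doesn't work because read doesn't know where to read from, and the output redirection has to come after done. I also improved the script with parameter expansion: ${parameter:-word} substitutes word when the parameter is unset or empty.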
while IFS="," read -r v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13; do
echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,${v12:-'unknown'},$v13"
done <dataset.csv >train3.csv
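As a quick illustration of the ${parameter:-word} expansion used in the loop above, here is a small sketch. The names placeholder and fill_empty are made up for this example, and the default is kept in a variable so the single quotes are plainly part of the value.

```shell
# The default value, including its literal single quotes.
placeholder="'unknown'"

# Print the argument, or the placeholder when the argument is empty or unset.
fill_empty() {
    printf '%s\n' "${1:-$placeholder}"
}

fill_empty ''       # empty Cabin, as for passenger 1
fill_empty 'C85'    # Cabin present, as for passenger 2

# Without the colon, only an unset parameter triggers the default.
cabin=''
printf '[%s]\n' "${cabin-$placeholder}"
```

The first call prints 'unknown', the second prints C85, and the last line prints [] because cabin is set, just empty.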
You can avoid the while loop by using another tool:
awk -F, -v unknown="'unknown'" 'BEGIN { OFS="," } !$12 {$12=unknown} 1' < dataset.csv >train3.csv
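Both solutions are thrown off by the comma inside the quoted Name field (field 4), which is why field 12 rather than field 11 is changed. If a name contains no comma, the wrong field is checked.
If you know that Embarked, the last field, never contains a comma, you can address Cabin from the end of the line as $(NF-1):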
awk -F, -v unknown="'unknown'" '
BEGIN { OFS="," }
!$(NF-1) {$(NF-1)=unknown}
1' < dataset.csv >train3.csv
However, you should use a solution that really understands the CSV format, such as @ceving's answer.
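If installing Perl modules is not an option, one middle ground is to let awk walk each line character by character and track whether it is inside double quotes. This is only a sketch and not a full CSV parser: the function name fill_field is made up here, and it assumes that no quoted field spans more than one line.

```shell
# Replace an empty field with a placeholder, counting fields with a small
# quote-aware scanner instead of a plain comma split.
# Usage: fill_field FIELD_NUMBER PLACEHOLDER < input.csv > output.csv
fill_field() {
    awk -v target="$1" -v placeholder="$2" '
    {
        out = ""; field = ""; n = 1; in_quotes = 0
        len = length($0)
        for (i = 1; i <= len + 1; i++) {
            c = substr($0, i, 1)              # empty once i passes the end
            if (c == "\"") in_quotes = !in_quotes
            # A field ends at an unquoted comma or at the end of the line.
            if (i > len || (c == "," && !in_quotes)) {
                if (n == target && field == "") field = placeholder
                out = out (n > 1 ? "," : "") field
                field = ""; n++
            } else {
                field = field c
            }
        }
        print out
    }'
}

# Cabin is column 11 of the header, regardless of commas inside Name.
printf '%s\n' \
    'PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked' \
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S' \
    '2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C' |
    fill_field 11 "'unknown'"
```

Because the comma inside the quoted name is not counted as a separator, the column number matches the header, and a call such as fill_field 11 "'unknown'" < dataset.csv > train3.csv would write the new file.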
A uj5u.com user replied:
Your code, slightly modified:
#!/bin/bash
datafile='dataset.txt'
outputfile='train3.csv'
>"$outputfile"
while IFS="," read -r v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13
do
    if [[ -z "$v12" ]]
    then
        echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,'unknown',$v13" >>"$outputfile"
    else
        echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,$v12,$v13" >>"$outputfile"
    fi
done < "$datafile"
A good reference for reading data from a file is https://mywiki.wooledge.org/BashFAQ/001
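One detail that matters for CSV exports is a final line without a trailing newline, which a plain while read loop silently skips. Here is a minimal sketch of the usual guard; the function name count_lines is made up for the example.

```shell
# Count the lines on standard input, including a last line that has no
# trailing newline. read returns non-zero at end of input but still fills
# the variable, so the extra test keeps that final line.
count_lines() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# The second record deliberately has no trailing newline.
printf '1,0,3\n2,1,1' | count_lines
```

This prints 2. Without the || [ -n "$line" ] part, the loop body would run only once and the last record would be lost.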
