I have a file like the one below and need to remove the duplicates within each cell without changing the order or the formatting:
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car,car Case CAT1,CAT1,Dog p.12>a,p.12>a
23 as swe 34 2,2 Bus,Bus Case1,, Dog1,Dog1,, N.12>a,N.12>a
23 ks awe 35 . Bike,Bike Case1,, rat4,rat4,, 5.16>b,5.16>b
Missing data is recorded as . (a dot).
So far I have tried awk:
awk '{str="";c=0;split($0,arr,","); for (v in arr) c++ ; for (m=c;m >= 1;m--) for (n=1; n<m;n++) if (arr[m] == arr[n]) delete arr[m]; for (k=1;k<=c;k++) {if (k ==1 ) {s=arr[k] } else if (arr[k] != "") str=str" "arr[k] } print str}'
But it destroys the formatting. Is there another way to do this?
Expected output:
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car Case CAT1,Dog p.12>a
23 as swe 34 2 Bus Case1 Dog1 N.12>a
23 ks awe 35 . Bike Case1 rat4 5.16>b
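Answer from a uj5u.com user:
Since the input appears to be fixed-width, you can use unpack to split it into columns. Then split each cell on commas and use uniq to remove the duplicates while preserving order. Finally, write the line back out with pack.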
use warnings;
use strict;
use List::Util qw(uniq);
my $tmpl = 'A6A6A7A5A6A10A8A15A*';
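while (<DATA>) {
    # Split the line into fixed-width columns ('A' strips trailing spaces)
    my @cols = unpack $tmpl, $_;
    for my $c (@cols) {
        $c =~ s/^\s+//;                  # drop leading whitespace
        my @items = split /,/, $c;       # trailing empty items are discarded
        $c = join ',', uniq(@items);     # uniq keeps the first occurrence
    }
    # Re-pad every column to its original width
    print pack($tmpl, @cols), "\n";
}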
__DATA__
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car,car Case CAT1,CAT1,Dog p.12>a,p.12>a
23 as swe 34 2,2 Bus,Bus Case1,, Dog1,Dog1,, N.12>a,N.12>a
23 ks awe 35 . Bike,Bike Case1,, rat4,rat4,, 5.16>b,5.16>b
Output:
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car Case CAT1,Dog p.12>a
23 as swe 34 2 Bus Case1 Dog1 N.12>a
23 ks awe 35 . Bike Case1 rat4 5.16>b
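For readers who do not use Perl, the same unpack / dedupe / pack idea can be sketched in Python. This is only an illustration, not part of the original answer: the column widths are copied from the Perl template above, and the helper names (split_fixed, dedupe_cell, dedupe_line) are made up for this sketch.

```python
# Column widths mirror the Perl template 'A6A6A7A5A6A10A8A15A*';
# the last column takes whatever remains on the line.
WIDTHS = [6, 6, 7, 5, 6, 10, 8, 15]


def split_fixed(line, widths=WIDTHS):
    """Slice a line into fixed-width cells; the final cell is the remainder."""
    cells, pos = [], 0
    for w in widths:
        cells.append(line[pos:pos + w])
        pos += w
    cells.append(line[pos:])
    return cells


def dedupe_cell(cell):
    """Drop repeated and empty comma-separated items, keeping first-seen order."""
    items = [item for item in cell.strip().split(",") if item]
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return ",".join(dict.fromkeys(items))


def dedupe_line(line, widths=WIDTHS):
    """Deduplicate every cell and pad it back to its column width."""
    cells = [dedupe_cell(c) for c in split_fixed(line.rstrip("\n"), widths)]
    padded = [c.ljust(w) for c, w in zip(cells, widths)]
    return "".join(padded) + cells[-1]
```

Calling dedupe_line on each line of the file and printing the result should keep the columns left-aligned at the same offsets, provided the real file matches the assumed widths.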
Answer from a uj5u.com user:
With sed (this assumes the columns are tab-separated):
$ sed -E 's/\t(.*),\1/\t\1/g;s/,+\t/\t/g' file | column -ts$'\t'
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car Case CAT1,Dog p.12>a
23 as swe 34 2 Bus Case1 Dog1 N.12>a
23 ks awe 35 . Bike Case1 rat4 5.16>b
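Answer from a uj5u.com user:
Assuming your file is fixed-width rather than tab-delimited, you can deduplicate the fields with a regex substitution. Match each complete run of non-whitespace characters, split it on commas, deduplicate the parts, and join them back together with commas. Then append one space for every character that was removed, so the formatting stays intact.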
use strict;
use warnings;
my $hdr = <DATA>;
print $hdr;
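while (<DATA>) {
    # For each run of non-whitespace: dedupe its comma-separated parts,
    # then pad the result with spaces back to the original length
    s/(\S+)/ my %s; my $n = join ',', grep { !$s{$_}++ } split ',', $1; $n .= ' ' x (length($1) - length($n)); $n; /eg;
    print;
}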
__DATA__
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car,car Case CAT1,CAT1,Dog p.12>a,p.12>a
23 as swe 34 2,2 Bus,Bus Case1,, Dog1,Dog1,, N.12>a,N.12>a
23 ks awe 35 . Bike,Bike Case1,, rat4,rat4,, 5.16>b,5.16>b
Output:
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car Case CAT1,Dog p.12>a
23 as swe 34 2 Bus Case1 Dog1 N.12>a
23 ks awe 35 . Bike Case1 rat4 5.16>b
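The same pad-in-place idea can be sketched in Python with re.sub and a replacement function. This is an illustrative sketch rather than the answerer's code; the function names dedupe_token and dedupe_in_place are invented here.

```python
import re


def dedupe_token(match):
    """Deduplicate one whitespace-delimited token and pad it to its old length."""
    token = match.group(0)
    items = [item for item in token.split(",") if item]
    # dict.fromkeys keeps the first occurrence of each item, in order.
    unique = ",".join(dict.fromkeys(items))
    # Pad with one space per removed character so later columns do not shift.
    return unique.ljust(len(token))


def dedupe_in_place(line):
    """Rewrite every run of non-whitespace characters, leaving spacing intact."""
    return re.sub(r"\S+", dedupe_token, line)
```

Because each token is replaced by a string of the same length, every following column keeps its starting offset. The last token on a line is padded as well, so the result may carry trailing spaces; strip them if that matters.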
Answer from a uj5u.com user:
Using any POSIX awk:
$ cat tst.awk
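NR==1 {
    # Derive each column width from the header: a word plus the spaces after it
    hdr = $0
    while ( match(hdr,/[^[:space:]]+[[:space:]]+/) ) {
        width[++i] = RLENGTH
        hdr = substr(hdr,RSTART+RLENGTH)
    }
}
{
    for ( i=1; i<=NF; i++ ) {
        fld = ""
        delete seen
        n = split($i,parts,/,/)
        for ( j=1; j<=n; j++ ) {
            part = parts[j]
            # Keep only non-empty parts that have not been seen in this field
            if ( (part != "") && !seen[part]++ ) {
                fld = (fld == "" ? "" : fld ",") part
            }
        }
        # Left-align the field within its column width
        printf "%-*s", width[i], fld
    }
    print ""
}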
$ awk -f tst.awk file
Sl.no Name1 Name2 Dis From Type item Animal Code
2 qw wsa 12 23 car Case CAT1,Dog p.12>a
23 as swe 34 2 Bus Case1 Dog1 N.12>a
23 ks awe 35 . Bike Case1 rat4 5.16>b
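The above assumes you don't really want "From" in the header line to start 1 character before the data values beneath it, nor "Code" to be right-aligned when everything else is left-aligned.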
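The header-driven approach (take each column's width from the header line, then print every deduplicated field left-aligned to that width) can also be sketched in Python. The names header_widths and realign are hypothetical, and the sketch assumes, like the awk script, that no cell contains embedded whitespace.

```python
import re


def header_widths(header):
    """Width of each column: a header word plus the whitespace that follows it."""
    # The last header word has no trailing whitespace, so it gets no width entry.
    return [len(m.group(0)) for m in re.finditer(r"\S+\s+", header.rstrip())]


def realign(line, widths):
    """Deduplicate each whitespace-separated field and left-align it to its column."""
    out = []
    for i, field in enumerate(line.split()):
        items = [item for item in field.split(",") if item]
        cell = ",".join(dict.fromkeys(items))  # order-preserving dedupe
        width = widths[i] if i < len(widths) else 0
        out.append(cell.ljust(width))
    return "".join(out)
```

Passing the header itself through realign reproduces it left-aligned, so the same function can be applied to every line once the widths are known.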
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/425575.html
