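I realize there are many similar questions... I'm hoping to get some more insight here.
I need to match a key from keywords.tsv against a sentence in data.tsv. If the keyword occurs anywhere in the sentence, I want to print both to a new file. If the same sentence contains two keywords, it should be printed twice.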
keywords.tsv
color>color
colour>color
expiry>expiration
expiration>expiration
data.tsv
something>more
What is the expiry date of your credit card?>more
The credit card colour is blue and the expiry date has passed.>more
This card has a current expiration date.>more
Desired result:
expiration>What is the expiry date of your credit card?>more
expiration>The credit card colour is blue and the expiry date has passed.>more
color>The credit card colour is blue and the expiry date has passed.>more
expiration>This card has a current expiration date.>more
I have tried many things:
awk -F "\t" 'NR==FNR{a[$1]=$2; next}
{
split($1,b,",");
for (b2 in b) { if(b[b2] == a[$1]) {print a[$1], $0}
}
}
' keywords.tsv data.tsv
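I seem to be struggling to work out how to access the values of the array built from file1, among other problems. Any help is appreciated!
Answer from a uj5u.com user:
I'm assuming that > is a tab character.
Your main problem seems to be the delimiter: you don't want to split on commas, you want to split on sequences of whitespace: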
awk '
BEGIN {FS = OFS = "\t"}
NR == FNR {kw[$1] = $2; next}
{
n = split($1, words, /[[:blank:]]+/)
for (i = 1; i <= n; i++) {
if (words[i] in kw) print kw[words[i]], $0
}
}
' keywords.tsv data.tsv
expiration What is the expiry date of your credit card? more
color The credit card colour is blue and the expiry date has passed. more
expiration The credit card colour is blue and the expiry date has passed. more
expiration This card has a current expiration date. more
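One caveat with splitting on whitespace: a keyword is only found when it stands alone as a word, so a keyword followed directly by punctuation (for example `colour,` or `expiry?`) would be missed. Below is a minimal sketch of one possible variation, assuming it is acceptable for your data to trim non-alphanumeric characters from both ends of each word before the lookup. The function name `match_keywords` is hypothetical and only wraps the awk program so it can be reused:

```shell
# match_keywords KEYWORDS_FILE DATA_FILE   (hypothetical helper name)
# Prints "<canonical keyword><TAB><original data line>" once per matching word.
match_keywords() {
    awk '
    BEGIN { FS = OFS = "\t" }
    NR == FNR { kw[$1] = $2; next }   # first file: variant -> canonical keyword
    {
        n = split($1, words, " ")     # " " splits on runs of whitespace
        for (i = 1; i <= n; i++) {
            w = words[i]
            sub(/^[^A-Za-z0-9]+/, "", w)   # trim leading non-alphanumerics
            sub(/[^A-Za-z0-9]+$/, "", w)   # trim trailing non-alphanumerics
            if (w in kw) print kw[w], $0
        }
    }
    ' "$1" "$2"
}
```

Called as `match_keywords keywords.tsv data.tsv`, this should behave like the command above for the sample data, while also matching a sentence such as `The colour, oddly, is blue.` The lookup is still case-sensitive and still matches whole words only, so `color` would not be found inside `colorful`.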
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/525738.html
Tags: arrays, awk, split
