Is there a way to search a file for words and save each matched word together with another word whose beginning I know?

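I am looking for a way to grep largefile.txt for the words in queryfile.txt. Afterwards, however, I do not want to output/save the whole line in which each query word was found. I want to save only that query word plus a second word of which I only know the beginning (e.g. "ABC") and which I know for certain occurs on the same line as the first word.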

For example, if queryfile.txt contains:

this
next

and largefile.txt has the following lines:

this is the first line with an ABCword  # contents of first line will be saved
and there is an ABCword2 in this one as well  # contents of 2nd line will be saved
and the next line has an ABCword2 too  # contents of this line will be saved as well
third line has an ABCword3    # contents of this line won't

(Note that every line of largefile.txt always contains one word starting with ABC. It is also impossible for any of the query words to start with "ABC".)

The saved file should look like:

this ABCword
this ABCword2
next ABCword2

So far I have looked into suggestions from other similar posts, namely combining grep and awk with a command similar to:

LC_ALL=C grep -f queryfile.txt largefile.txt | awk -F"," '$2~/ABC/' > results.txt

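The problem is that not only is the query word not saved, but the -F"," '$2~/ABC/' part also does not seem to be the right way to get the word starting with 'ABC'.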
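To see why the -F"," filter comes up empty, it may help to look at how awk splits one of the sample lines. The following is only a minimal sketch using the first sample line; the variable names are illustrative and not part of the original commands.

```shell
line='this is the first line with an ABCword'

# With a comma as the field separator the line contains no separator at all,
# so awk sees a single field and $2 is empty.
comma_fields=$(printf '%s\n' "$line" | awk -F"," '{ print NF }')

# With the default whitespace separator every word is its own field,
# so the ABC word can be found by testing each field against /^ABC/.
abc_word=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^ABC/) print $i }')

echo "number of fields with a comma separator: $comma_fields"
echo "word starting with ABC: $abc_word"
```

Since the sample data is separated by spaces rather than commas, $2~/ABC/ never has anything to match, which is consistent with the empty result described above.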

I also found a way to do it using only awk, but I still have not managed to adapt the code to save word #2 instead of the whole line:

awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' queryfile.txt largefile.txt > results.txt

Reply from a uj5u.com user:

Second attempt, based on the updated sample input/output:

$ cat tst.awk
FNR==NR { words[$1]; next }
{
    queryWord = otherWord = ""
    for (i=1; i<=NF; i++) {
        if ( $i in words ) {
            queryWord = $i
        }
        else if ( $i ~ /^ABC/ ) {
            otherWord = $i
        }
    }
    if ( (queryWord != "") && (otherWord != "") ) {
        print queryWord, otherWord
    }
}

$ awk -f tst.awk queryfile.txt largefile.txt
this ABCword
next ABCword2

Original answer:

This may be what you're trying to do (untested):

awk '
    FNR==NR { word2lgth[$1] = length($1); next }
    ($1 in word2lgth) && (match(substr($0,word2lgth[$1]+1),/ ABC[[:alnum:]_]+/) ) {
        # RSTART is relative to the substring, so the match ends at word2lgth[$1]+RSTART+RLENGTH-1 in $0
        print substr($0,1,word2lgth[$1]+RSTART+RLENGTH-1)
    }
' queryfile.txt largefile.txt > results.txt
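The command above relies on awk's match() function, which sets RSTART (where the match begins) and RLENGTH (how long it is). As a rough illustration only, with a made-up sample line and a plain bracket expression in place of [[:alnum:]] (an assumption made for portability across awk versions):

```shell
line='this is the first line with an ABCword too'

result=$(printf '%s\n' "$line" | awk '
    match($0, / ABC[A-Za-z0-9_]+/) {
        # RSTART points at the leading space of the match,
        # so skip one character to get the word itself
        print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
    }')

echo "$result"
```

For this line the match starts at character 31 and is 8 characters long, so the command prints 31 8 ABCword.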

Reply from a uj5u.com user:

Given:

cat large_file
this is the first line with an ABCword 
and the next line has an ABCword2 too CRABCAKE 
third line has an ABCword3 
ABCword4 and this is behind

cat query_file
this
next

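(The comments on each line of your large_file have been removed; otherwise the ABCword3 line would print, because its comment contains "this".)

You can actually do this entirely with GNU sed, using sed and tr to turn the query file into a pattern: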

pat=$(gsed -E 's/^(.+)$/\\b\1\\b/' query_file | tr '\n' '|' | gsed 's/|$//')
gsed -nE "s/.*(${pat}).*(\<ABC[a-zA-Z0-9]*).*/\1 \2/p; s/.*(\<ABC[a-zA-Z0-9]*).*(${pat}).*/\1 \2/p" large_file

Prints:

this ABCword
next ABCword2
ABCword4 this
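It can be easier to follow the second command once you see what the pat variable holds. The sketch below rebuilds it from a throwaway copy of the query file. It calls sed rather than gsed, on the assumption that the sed on your PATH accepts -E (gsed is commonly the name of GNU sed on macOS); the temporary directory is only there for the demo.

```shell
# Throwaway query file created just for this demo
tmpdir=$(mktemp -d)
printf 'this\nnext\n' > "$tmpdir/query_file"

# Wrap every query word in \b...\b, then join the lines with | and drop the trailing |
pat=$(sed -E 's/^(.+)$/\\b\1\\b/' "$tmpdir/query_file" | tr '\n' '|' | sed 's/|$//')
echo "$pat"

rm -r "$tmpdir"
```

The result is a single alternation, \bthis\b|\bnext\b, which is why one sed call can test every query word at once.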

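Reply from a uj5u.com user:

This one assumes that your query file has more entries than there are words on a line of the large file. Also, it does not treat your comments as comments but processes them as regular data, so if the sample is cut and pasted as is, the third record is a match too.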

$ awk '
NR==FNR {                              # process queryfile
    a[$0]                              # hash those query words
    next
}
{                                      # process largefile
    for(i=1;i<=NF && !(f1 && f2);i++)  # iterate until both words found
        if(!f1 && ($i in a))           # f1 holds the matching query word
            f1=$i
        else if(!f2 && ($i~/^ABC/))    # f2 holds the ABC starting word 
            f2=$i
    if(f1 && f2)                       # if both were found
        print f1,f2                    # output them 
    f1=f2=""
}' queryfile largefile
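If the trailing "# ..." comments really are part of the data file, one possible adjustment is to strip them before scanning the fields. This is only a sketch under that assumption: it treats everything from the first # onward as a comment, and the temporary files are created purely for the demo.

```shell
tmpdir=$(mktemp -d)
printf 'this\nnext\n' > "$tmpdir/queryfile"
cat > "$tmpdir/largefile" <<'EOF'
this is the first line with an ABCword # contents of this line will be saved
and the next line has an ABCword2 too # contents of this line will be saved as well
third line has an ABCword3 # contents of this line won't
EOF

result=$(awk '
NR==FNR { a[$0]; next }                  # hash the query words
{
    sub(/[ \t]*#.*$/, "")                # drop the trailing comment; awk re-splits the fields
    f1 = f2 = ""
    for (i = 1; i <= NF && !(f1 != "" && f2 != ""); i++)
        if (f1 == "" && ($i in a))       # f1 holds the first matching query word
            f1 = $i
        else if (f2 == "" && $i ~ /^ABC/)  # f2 holds the word starting with ABC
            f2 = $i
    if (f1 != "" && f2 != "")
        print f1, f2
}' "$tmpdir/queryfile" "$tmpdir/largefile")

echo "$result"
rm -r "$tmpdir"
```

With the comments removed, the third line no longer contains a query word, so only the first two lines produce output.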

Reply from a uj5u.com user:

Using sed in a while loop:

$ cat queryfile.txt
this
next


$ cat largefile.txt
this is the first line with an ABCword # contents of this line will be saved
and the next line has an ABCword2 too # contents of this line will be saved as well
third line has an ABCword3 # contents of this line won't

$ while read -r line; do sed -n "s/.*\($line\).*\(ABC[^ ]*\).*/\1 \2/p" largefile.txt; done < queryfile.txt
this ABCword
next ABCword2
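Note that the loop inserts each query word into the regex as is, so "this" would also match inside a longer word such as "thistle". A possible variant, assuming GNU sed (whose \< and \> anchors match word boundaries), is sketched below; the extra "thistle" line and the temporary files are made up for the demo.

```shell
tmpdir=$(mktemp -d)
printf 'this\nnext\n' > "$tmpdir/queryfile.txt"
cat > "$tmpdir/largefile.txt" <<'EOF'
this is the first line with an ABCword
and the next line has an ABCword2 too
a thistle grows near ABCword9
EOF

# \< and \> are GNU sed word-boundary anchors (an assumption about the sed in use)
result=$(while read -r line; do
    sed -n "s/.*\<\($line\)\>.*\<\(ABC[^ ]*\).*/\1 \2/p" "$tmpdir/largefile.txt"
done < "$tmpdir/queryfile.txt")

echo "$result"
rm -r "$tmpdir"
```

The loop still reads largefile.txt once per query word, so for a long query file one of the single-pass awk answers is likely to be faster.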

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/364454.html

Tags: linux bash awk grep text-manipulation