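The code below searches multiple files (the $file variable, for files ending in snp_search.txt) for a set of patterns (held in the $snplist variable) and outputs a long list stating whether each SNP is in each file.
The goal is to find several SNPs that are in all of the files.
Is there a way to embed the code below in a while loop so that it keeps running until it finds a SNP that is in all of the files, and breaks when it does? Otherwise I have to check the log file manually.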
for snp in $snplist; do
    for file in *snp_search.txt; do
        if grep -wq "$snp" "$file"; then
            echo "${snp} was found in $file" >> "${date}_snp_search.log"
        else
            echo "${snp} was NOT found in $file" >> "${date}_snp_search.log"
        fi
    done
done
A uj5u.com user replied:
You can let grep search all the files at once. If the file names don't contain newlines, you can count the matching files directly:
#! /bin/bash
files=(*snp_search.txt)
count_files=${#files[@]}                               # number of files to search
for snp in $snplist ; do
    count=$(grep -wl "$snp" *snp_search.txt | wc -l)   # number of files containing $snp
    if ((count == count_files)) ; then
        break                                          # $snp is present in every file
    fi
done
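For file names that contain newlines, you can instead output, for each $snp, the first matching line of each file without the file name, and count those lines: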
count=$(grep -m1 -hw "$snp" *snp_search.txt | wc -l)
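Since the question asks for several SNPs, the loop could also collect every SNP that occurs in all files instead of stopping at the first one. A minimal sketch, assuming a hypothetical helper named snps_in_all_files (not part of the answer above) that takes the whitespace-separated SNP list as its first argument and the files to search as the remaining arguments. It tests each file separately with grep -wq, so unusual file names should not matter:

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the answer):
# print every SNP that occurs, as a whole word, in ALL of the given files.
# Usage: snps_in_all_files "SNP1 SNP2 ..." file1 file2 ...
snps_in_all_files() {
    local snplist=$1; shift
    local snp file found_everywhere
    for snp in $snplist; do              # unquoted on purpose: split the list on whitespace
        found_everywhere=1
        for file in "$@"; do
            if ! grep -wq -- "$snp" "$file"; then
                found_everywhere=0
                break                    # one miss is enough; skip the remaining files
            fi
        done
        if ((found_everywhere)); then
            echo "$snp"
        fi
    done
}
```

It could be called as snps_in_all_files "$snplist" *snp_search.txt, which prints one SNP per line; an empty output means no SNP was found in every file.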
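A uj5u.com user replied:
Assumptions:
- a single line of an input file may contain more than one SNP
- a list of all SNPs that are present in all files will be printed (OP has made conflicting statements: "find several SNPs that are in all of the files" vs "break when one SNP is found in all files")
Sample input (to be updated if OP updates the question with sample data):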
$ cat snp.dat
ABC
DEF
XYZZ
$ cat 1.snp.search.txt
ABCD-XABC
someABC_stuff
ABC-
de-ABC-
de-ABC
DEFG
zDEFG
.DEF-xyz
abc-DEF
abc-DEF-ABC-xyz
$ cat 2.snp.search.txt
ABC
One GNU awk idea that requires only a single pass through each input file:
awk '
FNR==NR { snps[$1]=0; next }                  # load 1st file into array; initialize counter (of files containing this snp) to 0
FNR==1  { filecount++                         # 1st line of 2nd-nth files: increment counter of number of files
          delete to_find                      # delete our to_find[] array
          for (snp in snps)                   # make a copy of our master snps[] array ...
              to_find[snp]                    # storing copy in to_find[] array
        }
        { for (snp in to_find) {              # loop through list of snps
              if ($0 ~ "\\y" snp "\\y") {     # if current line contains a "word" match on the current snp ...
                 snps[snp]++                  # increment our snp counter (ie, number of files containing this snp)
                 delete to_find[snp]          # no longer need to search current file for this particular snp
               # break                        # if line can only contain 1 snp then uncomment this line
              }
          }
          for (snp in to_find)                # if we still have an snp to find then ...
              next                            # skip to next line else ...
          nextfile                            # skip to next file
        }
END     { PROCINFO["sorted_in"]="@ind_str_asc"
          for (snp in snps)
              if (snps[snp] == filecount)
                 printf "The SNP %s was found in all files\n", snp
        }
' snp.dat *.snp.search.txt
Notes:
- GNU awk is required for the PROCINFO["sorted_in"]="@ind_str_asc" option, which sorts the snps[] array indices; if the order of the output messages is not important, this command can be removed from the code (the \y word-boundary operator used in the match is also a GNU awk extension)
- since each input file is processed only once, all SNPs that show up in all files are printed (i.e., we won't know whether a SNP exists in all files until the last file has been processed, so we might as well print every SNP that exists in all files)
- should be faster than processes that require multiple scans of each input file (especially for larger files and/or a large number of SNPs)
This generates:
The SNP ABC was found in all files
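Where GNU awk is not available, a similar one-scan-per-file result can probably be had from grep alone. This is a rough sketch of a different technique, not the answer's method: the helper name snps_in_all_files_grep is hypothetical, SNP names are assumed to contain no whitespace, and the -o and -w options are extensions (supported by GNU and BSD grep) rather than POSIX. Each file is scanned once for all SNPs, the matches are reduced to one entry per SNP per file, and the SNPs whose file count equals the number of files are printed:

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the answer).
# Usage: snps_in_all_files_grep snp_list_file file1 file2 ...
snps_in_all_files_grep() {
    local list=$1; shift
    local nfiles=$# file
    for file in "$@"; do
        # -o: print only the matched text; -w: whole-word matches only;
        # -F: treat patterns as fixed strings; -f: read one pattern per line.
        # sort -u keeps each SNP at most once per file.
        grep -owFf "$list" -- "$file" | sort -u
    done |
        sort | uniq -c |                          # count the files containing each SNP
        awk -v n="$nfiles" '$1 == n { print $2 }' # keep SNPs seen in every file
}
```

With the sample input above, snps_in_all_files_grep snp.dat *.snp.search.txt should print ABC, since DEF appears only in the first file and XYZZ in neither.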
