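For example, in the file below I want to delete every record whose header contains hypothetical protein, meaning the header line together with all of its sequence lines. How can I do this? Thanks, everyone!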
>WP_034790625.1 LysR family transcriptional regulator [Ensifer adhaerens]
MDLLSAMRSFRRVIELQSFNKAAEELGQSNASISKQVRQLEERLGAVLIVRTTRRMSLSENGRAYFSECCRLLDELDHLE
RTTSGEAGEINGRLRLNAPLSFGLTVLAPMLARFMTLHPQLKVDMTLDDHVLNVVSEGFDVSIRVRAALTDSSLIARRLG
>WP_034791832.1 hypothetical protein [Ensifer adhaerens]
MPNTGGNNESLANRLTAHGREPTDFPDDSPLGKGDLREPREPVPEFDEPDDQDDLDETEEIELESIFDPDRYDPDDDFPP
PG
>WP_034796082.1 crotonase [Ensifer adhaerens]
MSVTFVVEDRVASVTLNRPERMNAVDAATERELDAIWEEIEARDDISCVVLTGAGERAFCAGADLKGAEKTGLDYWTESR
EGVRAFQEKRAPVWRGR
>WP_034796116.1 hypothetical protein [Ensifer adhaerens]
MSGQSETKLKQLLQAVPPGFLVDTAWMARHAISRQSVSGYVKRGWLEPALTGLYRRPFSPDTNPDAVTGWKIPLLSAIWL
DMSAIDLGTGDRALTPGGRLHPAYRITIPDELMPNETPRGA
>WP_034796134.1 glycosyl hydrolase [Ensifer adhaerens]
MDPEEIARSMNGLLQTVSPERMEALLPSPMIQNHAAFLHLLSDGALACAWFGGTLEGKSDISIFASVLPKGATQWGPPQR
MADHLLEVRDLSVEFHTAVGVVKAVRNISYHLDRGETLAILGESGSGKSVSSSAIMNLIDMPPGRISSGEILLDGVDLLP
CLSQDGGKTFPVRLLIEDGPG
>WP_034796142.1 peptide ABC transporter substrate-binding protein [Ensifer adhaerens]
MKKLFVLSALMLSSALSPAFAGSGPIKIVLAEEADLLEPCMATRSNIGRVIMQNVSETLTELDVRSDKGVMPRLAEKWEQ
MADHLLEVRDLSVEFHTAVGVVKAVRNISYHLDRGETLAILGESGSGKSVSSSAIMNLIDMPPGRISSGEILLDGVDLLP
FKPTMATNGTLQLSEIKIK
>WP_034796160.1 ABC transporter ATP-binding protein [Ensifer adhaerens]
MADHLLEVRDLSVEFHTAVGVVKAVRNISYHLDRGETLAILGESGSGKSVSSSAIMNLIDMPPGRISSGEILLDGVDLLP
DFADHVMVMQKGNIVELGTVREVFDAPQQDYTRALLAAGLDPDPDVQAAHRAARLQRAS
>WP_034796309.1 hypothetical protein [Ensifer adhaerens]
MNTSLIADSFVSLAALGGLLVLIGVIRSFDAKSPLNRRFLFGLQVLAALMASRVLAWWTDLFIFKAATIITAGLVPLSTV
LLAEGLLRRHAPRNTKWIAAGGAATFFVLAFLPVSLAEPWRVALLFLYQLVTFALAGHMTVTRDRTSLSKAENQAVDRIA
>WP_034796322.1 hypothetical protein [Ensifer adhaerens]
MDNDPFHAGEQQLQSLFAVREQLAGSRAIQASLPPGFAGFLAELHYVVLAVPDREGRIWVTMVFGRPGFLSAPDAMRVRV
GTGEMVVMTGHAVLDGFDGRLRRSHEGMPMNGLVRFKPDLLMSRTALARP
Reply from a uj5u.com user:
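Each of your records looks like 3 lines, so you can use: sed -i '/hypothetical protein /,+3d' file.txt. This deletes the line matching hypothetical protein plus the 3 lines after it, 4 lines in total: the header, the two sequence lines, and the blank line before the next record (if there are no blank lines between records, use +2 instead). d is the sed command that deletes the selected lines, and -i modifies the original file; without -i, sed only prints the result and leaves the file unchanged. Note that the addr,+N form is a GNU sed extension.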
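A quick way to check the +N arithmetic is to run the command without -i on a sample first. This is a minimal sketch, assuming GNU sed (where addr,+N selects the matching line and the N lines after it), a file with no blank lines, and records of exactly three lines (header plus two sequence lines); the function name drop_fixed3 is hypothetical.

```shell
# Hypothetical helper for files with NO blank lines where every record is
# exactly 3 lines (header + 2 sequence lines). GNU sed only: "addr,+N"
# selects the matching line and the N lines after it, so +2 deletes 3 lines.
# Reads the named file (or stdin) and prints the result; nothing is modified.
drop_fixed3() {
    sed '/hypothetical protein /,+2d' "$@"
}
```

For example, drop_fixed3 file.txt prints the filtered records so they can be compared with the original before running sed -i. If any record is longer than three lines, its extra sequence lines survive, so the fixed count only fits files where every record has the same length.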
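Or, if you are not sure every record is 3 lines: since your records are separated by blank lines, just match everything from hypothetical protein up to and including the next blank line: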
sed -i '/hypothetical protein /,/^$/d' file.txt
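When records really are separated by blank lines, awk's paragraph mode is another option (a swapped-in technique, not one from the thread): setting RS to the empty string makes awk read each blank-line-separated block as one record, so the number of lines per record no longer matters. This is a sketch; drop_by_paragraph is a hypothetical name, and it writes to stdout instead of editing the file in place.

```shell
# Hypothetical helper: drop blank-line-separated FASTA records whose header
# (first line of the block) contains "hypothetical protein"; writes to stdout.
drop_by_paragraph() {
    awk '
        # RS = "" is paragraph mode: each blank-line-separated block is one record.
        # ORS = "\n\n" prints kept blocks with a blank line between them.
        BEGIN { RS = ""; ORS = "\n\n" }
        # Split the block into lines; lines[1] is the FASTA header.
        { split($0, lines, "\n") }
        # Print the whole block only if its header is not a hypothetical protein.
        lines[1] !~ /hypothetical protein/ { print }
    ' "$@"
}
```

Because ORS is applied after every kept block, the output ends with one extra blank line.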
Reply from a uj5u.com user:
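Thank you so much! Actually, the original file has no blank lines, and a record is not always three lines. What should I do in that case?
Reply from a uj5u.com user: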
Then look for a pattern and write a script to handle it.
Reply from a uj5u.com user:
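A script along the lines of "find a pattern" might use the fact that every FASTA header starts with >: at each header, decide whether the current record should be kept, then print or skip every following line until the next header. This awk sketch assumes that only header lines begin with > (true for FASTA); drop_hypothetical is a hypothetical name, and the result goes to stdout so it can be checked before replacing the file.

```shell
# Hypothetical helper: drop every FASTA record whose header contains
# "hypothetical protein". No blank lines needed; records may have any length.
# Assumes only header lines start with ">" (true for FASTA). Writes to stdout.
drop_hypothetical() {
    awk '
        # Keep anything that appears before the first header.
        BEGIN { keep = 1 }
        # Each header starts a new record: decide once whether to keep it.
        /^>/ { keep = ($0 !~ /hypothetical protein/) }
        # A bare pattern prints the line when it is true, so every line of a
        # kept record is printed and every line of a dropped record is skipped.
        keep
    ' "$@"
}
```

Usage: drop_hypothetical file.txt > filtered.txt, then check filtered.txt and move it over the original. Consecutive hypothetical records are handled naturally, since each header resets the decision.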
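Or, since there are no blank lines, add them yourself first and delete them again at the end; that makes things a bit simpler. Headers such as >WP_034796322.1 hypothetical protein [Ensifer adhaerens] all start with >, so:
sed -i 's/^>/\n>/' file.txt replaces every > at the start of a line with a newline followed by >, which puts a blank line before each header (GNU sed turns \n in the replacement into a newline)
sed -i '/hypothetical protein /,/^$/d' file.txt then deletes the records with the method above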
sed -i '/^$/d' file.txt finally deletes the blank lines that were added
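This approach is a little simpler; otherwise you would have to write a script that works from line numbers or uses if checks, which is more cumbersome.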
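The three sed steps can also be chained into one pipeline that reads the original file and writes a new one, which avoids editing the file in place three times and lets you inspect the result first. This is a sketch assuming GNU sed (for \n in the replacement text); drop_via_blank_lines is a hypothetical name.

```shell
# Hypothetical helper that chains the three sed steps from the thread.
# It reads the named file (or stdin) and writes to stdout instead of using -i.
# GNU sed assumed: it turns "\n" in an s/// replacement into a newline.
#   step 1: insert a blank line before every ">" header
#   step 2: delete from a "hypothetical protein" header through the next blank line
#   step 3: delete the blank lines that step 1 added
drop_via_blank_lines() {
    sed 's/^>/\n>/' "$@" | sed '/hypothetical protein /,/^$/d' | sed '/^$/d'
}
```

Usage: drop_via_blank_lines file.txt > filtered.txt. Step 3 removes every empty line, so any blank lines the file already had are dropped as well.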
Reply from a uj5u.com user:
Thanks! Problem solved: adding a line created the pattern I needed. Thank you!
Please credit the source when reposting. Article link: https://www.uj5u.com/caozuo/59227.html
