I got the following result from Phobius:
ID sp|Q92673|1-2157
FT SIGNAL 1 28
FT DOMAIN 1 11 N-REGION.
FT DOMAIN 12 22 H-REGION.
FT DOMAIN 23 28 C-REGION.
FT DOMAIN 29 2135 NON CYTOPLASMIC.
FT TRANSMEM 2136 2156
FT DOMAIN 2157 2157 CYTOPLASMIC.
//
---------------------------------------------------------------------
ID sp|Q5SSG8|25-479
FT DOMAIN 1 455 NON CYTOPLASMIC.
//
---------------------------------------------------------------------
ID sp|Q92854|22-734
FT DOMAIN 1 713 NON CYTOPLASMIC.
//
---------------------------------------------------------------------
ID sp|Q9Y5E9|27-686
FT DOMAIN 1 660 NON CYTOPLASMIC.
//
---------------------------------------------------------------------
ID sp|Q9Y6N8|55-613
FT DOMAIN 1 559 NON CYTOPLASMIC.
//
I want to print the corresponding UniProt ID at the start of every line of a record, up to the // terminator.
Here is the Perl snippet I wrote:
open (MYFILE, "result_phobius.txt") || warn "Couldn't open file because $!"; # input file name
open (FILE, ">output.txt"); # output file name
while (<MYFILE>)
{
    if ($_=~/^ID (\S+?)\s/) # capture the ID: the text after "ID " up to the next whitespace
    {
        $id=$1;
        chomp ($id);
        print FILE "$id\t"; # prints the ID followed by a tab, once per ID line
    }
    if ($_=~/^FT /)
    {
        print FILE "$_";
    }
}
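This prints the ID only on the first line of each record. It works fine when a result has a single domain, but fails when there are several.
For example: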
sp|Q92673|1-2157 FT SIGNAL 1 28
FT DOMAIN 1 11 N-REGION.
FT DOMAIN 12 22 H-REGION.
FT DOMAIN 23 28 C-REGION.
FT DOMAIN 29 2135 NON CYTOPLASMIC.
FT TRANSMEM 2136 2156
FT DOMAIN 2157 2157 CYTOPLASMIC.
sp|Q5SSG8|25-479 FT DOMAIN 1 455 NON CYTOPLASMIC.
sp|Q92854|22-734 FT DOMAIN 1 713 NON CYTOPLASMIC.
sp|Q9Y5E9|27-686 FT DOMAIN 1 660 NON CYTOPLASMIC.
sp|Q9Y6N8|55-613 FT DOMAIN 1 559 NON CYTOPLASMIC.
sp|Q02763|23-748 FT DOMAIN 1 726 NON CYTOPLASMIC.
sp|Q14517|22-4181 FT DOMAIN 1 4160 NON CYTOPLASMIC.
sp|O75051|35-1237 FT DOMAIN 1 1203 NON CYTOPLASMIC.
tr|D3DPA4|1-145 FT DOMAIN 1 119 CYTOPLASMIC.
FT TRANSMEM 120 144
FT DOMAIN 145 145 NON CYTOPLASMIC.
How can I make it work for records with multiple entries?
Expected output:
sp|Q92673|1-2157 FT SIGNAL 1 28
sp|Q92673|1-2157 FT DOMAIN 1 11 N-REGION.
sp|Q92673|1-2157 FT DOMAIN 12 22 H-REGION.
sp|Q92673|1-2157 FT DOMAIN 23 28 C-REGION.
sp|Q92673|1-2157 FT DOMAIN 29 2135 NON CYTOPLASMIC.
sp|Q92673|1-2157 FT TRANSMEM 2136 2156
sp|Q92673|1-2157 FT DOMAIN 2157 2157 CYTOPLASMIC.
sp|Q5SSG8|25-479 FT DOMAIN 1 455 NON CYTOPLASMIC.
sp|Q92854|22-734 FT DOMAIN 1 713 NON CYTOPLASMIC.
sp|Q9Y5E9|27-686 FT DOMAIN 1 660 NON CYTOPLASMIC.
sp|Q9Y6N8|55-613 FT DOMAIN 1 559 NON CYTOPLASMIC.
sp|Q02763|23-748 FT DOMAIN 1 726 NON CYTOPLASMIC.
sp|Q14517|22-4181 FT DOMAIN 1 4160 NON CYTOPLASMIC.
sp|O75051|35-1237 FT DOMAIN 1 1203 NON CYTOPLASMIC.
tr|D3DPA4|1-145 FT DOMAIN 1 119 CYTOPLASMIC.
tr|D3DPA4|1-145 FT TRANSMEM 120 144
tr|D3DPA4|1-145 FT DOMAIN 145 145 NON CYTOPLASMIC.
Thanks in advance for your help.
A uj5u.com user replied:
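Just move the print FILE "$id\t" into the other if block. That is, only assign $id when an ID line is seen, and print it for every domain line.
You could add a check that $id is not empty before printing, but if I understand the format correctly, that should never happen.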
if (/^ID (\S+?)\s/)
{
    $id = $1; # remember the ID until the next ID line
}
if (/^FT /)
{
    print FILE "$id\t$_"; # prefix every FT line with the current ID
}
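For completeness, here is a minimal self-contained sketch of the same idea, including the optional "is $id set?" check mentioned above. The sub name prefix_features and the in-memory demo lines are assumptions made for illustration; they are not part of the original script. It also assumes every record starts with an ID line and ends with //.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes a list of
# Phobius output lines and returns the FT lines, each prefixed with
# the ID of the record it belongs to.
sub prefix_features {
    my @lines = @_;
    my $id;    # ID of the record currently being read
    my @out;
    for my $line (@lines) {
        if ($line =~ /^ID\s+(\S+)/) {
            $id = $1;                 # remember the ID until the next ID line
        }
        elsif ($line =~ /^FT\s/) {
            next unless defined $id;  # skip an FT line seen before any ID line
            push @out, "$id\t$line";
        }
        elsif ($line =~ m{^//}) {
            undef $id;                # end of record: forget the ID
        }
    }
    return @out;
}

# Small in-memory demo built from two records in the question.
my @demo = (
    "ID sp|Q92673|1-2157\n",
    "FT SIGNAL 1 28\n",
    "FT TRANSMEM 2136 2156\n",
    "//\n",
    "ID sp|Q5SSG8|25-479\n",
    "FT DOMAIN 1 455 NON CYTOPLASMIC.\n",
    "//\n",
);
my @result = prefix_features(@demo);
print @result;
```

The demo prints each FT line preceded by its record's ID and a tab. To run it on a real file, the lines could instead be read from a lexical filehandle opened with the three-argument form of open, with "or die" so that a failed open stops the script.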
