I need to rename multiple sequences in multiple fasta files, and I found this script that does it for a single ID:
from Bio import SeqIO

original_file = "./original.fasta"
corrected_file = "./corrected.fasta"

with open(original_file) as original, open(corrected_file, 'w') as corrected:
    records = SeqIO.parse(original, 'fasta')
    for record in records:
        print(record.id)
        if record.id == 'foo':
            record.id = 'bar'
            # Also update the description, or the old header text is written after the new ID
            record.description = 'bar'
        print(record.id)
        SeqIO.write(record, corrected, 'fasta')
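Each fasta file corresponds to one organism, but the organism is not specified in the IDs. I also have the original fasta files (the files I am working with are translations of them), with the same file names but in a different directory, and their IDs do include each organism's name. I would like to figure out how to loop over all these fasta files and rename every ID in each file with the corresponding organism name.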
Answer:
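OK, here is my attempt. I had to use my own input folders and files, since they were not specified in the question.
The /old folder contains the files: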
MW628877.1.fasta:
>MW628877.1 Streptococcus agalactiae strain RYG82 DNA gyrase subunit A (gyrA) gene, complete cds
ATGCAAGATAAAAATTTAGTAGATGTTAATCTAACTAGTGAAATGAAAACGAGTTTTATCGATTACGCCA
TGAGTGTCATTGTTGCTCGTGCACTTCCAGATGTTAGAGATGGTTTAAAACCTGTTCATCGTCGTATTTT
>KY347969.1 Neisseria gonorrhoeae strain 1448 DNA gyrase subunit A (gyrA) gene, partial cds
CGGCGCGTACCGTACGCGATGCACGAGCTGAAAAATAACTGGAATGCCGCCTACAAAAAATCGGCGCGCA
TCGTCGGCGACGTCATCGGTAAATACCACCCCCACGGCGATTTCGCAGTTTACGGCACCATCGTCCGTAT
MG995190.1.fasta:
>MG995190.1 Mycobacterium tuberculosis strain UKR100 GyrA (gyrA) gene, complete cds
ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCCAGCAGGAGA
TGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCGGAGGTGCGCGACGG
and an /empty folder.
The /new folder contains the files:
MW628877.1.fasta:
>MW628877.1
MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRI
>KY347969.1
RRVPYAMHELKNNWNAAYKKSARIVGDVIGKYHPHGDFAVYGTIVR
MG995190.1.fasta:
>MG995190.1
MTDTTLPPDDSLDRIEPVDIQQEMQRSYIDYAMSVIVGRALPEVRD
My code is:
from os import scandir

from Bio import SeqIO

old = './old'
new = './new'

# Map each sequence ID to its organism name (second and third words of the original header)
old_ids_dict = {}
for entry in scandir(old):
    if entry.is_file():
        print(entry)
        for seq_record in SeqIO.parse(entry.path, "fasta"):
            old_ids_dict[seq_record.id] = ' '.join(seq_record.description.split(' ')[1:3])

print('_____________________')
print('old ids ---> ', old_ids_dict)
print('_____________________')

# Rewrite every file in ./new, appending the organism name to each known ID
for entry in scandir(new):
    if entry.is_file():
        sequences = []
        for seq_record in SeqIO.parse(entry.path, "fasta"):
            if seq_record.id in old_ids_dict:
                print('@@@ ', seq_record.id, ' ', old_ids_dict[seq_record.id])
                seq_record.id += '.' + old_ids_dict[seq_record.id]
                seq_record.description = ''
                print('-->', seq_record.id)
                print(seq_record)
            # Records without a match are kept unchanged
            sequences.append(seq_record)
        # Overwrites the file in ./new in place
        SeqIO.write(sequences, entry.path, 'fasta')
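Check how it works: it actually overwrites both files in the /new folder.
As @Vovin pointed out in his comment, it needs to be adjusted to match the header template of your files.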
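To illustrate the kind of adjustment meant here, this is a minimal sketch of the organism-name extraction as a separate function. The name `organism_from_description` and its `start`/`stop` parameters are hypothetical, not part of Biopython; the defaults assume headers shaped like the samples above, where the first word is the accession and the next two are genus and species.

```python
def organism_from_description(description, start=1, stop=3):
    """Return words start..stop-1 of a FASTA description, joined by spaces.

    Hypothetical helper: with the defaults it picks the second and third
    words, like ' '.join(description.split(' ')[1:3]) in the script above.
    split() without arguments also tolerates repeated whitespace.
    """
    words = description.split()
    return " ".join(words[start:stop])


header = ("MW628877.1 Streptococcus agalactiae strain RYG82 "
          "DNA gyrase subunit A (gyrA) gene, complete cds")
print(organism_from_description(header))        # Streptococcus agalactiae
print(organism_from_description(header, 1, 5))  # Streptococcus agalactiae strain RYG82
```

If your headers place the organism somewhere else, changing `start` and `stop` (or the splitting logic) is probably all that needs adjusting.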
I am sure there is more than one way to do this, probably better and more Pythonic than mine; I am still learning too. Let us know how it goes.
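Since the script above overwrites the files in /new, one possible variation is to write the renamed copies to a separate folder instead. The sketch below is not the method from the answer: it uses plain text processing from the standard library rather than `Bio.SeqIO`, the function names `rename_headers` and `rename_folder` are made up for illustration, and replacing spaces with underscores in the organism name is my own assumption so that the new ID contains no whitespace.

```python
from pathlib import Path


def rename_headers(src, dst, names, sep="."):
    """Copy FASTA file src to dst, appending names[id] to each known ID."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                parts = line[1:].split()
                seq_id = parts[0] if parts else ""
                if seq_id in names:
                    # Assumption: underscores keep the new ID free of whitespace
                    organism = names[seq_id].replace(" ", "_")
                    line = f">{seq_id}{sep}{organism}\n"
            # Sequence lines and unmatched headers are copied unchanged
            fout.write(line)


def rename_folder(new_dir, out_dir, names):
    """Apply rename_headers to every *.fasta file in new_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(new_dir).glob("*.fasta")):
        rename_headers(src, out / src.name, names)
```

With the dictionary built by the script above, a call such as `rename_folder("./new", "./renamed", old_ids_dict)` would leave /new untouched; `./renamed` is just a placeholder output folder.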
