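I'm having trouble with this part: # Find longest match of each STR in DNA sequence.
I don't understand why, when I print(longest_str), every value I get is 0: {'AGATC': 0, 'AATG': 0, 'TATC': 0}
Am I calling the longest_match function incorrectly?
PS: I'm new to programming and Python. Thanks for your help!!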
import csv
import sys


def main():

    # TODO: Check for command-line usage
    longest_str = {}
    if len(sys.argv) != 3:
        sys.exit("Usage: python dna.py, data.csv, sequence.txt")

    # TODO: Read database file into a variable
    with open(sys.argv[1]) as f:
        data = csv.DictReader(f)

        # TODO: Read DNA sequence file into a variable
        with open(sys.argv[2]) as f2:
            dna_sequence = csv.DictReader(f2)

            # TODO: Find longest match of each STR in DNA sequence
            subsequences = data.fieldnames[1:]
            for subsequence in subsequences:
                longest_str[subsequence] = longest_match(str(dna_sequence), subsequence)
            print(longest_str)

    # TODO: Check database for matching profiles

    return
def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()
Answer:
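The DNA sequence is not a CSV file, yet you read it with dna_sequence = csv.DictReader(f2).
That makes dna_sequence a DictReader object, and the longest_match function provided by CS50 doesn't know what to do with it: it expects a string.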
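A minimal sketch of why every count comes out as 0, assuming a made-up sequence string in place of a real sequence file (the `longest_match` below is a condensed copy of the helper in the question). Calling `str()` on a `DictReader` does not return the file's contents; it returns the object's repr, something like `<csv.DictReader object at 0x...>`, and that text contains none of the STRs.

```python
import csv
import io


def longest_match(sequence, subsequence):
    """Condensed version of the CS50 longest_match helper from the question."""
    longest_run = 0
    size = len(subsequence)
    for i in range(len(sequence)):
        count = 0
        # Step forward one STR length at a time while the STR keeps repeating
        while sequence[i + count * size:i + (count + 1) * size] == subsequence:
            count += 1
        longest_run = max(longest_run, count)
    return longest_run


# Hypothetical DNA text standing in for the contents of a sequence file
raw_text = "AGATCAGATCAGATCTTTT"

# What the question's code does: wrap the text in a DictReader, then call str() on it
reader = csv.DictReader(io.StringIO(raw_text))
print(str(reader))                          # the reader's repr, not the DNA
print(longest_match(str(reader), "AGATC"))  # 0: the repr contains no STR

# Passing the actual string instead
print(longest_match(raw_text, "AGATC"))     # 3
```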
Answer:
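To clarify what @Fuelled_By_Coffee said: csv.DictReader() returns a DictReader object. It is used to iterate over the rows of a CSV file, returning a dictionary for each row of data. So data and dna_sequence are DictReader objects, not the contents of each file.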
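As a small illustration of that row-by-row behavior, here is a sketch that reads hypothetical CSV text from an in-memory `io.StringIO` buffer instead of `databases\small.csv` (the names and counts are invented for the example):

```python
import csv
import io

# Hypothetical CSV text shaped like a CS50 database file: a header row, then data rows
csv_text = "name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\n"

reader = csv.DictReader(io.StringIO(csv_text))

# The header row becomes the field names
print(reader.fieldnames)  # ['name', 'AGATC', 'AATG', 'TATC']

# Iterating over the reader yields one dict per data row; values are still strings
for row in reader:
    print(row["name"], row["AGATC"])
```

Every value comes back as a string, so the STR counts need `int()` before you compare them with the numbers returned by `longest_match`.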
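A DictReader object is a good fit for reading the CSV file. However, you haven't finished reading that file yet. Before you start checking the DNA sequence, you need to read all the data from the CSV file into memory. My suggestion: get this working before you work on the rest of the code.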
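One way to do that, sketched under the assumption that the first column is `name` and the remaining columns are STR names as in the CS50 databases, is to call `list()` on the reader while the file is still open. `load_database` is a hypothetical helper name, and the temporary file with invented rows only stands in for a real database CSV.

```python
import csv
import os
import tempfile


def load_database(path):
    """Read every row of a CS50-style database CSV into memory (hypothetical helper)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Grab the STR column names while the file is still open
        strs = reader.fieldnames[1:]
        # list() consumes the reader now, so the rows survive after the file closes
        rows = list(reader)
    return strs, rows


# Write a small hypothetical database to a temporary file for demonstration
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\n")
    path = tmp.name

try:
    strs, rows = load_database(path)
    print(strs)       # ['AGATC', 'AATG', 'TATC']
    print(len(rows))  # 2
finally:
    os.remove(path)
```

Consuming the reader inside the `with` block matters: once the file is closed, the reader can no longer pull rows from it and raises `ValueError`.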
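As for the dna_sequence data, those files are not suitable for a DictReader, which expects a header row with field names. To see what I mean, compare the contents of sequence\1.txt to databases\small.csv. Notice that the CSV has a header row but the sequence file doesn't? You need a different Python method to read the sequence file.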
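For a file with no header row, plain `open()` plus `read()` is enough. A minimal sketch, assuming the sequence file holds a single line of DNA; `read_sequence` is a hypothetical helper, the temporary file and its sequence are invented, and the STR list is hard-coded where `dna.py` would use `data.fieldnames[1:]`:

```python
import os
import tempfile


def longest_match(sequence, subsequence):
    """Condensed version of the CS50 longest_match helper from the question."""
    longest_run = 0
    size = len(subsequence)
    for i in range(len(sequence)):
        count = 0
        # Step forward one STR length at a time while the STR keeps repeating
        while sequence[i + count * size:i + (count + 1) * size] == subsequence:
            count += 1
        longest_run = max(longest_run, count)
    return longest_run


def read_sequence(path):
    """Read a plain-text sequence file into one string (hypothetical helper)."""
    with open(path) as f:
        # No header row, so no DictReader: read the whole file and drop the trailing newline
        return f.read().strip()


# Hypothetical sequence file for demonstration
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("TTAGATCAGATCAATGTATCTATCTATC\n")
    path = tmp.name

try:
    dna_sequence = read_sequence(path)
finally:
    os.remove(path)

strs = ["AGATC", "AATG", "TATC"]  # in dna.py these would come from data.fieldnames[1:]
longest_str = {s: longest_match(dna_sequence, s) for s in strs}
print(longest_str)  # {'AGATC': 2, 'AATG': 1, 'TATC': 3}
```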