PSET6: trouble finding STRs in a DNA sequence

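I'm having trouble with the following part: # Find longest match of each STR in DNA sequence.

I don't understand why, when I print(longest_str), every value comes back as 0: {'AGATC': 0, 'AATG': 0, 'TATC': 0}

Am I calling the longest_match function incorrectly?

PS: I'm new to programming and to Python, thanks for your help!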

import csv
import sys   

def main():
    # TODO: Check for command-line usage
    longest_str = {}
    if len(sys.argv) != 3:
        sys.exit("Usage: python dna.py, data.csv, sequence.txt")

    # TODO: Read database file into a variable
    with open(sys.argv[1]) as f:
        data = csv.DictReader(f)

    # TODO: Read DNA sequence file into a variable
    with open(sys.argv[2]) as f2:
        dna_sequence = csv.DictReader(f2)

    # TODO: Find longest match of each STR in DNA sequence
    subsequences = data.fieldnames[1:]
    for subsequence in subsequences:
        longest_str[subsequence] = longest_match(str(dna_sequence), subsequence)
    print(longest_str)

# TODO: Check database for matching profiles

    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()

Reply from a uj5u.com user:

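The DNA sequence is not a CSV file, yet the code reads it with dna_sequence = csv.DictReader(f2)

Here dna_sequence is a DictReader object. The longest_match function provided by CS50 doesn't know what to do with that; it needs a string. Wrapping the object in str() does not help, because that only produces a description of the object rather than the contents of the file, so no STR is ever found and every count is 0.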
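To see what longest_match actually receives, here is a minimal sketch. The sample text is a made-up stand-in for a sequence file, and io.StringIO is used only so the example runs without any files on disk.

```python
import csv
import io


def describe_reader(text):
    """Return str() of a DictReader built over the given text."""
    reader = csv.DictReader(io.StringIO(text))
    return str(reader)


# Hypothetical stand-in for the contents of a sequence file
sample = "AGATCAGATCAATG\n"

as_text = describe_reader(sample)
print(as_text)             # something like <csv.DictReader object at 0x...>
print("AGATC" in as_text)  # False: the description contains no DNA bases
```

The string handed to longest_match is that description, not the sequence, which would explain why no run of any STR is found.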

Reply from a uj5u.com user:

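To clarify what @Fuelled_By_Coffee said: csv.DictReader() returns a DictReader object. It is used to iterate over the rows of a CSV file, returning a dictionary for each row of data. So data and dna_sequence are both DictReader objects, not the contents of each file.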

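A DictReader object is the right tool for reading the CSV file. However, you have not actually read that file yet. Before you start checking the DNA sequence, you need to read all the data from the CSV file into memory. My suggestion: get this part working before you deal with the rest of the code.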
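One way to do that is sketched below. It assumes the database has the layout the question implies (a name column first, then one column per STR); load_database is a hypothetical helper name, not part of the CS50 starter code.

```python
import csv


def load_database(path):
    """Read every row of a CSV database into a list of dicts."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Consume the reader while the file is still open
        rows = list(reader)
        # Every column after the first one is assumed to be an STR name
        str_names = reader.fieldnames[1:]
    return str_names, rows
```

Note that DictReader yields every value as a string, so a count such as "8" would need int() before it is compared with the integer that longest_match returns.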

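As for the dna_sequence data, those files are not suitable for DictReader. That object expects a header row with field names. To see what I mean, compare the contents of sequence\1.txt with those of databases\small.csv. Notice that the CSV has a header row and the sequence file does not? You need a different Python method to read the sequence file.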
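A minimal sketch of one such method, assuming the sequence file holds plain text with an optional trailing newline; load_sequence is a hypothetical helper name.

```python
def load_sequence(path):
    """Return the whole sequence file as one string, without surrounding whitespace."""
    with open(path) as f:
        return f.read().strip()
```

With this in place, the call in main() could become longest_match(load_sequence(sys.argv[2]), subsequence), so the function receives the actual DNA string.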
