Creating new lists in Python from the results of a for loop

I wrote a function, mutate_v1, that generates random mutations in a DNA sequence.

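import random

def mutate_v1(sequence, mutation_rate):
    # Work on a list because Python strings are immutable.
    dna_list = list(sequence)
    # One mutation trial per base in the sequence.
    for i in range(len(sequence)):
        r = random.random()
        if r < mutation_rate:
            # Pick a random site and overwrite it with a random base
            # (which may be the same base that was already there).
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice(list('ATCG'))
    # Return only after every trial has run.
    return ''.join(dna_list)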

If I apply my function to every element of G0, I get a new generation (G1) of mutants (a list of mutated sequences).

G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

G1 = [mutate_v1(s,0.01) for s in G0]

#G1
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

How can I repeat my function up to G20 (20 generations)?

I can do it manually, like this:

G1   = [mutate_v1(s,0.01) for s in G0]
G2   = [mutate_v1(s,0.01) for s in G1]
G3   = [mutate_v1(s,0.01) for s in G2]
G4   = [mutate_v1(s,0.01) for s in G3]
G5   = [mutate_v1(s,0.01) for s in G4]
G6   = [mutate_v1(s,0.01) for s in G5]
G7   = [mutate_v1(s,0.01) for s in G6]

But I'm sure a for loop would be better. I have tried several versions of the code without success.

Can anyone help?

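A uj5u.com user replied:

Use range to iterate up to the number of generations, and store each generation in a list, where each generation is the result of mutating the previous one: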

G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

generations = [G0]
for _ in range(20):
    previous_generation = generations[-1]
    generations.append([mutate_v1(s, 0.01) for s in previous_generation])

# then you can access by index to a generation
print(generations[1])  # access generation 1
print(generations[20]) # access generation 20

Output

['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAT']
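
If you want to reuse the loop above, one option is to wrap it in a function. The following is only a sketch: the name evolve and its parameters are made up for illustration and do not come from the answer, and mutate_v1 is reproduced (with its return placed after the loop) so that the block runs on its own.

```python
import random

def mutate_v1(sequence, mutation_rate):
    """Copy of the question's function so this block is self-contained."""
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def evolve(g0, n_generations, mutation_rate=0.01):
    """Hypothetical helper: return the list [G0, G1, ..., Gn]."""
    generations = [list(g0)]
    for _ in range(n_generations):
        previous_generation = generations[-1]
        generations.append(
            [mutate_v1(s, mutation_rate) for s in previous_generation]
        )
    return generations

random.seed(42)  # optional: makes a run repeatable
generations = evolve(['CTGAA'] * 5, 20)
print(len(generations))  # G0 plus 20 mutated generations
print(generations[20])   # access generation 20
```

Because generations[0] is G0 itself, the index of each entry matches its generation number.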

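A uj5u.com user replied:

Dani's answer is a nice, simple solution, but I'd like to demonstrate another approach using a slightly more advanced Python technique, the generator function: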

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

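Now mutation_generator is an infinite sequence generator, which means that in theory you can keep evolving your sequences indefinitely. If you want to grab 20 generations (the first value yielded is g0 itself, so this gives generations 0 through 19):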

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
twenty_generations = [next(generation) for _ in range(20)]

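The nice thing about this generator is that we can resume it at any time from where it left off. Suppose you have done some analysis on the first 20 generations and now want to see what happens over the next 100: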

next_hundred = [next(generation) for _ in range(100)]
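
As an alternative to the list comprehensions with next(), the standard library's itertools.islice can take a fixed number of items from a generator and leave it paused at that point. This is a sketch, not part of the original answer; mutate_v1 and mutation_generator are reproduced here so the block runs on its own.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    """Copy of the question's function so this block is self-contained."""
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)

# islice pulls exactly 20 items and then stops asking for more.
first_twenty = list(islice(generation, 20))
# The same generator object picks up where it paused.
next_hundred = list(islice(generation, 100))

print(len(first_twenty), len(next_hundred))
```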

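Now, we could initialize a new generator using the last generation from twenty_generations as its initial generation, but that isn't necessary, because our generation generator simply paused after its 20th value and is ready to continue mutating whenever you call next(generation).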

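This opens up a lot of possibilities, including sending in a new mutation rate or even, if you like, an entirely new mutation function. Really, whatever you want.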
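
One way to change the mutation rate while the generator is running is Python's generator.send() method. The variant below is a sketch under that assumption: the mutation_rate parameter and the handling of the sent value are additions for illustration, not code from the answer, and mutate_v1 is reproduced so the block runs on its own.

```python
import random

def mutate_v1(sequence, mutation_rate):
    """Copy of the question's function so this block is self-contained."""
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0, mutation_rate=0.01):
    """Variant that accepts a new mutation rate through send()."""
    g = list(g0)
    while True:
        # next() makes the yield expression evaluate to None;
        # send(x) makes it evaluate to x.
        new_rate = yield g
        if new_rate is not None:
            mutation_rate = new_rate
        g = [mutate_v1(seq, mutation_rate) for seq in g]

generation = mutation_generator(['CTGAA'] * 5)
print(next(generation))       # G0; a generator must be started with next()
print(next(generation))       # G1, mutated at the default rate
print(generation.send(0.5))   # G2, mutated at the new rate of 0.5
print(next(generation))       # G3, still using 0.5
```

The value passed to send() takes effect for the generation that send() itself returns, and it stays in effect until another rate is sent.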

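Another benefit here is that you can run several generators on the same initial sequences and watch how they diverge. Note that this is entirely possible with the more traditional approach of a for loop inside a function, but the advantage of a generator is that you don't have to produce the whole sequence at once; it only mutates when you tell it to (via next()). For example: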

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
universe_1 = mutation_generator(g0)
universe_2 = mutation_generator(g0)
universe_3 = mutation_generator(g0)

# The first generation is always the same as g0, but this can be modified if you desire
next(universe_1)
next(universe_2)
next(universe_3)

# Compare the first mutation without having to calculate twenty generations in each 'universe' before getting back results
first_mutation_u1 = next(universe_1)
first_mutation_u2 = next(universe_2)
first_mutation_u3 = next(universe_3)

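Again, you can also modify the generator function mutation_generator to accept other parameters, such as a custom mutation function, or even to change the mutation rate on the fly.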
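
As a rough sketch of that idea, the generator could take the mutation function and the rate as arguments. The parameter names mutate and mutation_rate, and the alternative function mutate_per_site, are hypothetical and only serve as an illustration; mutate_v1 is reproduced so the block runs on its own.

```python
import random

def mutate_v1(sequence, mutation_rate):
    """Copy of the question's function so this block is self-contained."""
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutate_per_site(sequence, mutation_rate):
    """Hypothetical alternative: each position mutates independently."""
    return ''.join(
        random.choice('ATCG') if random.random() < mutation_rate else base
        for base in sequence
    )

def mutation_generator(g0, mutate=mutate_v1, mutation_rate=0.01):
    """Variant that lets the caller choose the function and the rate."""
    g = list(g0)
    while True:
        yield g
        g = [mutate(seq, mutation_rate) for seq in g]

g0 = ['CTGAA'] * 5
default_universe = mutation_generator(g0)
fast_universe = mutation_generator(g0, mutate=mutate_per_site, mutation_rate=0.2)

next(default_universe)        # G0 in both universes
next(fast_universe)
print(next(default_universe)) # G1 with mutate_v1 at 0.01
print(next(fast_universe))    # G1 with mutate_per_site at 0.2
```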

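Finally, as a side note, a generator makes it easy to skip ahead thousands of generations without storing every intermediate generation in memory: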

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
for _ in range(10000):
    next(generation)

print(g0)  # first gen
print(next(generation))  # ten thousand generations later

Output:

['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['TTGGA', 'CTTCG', 'TGTGA', 'TAACA', 'CATCG']

With a for loop-based approach, you would've had to either create and store all 10000 generations (wasting a lot of memory), or modify the code in Dani's answer to behave more like a generator (but without the benefits!).
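
For comparison, the second alternative could look roughly like the sketch below: a plain loop that keeps only the current generation. The helper name advance is made up for illustration, and mutate_v1 is reproduced so the block runs on its own. Memory use stays flat, but to continue later you have to pass the returned list back in yourself.

```python
import random

def mutate_v1(sequence, mutation_rate):
    """Copy of the question's function so this block is self-contained."""
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def advance(generation, n_generations, mutation_rate=0.01):
    """Hypothetical helper: mutate n times, keeping only the latest list."""
    current = list(generation)
    for _ in range(n_generations):
        current = [mutate_v1(s, mutation_rate) for s in current]
    return current

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
g10000 = advance(g0, 10000)
print(g0)      # unchanged first generation
print(g10000)  # ten thousand generations later
```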

Real Python has a good article on generators if you want to learn more. And of course, check out the docs as well.

