Python：從具有約束的串列中采樣索引-有解無憂

我想知道如何從具有如下約束的串列中采樣索引：

               1111
     01234567890123
a = "ABAAABABBBABAB"
b = "ABABBB"

問題是如何找到a等于的有序采樣字符b

所以，問題的答案應該如下所示，

[0, 1, 2, 5, 7, 8],
[0, 1, 2, 5, 7, 9],
[0, 1, 2, 5, 7, 11],
...,
[2, 5, 6, 7, 8, 9],
...

因為b等于a[0] a[1] a[2] a[5] a[7] a[8]。[0, 1, 2, 5, 7, 8]問題的一種答案也是如此。但是答案的索引應該滿足有序條件。不可能[2, 1, 3, 5, 7, 8]。 a[2] a[1] a[3] a[5] a[7] a[8] 等于b，但不能滿足有序狀態條件。

的長度a是 22，并且的長度b總是小于的長度a。

我實作了蠻力來解決這個問題，但是它需要很大的時間復雜度。

有比蠻力更好的演算法嗎？如果有，我應該如何實施？

uj5u.com熱心網友回復：

如果我理解正確，這是一個全域序列比對問題，您可以使用BioPython解決：

from Bio import pairwise2

a = "ABAAABABBBABAB"
b = "ABABBB"

for alignment in pairwise2.align.globalxx(a, b):
    positions = [i for i, v in enumerate(alignment.seqB) if v != "-"]
    print(positions)

輸出 （部分）

[0, 1, 6, 7, 8, 13]
[0, 1, 4, 7, 8, 13]
[0, 1, 3, 7, 8, 13]
[0, 1, 2, 7, 8, 13]
[0, 1, 4, 5, 8, 13]
[0, 1, 3, 5, 8, 13]
[0, 1, 2, 5, 8, 13]
[0, 1, 4, 5, 7, 13]
[0, 1, 3, 5, 7, 13]
[0, 1, 2, 5, 7, 13]

uj5u.com熱心網友回復：

不能在恒定時間內完成，但可以用大致線性時間復雜度完成（至少，相對于A- 我認為其復雜度的上限是O(A * B)）如下：

def indices(A, B):
    # store partial patterns and complete patterns as we go
    # partial patterns will be stored in a list, where each element
    # is a 2-tuple (next_char_of_B_to_look_for, current_index_list)
    partial_patterns = []
    complete_patterns = []
    for (idx, char) in enumerate(A):
        # look for the next element, for any partial patterns we have
        # if we find it, then mark the index and increment the char to look for
        for p in partial_patterns:
            if p[0] < len(B) and char == B[p[0]]:
                p[1].append(idx)
                p[0]  = 1
                # p[0] == len(B) indicates a completed pattern, and also 
                # will stop this `if` from being triggered again for this 
                # now-completed pattern (which will be ignored hereafter)
                if p[0] == len(B):
                    complete_patterns.append(p[1])
        # always look for the first element of B to start a new partial pattern
        if char == B[0]:
            partial_patterns.append([1, [idx]])
    return complete_patterns

a = "ABAAABABBBABAB"
b = "ABABBB"
indices(a, b)
# [[0, 1, 2, 5, 7, 8], 
#  [2, 5, 6, 7, 8, 9], 
#  [3, 5, 6, 7, 8, 9], 
#  [4, 5, 6, 7, 8, 9]]

我相信這是給定輸入的正確輸出（因為索引 6 是最后一個A，B后面至少有三個s）

uj5u.com熱心網友回復：

建立一個類似于最長公共子序列問題的表，然后使用讀取所有 LCS部分提取所有 LCS （這里 LCS 是 B 序列）

uj5u.com熱心網友回復：

簡單的解決方案是：

a = "ABAAABABBBABAB"
b = "ABABBB"

total_res = []
def find_indices(index):
    b_index = 0
    res = []
    for c in range(index, len(a)):
        if b_index > len(b) - 1:
            break
        if a[c] == b[b_index]:
            res.append(c)
            b_index  = 1
    if res not in total_res and res:
        print(res)
        total_res.append(res)
for c in range(len(a)):
    find_indices(c)

輸出：

[0, 1, 2, 5, 7, 8]
[2, 5, 6, 7, 8, 9]
[3, 5, 6, 7, 8, 9]
[4, 5, 6, 7, 8, 9]
[6, 7, 10, 11, 13]
[10, 11, 12, 13]
[12, 13]

轉載請註明出處，本文鏈接：https://www.uj5u.com/shujuku/313557.html

標籤：Python 算法

上一篇：漸近分析logn和log(n m)中的時間復雜度

下一篇：中間有必填詞的字梯問題