我想知道如何從具有如下約束的串列中采樣索引:
1111
01234567890123
a = "ABAAABABBBABAB"
b = "ABABBB"
問題是如何找到a等于的有序采樣字符b
所以,問題的答案應該如下所示,
[0, 1, 2, 5, 7, 8],
[0, 1, 2, 5, 7, 9],
[0, 1, 2, 5, 7, 11],
...,
[2, 5, 6, 7, 8, 9],
...
因為b等于a[0] a[1] a[2] a[5] a[7] a[8]。[0, 1, 2, 5, 7, 8]問題的一種答案也是如此。但是答案的索引應該滿足有序條件。不可能[2, 1, 3, 5, 7, 8]。
a[2] a[1] a[3] a[5] a[7] a[8] 等于b,但不能滿足有序狀態條件。
的長度a是 22,并且 的長度b總是小于 的長度a。
我實作了蠻力來解決這個問題,但是它需要很大的時間復雜度。
有比蠻力更好的演算法嗎?如果有,我應該如何實施?
uj5u.com熱心網友回復:
如果我理解正確,這是一個全域序列比對問題,您可以使用BioPython解決:
from Bio import pairwise2
a = "ABAAABABBBABAB"
b = "ABABBB"
for alignment in pairwise2.align.globalxx(a, b):
positions = [i for i, v in enumerate(alignment.seqB) if v != "-"]
print(positions)
輸出 (部分)
[0, 1, 6, 7, 8, 13]
[0, 1, 4, 7, 8, 13]
[0, 1, 3, 7, 8, 13]
[0, 1, 2, 7, 8, 13]
[0, 1, 4, 5, 8, 13]
[0, 1, 3, 5, 8, 13]
[0, 1, 2, 5, 8, 13]
[0, 1, 4, 5, 7, 13]
[0, 1, 3, 5, 7, 13]
[0, 1, 2, 5, 7, 13]
uj5u.com熱心網友回復:
不能在恒定時間內完成,但可以用大致線性時間復雜度完成(至少,相對于A- 我認為其復雜度的上限是O(A * B))如下:
def indices(A, B):
# store partial patterns and complete patterns as we go
# partial patterns will be stored in a list, where each element
# is a 2-tuple (next_char_of_B_to_look_for, current_index_list)
partial_patterns = []
complete_patterns = []
for (idx, char) in enumerate(A):
# look for the next element, for any partial patterns we have
# if we find it, then mark the index and increment the char to look for
for p in partial_patterns:
if p[0] < len(B) and char == B[p[0]]:
p[1].append(idx)
p[0] = 1
# p[0] == len(B) indicates a completed pattern, and also
# will stop this `if` from being triggered again for this
# now-completed pattern (which will be ignored hereafter)
if p[0] == len(B):
complete_patterns.append(p[1])
# always look for the first element of B to start a new partial pattern
if char == B[0]:
partial_patterns.append([1, [idx]])
return complete_patterns
a = "ABAAABABBBABAB"
b = "ABABBB"
indices(a, b)
# [[0, 1, 2, 5, 7, 8],
# [2, 5, 6, 7, 8, 9],
# [3, 5, 6, 7, 8, 9],
# [4, 5, 6, 7, 8, 9]]
我相信這是給定輸入的正確輸出(因為索引 6 是最后一個A,B后面至少有三個s)
uj5u.com熱心網友回復:
建立一個類似于最長公共子序列問題的表,然后使用讀取所有 LCS部分提取所有 LCS (這里 LCS 是 B 序列)
uj5u.com熱心網友回復:
簡單的解決方案是:
a = "ABAAABABBBABAB"
b = "ABABBB"
total_res = []
def find_indices(index):
b_index = 0
res = []
for c in range(index, len(a)):
if b_index > len(b) - 1:
break
if a[c] == b[b_index]:
res.append(c)
b_index = 1
if res not in total_res and res:
print(res)
total_res.append(res)
for c in range(len(a)):
find_indices(c)
輸出:
[0, 1, 2, 5, 7, 8]
[2, 5, 6, 7, 8, 9]
[3, 5, 6, 7, 8, 9]
[4, 5, 6, 7, 8, 9]
[6, 7, 10, 11, 13]
[10, 11, 12, 13]
[12, 13]
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/313557.html
下一篇:中間有必填詞的字梯問題
