Start and end indices of matching substrings after comparing two strings

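I am trying to create two lists holding the "start" and "end" indices of the stretches where two strings match. In this case the two strings have equal length. For example: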

str1='ATGGATCGATCG'
str2='CGGGCGCGCGCG'

Here, the matching stretches are: GG, CG, CG
I would like output of the following kind:

list = [2,3,6,7,10,11] #list of the matched indices
start = [2,6,10] #start indices of the matched lengths
end = [3,7,11] #end indices of the matched lengths

Right now my code looks like the block below, but I would like to get the indices that locate the matched sequences.

str1='ATGGATCGATCG'
str2='CGGGCGCGCGCG'

result1 = ''
result2 = ''

#handle the case where one string is longer than the other
maxlen=len(str2) if len(str1)<len(str2) else len(str1)

#loop through the characters
for i in range(maxlen): 
    letter1=str1[i:i+1]
    letter2=str2[i:i+1]
    if ((letter1 == letter2) and letter1 in ['A','T','C','G'] and letter2 in ['A','T','C','G']):
        result1 +=letter1
        result2 +=letter2

A uj5u.com user replied:

This is really a job for zip:

str1='ATGGATCGATCG'
str2='CGGGCGCGCGCG'

matches = []
for i,(a,b) in enumerate(zip(str1,str2)):
    if a == b:
        # start a new [start, end] pair unless the previous one ended at i-1
        if not matches or matches[-1][1] != i-1:
            matches.append([i,i])
        else:
            matches[-1][1] += 1

print(matches)
starts = [k[0] for k in matches]
ends   = [k[1] for k in matches]

Output:

[[2, 3], [6, 7], [10, 11]]

This will also capture single-character matches. If you need to, you can filter those out in a quick loop afterwards.
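The filtering step mentioned above could look roughly like the sketch below. The function name `matching_runs` and the `min_len` parameter are made up for this illustration and are not part of the answer's code; the end index is inclusive, as in the question.

```python
str1 = 'ATGGATCGATCG'
str2 = 'CGGGCGCGCGCG'

def matching_runs(a, b, min_len=1):
    """Return inclusive [start, end] pairs for runs where a and b agree.

    matching_runs and min_len are hypothetical names used only in this sketch.
    """
    runs = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if runs and runs[-1][1] == i - 1:
                runs[-1][1] = i        # extend the current run
            else:
                runs.append([i, i])    # start a new run
    # keep only runs that span at least min_len characters
    return [r for r in runs if r[1] - r[0] + 1 >= min_len]

print(matching_runs('ABCD', 'AXCD'))             # [[0, 0], [2, 3]]
print(matching_runs('ABCD', 'AXCD', min_len=2))  # [[2, 3]]
print(matching_runs(str1, str2, min_len=2))      # [[2, 3], [6, 7], [10, 11]]
```

With `min_len=1` the single-character match at index 0 of the toy strings is kept; with `min_len=2` it is dropped.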

A uj5u.com user replied:

You can also do something similar with regular expressions.

import re
str1='ATGGATCGATCG'
str2='CGGGCGCGCGCG'

# the substrings to search for must be known in advance
pat = 'GG|CG'

matches = [[(m.span()[0],m.span()[1]-1) for m in re.finditer(pat,x)] for x in [str1,str2]]

# sorted() gives a deterministic order; a bare set has none
m = sorted(set(matches[0]) & set(matches[1]))
starts= [x[0] for x in m]
ends= [x[1] for x in m]

print(m,starts,ends, sep='\n')

Output:

[(2, 3), (6, 7), (10, 11)]
[2, 6, 10]
[3, 7, 11]
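The pattern above has to list the matching substrings explicitly. If they are not known beforehand, one possible variation (an assumption on my part, not what the answer does) is to first build a mask string that marks agreeing positions and then let the regex find runs in that mask. The name `mask` and the choice of '1'/'0' markers are arbitrary.

```python
import re

str1 = 'ATGGATCGATCG'
str2 = 'CGGGCGCGCGCG'

# '1' where the two strings agree at that position, '0' elsewhere
mask = ''.join('1' if a == b else '0' for a, b in zip(str1, str2))

# each run of two or more '1's is one matched stretch;
# m.end() is exclusive, so subtract 1 for an inclusive end index
spans = [(m.start(), m.end() - 1) for m in re.finditer(r'1{2,}', mask)]
starts = [s for s, _ in spans]
ends = [e for _, e in spans]

print(mask)    # 001100110011
print(spans)   # [(2, 3), (6, 7), (10, 11)]
```

Changing `1{2,}` to `1+` would also report single-character matches.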

A uj5u.com user replied:

You can also use numpy.split to split the list of matching indices wherever they stop being consecutive:

import numpy as np

lst = [i for i, (s1,s2) in enumerate(zip(str1, str2)) if s1==s2]
splits = [0] + [idx+1 for idx, (i,j) in enumerate(zip(lst, lst[1:])) if j-i != 1] + [len(lst)]  # plain-Python split boundaries; unused below
start, end = zip(*[[int(arr[0]), int(arr[-1])] for arr in np.split(lst, np.where(np.diff(lst) != 1)[0] + 1)])
print((start, end))

Output:

((2, 6, 10), (3, 7, 11))
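The `splits` list in the snippet above already holds the slice boundaries, so the same grouping can probably be done without NumPy. A minimal sketch, assuming the same `str1` and `str2` as in the question; `groups` is a name introduced here for illustration:

```python
str1 = 'ATGGATCGATCG'
str2 = 'CGGGCGCGCGCG'

# indices where the two strings agree
lst = [i for i, (s1, s2) in enumerate(zip(str1, str2)) if s1 == s2]

# boundaries into lst: 0, every position where the run breaks, and len(lst)
splits = [0] + [k + 1 for k, (i, j) in enumerate(zip(lst, lst[1:])) if j - i != 1] + [len(lst)]

# slice lst between consecutive boundaries; a < b skips the empty slice
# that would appear if lst itself were empty
groups = [lst[a:b] for a, b in zip(splits, splits[1:]) if a < b]
start = [g[0] for g in groups]
end = [g[-1] for g in groups]

print(splits)      # [0, 2, 4, 6]
print(start, end)  # [2, 6, 10] [3, 7, 11]
```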

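A uj5u.com user replied:

A few corrections to your code: 1) max() is built in, so there is no need for the if expression; 2) a string is already a sequence, so "a" in "bbbbabb" already returns True, and there is no need to put each letter into a list.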
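Applied to the loop from the question, those two corrections might look like this. It is only a sketch of the cleanup; it still collects the matched letters rather than their indices.

```python
str1 = 'ATGGATCGATCG'
str2 = 'CGGGCGCGCGCG'

result1 = ''
result2 = ''

# max() replaces the conditional expression from the question
maxlen = max(len(str1), len(str2))

for i in range(maxlen):
    # slicing (unlike indexing) returns '' past the end of the shorter string
    letter1 = str1[i:i + 1]
    letter2 = str2[i:i + 1]
    # membership is tested against a string directly; checking letter1
    # is enough because letter2 is equal to it at this point
    if letter1 == letter2 and letter1 in 'ATCG':
        result1 += letter1
        result2 += letter2

print(result1)  # GGCGCG
```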

It looks like you need a function that determines how many characters two strings have in common at their start.

import itertools as it
def f(s,t): 
    return sum(it.takewhile(bool,map(lambda z:z[0]==z[1],zip(s,t))))

With such a function we can now do what you describe and find all the places where the two strings match simultaneously, for any length:

str1='ATGGATCGATCG'
str2='CGGGCGCGCGCG'

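# note: a run longer than two characters is reported together with its shorter tails, e.g. (0,2) and (1,2)
matches = [(i,i+l-1) for i,(a,b) in enumerate(zip(str1,str2)) if (l:=f(str1[i:],str2[i:]))>=2]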
print(matches)
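The comprehension above tests every index, so a run of three or more matching characters is reported more than once (for 'AAAB' and 'AAAC' it yields both (0, 2) and (1, 2)). If only the maximal runs are wanted, one alternative is itertools.groupby. This is a different technique from the takewhile-based function, shown as a sketch; `runs_of_matches` and `min_len` are hypothetical names.

```python
from itertools import groupby

str1 = 'ATGGATCGATCG'
str2 = 'CGGGCGCGCGCG'

def runs_of_matches(s, t, min_len=2):
    """Yield inclusive (start, end) pairs for maximal runs where s and t agree."""
    pos = 0
    # groupby clusters consecutive positions that share the same equal/unequal flag
    for equal, group in groupby(zip(s, t), key=lambda pair: pair[0] == pair[1]):
        length = sum(1 for _ in group)
        if equal and length >= min_len:
            yield (pos, pos + length - 1)
        pos += length

print(list(runs_of_matches(str1, str2)))      # [(2, 3), (6, 7), (10, 11)]
print(list(runs_of_matches('AAAB', 'AAAC')))  # [(0, 2)]
```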

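A uj5u.com user replied:

Let's start with a helper function that computes the length of the common prefix of the two strings starting at a given index: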

def helper(index, str1, str2):
    length = 0
    try:
        while str1[index] == str2[index]: #and other needed conditions
            length += 1
            index += 1
    except IndexError:
        pass
    return length

Now we want to use it while iterating:

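index = 0
result = []
while index < min(len(str1), len(str2)):
    length = helper(index, str1, str2)
    if length > 0:
        # end is exclusive here; use index + length - 1 for the inclusive end the question asks for
        result.append((index, index + length))
        index += length + 1 # We can omit one character as it was checked in helper
    else:
        index += 1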
