我正在做選區決議。我有一個更大的字串,這個字串包含括號。
示例:“[快速棕色狐貍] [跳過] [懶狗]。”
我有一個較小的子字串,但這個字串不包含括號:
“快褐狐貍跳過去”
我想在較大的子字串中找到較小子字串的版本,但要包含括號:
ie return "[quick brown fox][jumps over]"
這是我的實作,但是當子字串是整個句子而較大的字串是整個段落(即 100,000 個字符長)時,它會變得非常慢。
def get_span(smaller_sentence, larger_sentence):
for start in range(len(larger_sentence)):
subsentence= larger_sentence[start:]
without_brackets=remove_brackets(subsentence)
if(without_brackets.startswith(smaller_sentence)):
for end in range(start, len(larger_sentence) 1):
without_brackets=remove_brackets(larger_sentence[start:end])
if(without_brackets==smaller_sentence):
return larger_sentence[start:end]
raise ValueError(f"Span not found {smaller_sentence, larger_sentence}")
有沒有更有效的做事方式?
uj5u.com熱心網友回復:
一種可以在線性時間復雜度上實作這一點的方法是創建索引串列,左括號和右括號應該插入到去掉括號的較大句子中。
然后,在較大句子中找到較小句子的位置找到最左邊括號和最右邊括號的索引(使用bisect模塊更有效地找到它們)。
獲得括號的索引后,根據索引對剝離的較大句子進行切片,并交替插入左右括號以組裝最終輸出:
from bisect import bisect_left, bisect_right
from itertools import chain, pairwise
def get_span(smaller_sentence, larger_sentence):
stripped = larger_sentence.translate(str.maketrans('', '', '[]'))
if (left := stripped.find(smaller_sentence)) == -1:
raise ValueError(f"Span not found {smaller_sentence, larger_sentence}")
right = left len(smaller_sentence)
count = last_index = 0
lefts, rights = [], []
while True:
for bracket, indices in zip('[]', (lefts, rights)):
index = larger_sentence.find(bracket, last_index)
if index == -1:
break
indices.append(index - count)
count = 1
last_index = index 1
else:
continue
break
start, end = bisect_left(lefts, left), bisect_right(rights, right)
return ''.join(
fragment for (fragment_start, fragment_end), bracket in zip(
pairwise((
left,
*chain.from_iterable(zip(lefts[start:end], rights[start:end])),
right
)),
(*'[]' * (end - start), '')
)
for fragment in (stripped[fragment_start:fragment_end], bracket)
)
以便:
get_span(
"quick brown fox jumps over",
"The [quick brown fox] [jumps over] [the lazy dog]."
)
回傳:
[quick brown fox] [jumps over]
演示:https ://www.mycompiler.io/view/7ILLCtFKWNG
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/519870.html
標籤:Python细绳表现搜索
上一篇:Cypher:慢查詢優化
