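Given a string, typically a sentence, I want to extract all substrings of lengths 3, 4, 5, and 6. How can I do this efficiently using only Python's standard library? Here is my approach; I am looking for a faster one. It seems to me that the three nested loops are unavoidable either way, but maybe there is a low-level optimized solution using itertools.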
import time

def naive(test_sentence, start, end):
    grams = []
    for word in test_sentence:
        for size in range(start, end):
            for i in range(len(word)):
                k = word[i:i+size]
                # Slices near the end of the word come out short; skip them.
                if len(k) == size:
                    grams.append(k)
    return grams

n = 10**6
start, end = 3, 7
test_sentence = "Hi this is a wonderful test sentence".split(" ")

start_time = time.time()
for _ in range(n):
    naive(test_sentence, start, end)
end_time = time.time()
print(f"{end_time - start_time} seconds for naive approach")
Output of naive():
['thi', 'his', 'this', 'won', 'ond', 'nde', 'der', 'erf', 'rfu', 'ful', 'wond', 'onde', 'nder', 'derf', 'erfu', 'rful', 'wonde', 'onder', 'nderf', 'derfu', 'erful', 'wonder', 'onderf', 'nderfu', 'derful', 'tes', 'est', 'test', 'sen', 'ent', 'nte', 'ten', 'enc', 'nce', 'sent', 'ente', 'nten', 'tenc', 'ence', 'sente', 'enten', 'ntenc', 'tence', 'senten', 'entenc', 'ntence']
Second version:

def naive2(test_sentence, start, end):
    grams = []
    for word in test_sentence:
        if len(word) >= start:
            for size in range(start, end):
                # Only iterate over start positions that yield a full-length slice.
                for i in range(len(word)-size+1):
                    grams.append(word[i:i+size])
    return grams
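The loop bound in naive2 relies on a word of length n having exactly n - size + 1 substrings of a given size (and none when the word is shorter). As a quick sanity check, the expected number of results can be computed directly. This is only a sketch; `expected_count` is a hypothetical helper name, not part of the original post.

```python
def expected_count(words, start, end):
    """Hypothetical helper: number of substrings naive2 should return."""
    return sum(max(0, len(word) - size + 1)   # no substrings if the word is too short
               for word in words
               for size in range(start, end))

words = "Hi this is a wonderful test sentence".split()
print(expected_count(words, 3, 7))  # 46, matching the length of the naive() output above
```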
A uj5u.com user replied:
Well, I don't think the algorithm itself can be improved, but you can micro-optimize the function:

def naive3(test_sentence, start, end):
    rng = range(start, end)
    return [word[i:i+size] for word in test_sentence
            if len(word) >= start
            for size in rng
            for i in range(len(word)+1-size)]
Python 3.8 introduced assignment expressions, which are very useful for performance. So if you can use a recent version, you can write:

def naive4(test_sentence, start, end):
    rng = range(start, end)
    return [word[i:i+size] for word in test_sentence
            if (lenWord := len(word)+1) > start
            for size in rng
            for i in range(lenWord-size)]
Here are the performance results:
naive2: 8.28 μs ± 55 ns per call
naive3: 7.28 μs ± 124 ns per call
naive4: 6.86 μs ± 48 ns per call (20% faster than naive2)
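Note that half of naive4's time is spent creating the word[i:i+size] string objects; the rest is mostly spent in the CPython interpreter (mainly creating, reference counting, and deleting variable-sized integer objects).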
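Per-call timings of this kind could be gathered with the standard-library timeit module. The following is a minimal sketch, not the harness used for the numbers above: `per_call_us` is a hypothetical helper, the repeat counts are arbitrary, and the lambda adds a small constant overhead to every measurement. It requires Python 3.8+ because of the assignment expression.

```python
import timeit

def naive2(words, start, end):
    grams = []
    for word in words:
        if len(word) >= start:
            for size in range(start, end):
                for i in range(len(word) - size + 1):
                    grams.append(word[i:i + size])
    return grams

def naive4(words, start, end):
    rng = range(start, end)
    return [word[i:i + size] for word in words
            if (len_word := len(word) + 1) > start
            for size in rng
            for i in range(len_word - size)]

words = "Hi this is a wonderful test sentence".split()

# Both variants must agree before their timings are worth comparing.
assert naive2(words, 3, 7) == naive4(words, 3, 7)

def per_call_us(func, number=10_000, repeat=5):
    """Hypothetical helper: best-of-`repeat` time per call, in microseconds."""
    timings = timeit.repeat(lambda: func(words, 3, 7),
                            number=number, repeat=repeat)
    return min(timings) / number * 1e6

for func in (naive2, naive4):
    print(f"{func.__name__}: {per_call_us(func):.2f} us per call")
```

Absolute numbers will vary with the machine and the Python version, so only the relative ordering is meaningful.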
A uj5u.com user replied:
I believe this will do it:

test_sentence = "Hi this is a wonderful test sentence".split()
lengths = [3, 4, 5, 6]
result = []
for t in test_sentence:
    for l in lengths:
        if len(t) >= l:
            start = 0
            while start + l <= len(t):
                result.append(t[start:start + l])
                start += 1
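Because this version iterates over an explicit list of lengths rather than a range, it also works for non-contiguous lengths. A possible way to package it as a reusable function is sketched below; `substrings_of_lengths` is a hypothetical name, and the while loop is swapped for an equivalent range() loop.

```python
def substrings_of_lengths(words, lengths):
    """Hypothetical helper: substrings of each requested length, word by word."""
    result = []
    for word in words:
        for length in lengths:
            # range() is empty when the word is shorter than `length`.
            for start in range(len(word) - length + 1):
                result.append(word[start:start + length])
    return result

words = "Hi this is a wonderful test sentence".split()
print(substrings_of_lengths(words, [3, 6])[:4])  # ['thi', 'his', 'won', 'ond']
```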
