I have a remove_stopwords function like this (named drop_stopwords in the code below). How can I make it run faster?
temp.reverse()

def drop_stopwords(text):
    for x in temp:
        if len(x.split()) > 1:
            # Multi-word stopword: compare it against each window of words.
            text_list = text.split()
            for y in range(len(text_list) - len(x.split())):
                if " ".join(text_list[y:y + len(x.split())]) == x:
                    del text_list[y:y + len(x.split())]
                    text = " ".join(text_list)
        else:
            # Single-word case: drop every word that appears in vietnamese.
            text = " ".join(text for text in text.split() if text not in vietnamese)
    return text
Processing the text in my data takes 14 seconds. With a trick like the following, the time drops to 3 seconds:
temp.reverse()

def drop_stopwords(text):
    for x in temp:
        if len(x.split()) > 2:
            # Three or more words: plain substring replacement.
            if x in text:
                text = text.replace(x, '')
        elif len(x.split()) > 1:
            # Two words: compare against each window of words.
            text_list = text.split()
            for y in range(len(text_list) - len(x.split())):
                if " ".join(text_list[y:y + len(x.split())]) == x:
                    del text_list[y:y + len(x.split())]
                    text = " ".join(text_list)
        else:
            # Single-word case: drop every word that appears in vietnamese.
            text = " ".join(text for text in text.split() if text not in vietnamese)
    return text
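But I think this could go wrong in some places in my language. How can I rewrite this function in Python to make it faster? (In C and C++, I could easily solve it with the function above :(( )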
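To make that concern concrete, here is a small sketch of how substring replacement can misfire. The phrase and sentence are made up for illustration and are not taken from the actual data:

```python
# Hypothetical three-word stopword phrase and sample sentence (not from the post).
phrase = "in the end"
text = "we begin the ending now"

# str.replace works on characters, so the phrase matches across word
# boundaries: "beg[in the end]ing" loses its middle and the halves are glued.
naive = text.replace(phrase, "")
print(naive)  # we beging now

# Comparing whole tokens instead finds no match, because no run of three
# tokens equals ["in", "the", "end"].
tokens = text.split()
target = phrase.split()
has_whole_word_match = any(
    tokens[i:i + len(target)] == target
    for i in range(len(tokens) - len(target) + 1)
)
print(has_whole_word_match)  # False
```

Whether this actually happens depends on the stopword list and the text, but it is the kind of case the replace shortcut does not guard against.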
Answer from a uj5u.com user:
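Your function does a lot of the same work over and over, in particular repeatedly calling split and join on the same text. Doing a single split, operating on the list, and then doing a single join at the end will probably be faster, and will certainly lead to simpler code. Unfortunately I don't have any of your sample data to test performance against, but hopefully this gives you something to experiment with: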
temp = ["foo", "baz ola"]

def drop_stopwords(text):
    text_list = text.split()
    text_len = len(text_list)
    for word in temp:
        word_list = word.split()
        word_len = len(word_list)
        for i in range(text_len + 1 - word_len):
            if text_list[i:i + word_len] == word_list:
                # Blank out the match instead of deleting it, so indices stay valid.
                text_list[i:i + word_len] = [None] * word_len
    return ' '.join(t for t in text_list if t)

print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
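If the stopword list is long, looping over every stopword for every text may still dominate the run time. One possible variation, sketched here under the assumption that the stopwords can be preprocessed once, is to walk the tokens a single time and look each one up in a set (single words) or a dict keyed by first word (phrases). The names build_index and drop_stopwords_indexed are made up for this sketch:

```python
def build_index(stopwords):
    """Split stopwords into a set of single words and phrases keyed by first word."""
    singles = set()
    phrases = {}
    for entry in stopwords:
        words = entry.split()
        if len(words) == 1:
            singles.add(words[0])
        elif words:
            phrases.setdefault(words[0], []).append(words)
    # Try longer phrases first so "a b c" is preferred over "a b".
    for candidates in phrases.values():
        candidates.sort(key=len, reverse=True)
    return singles, phrases


def drop_stopwords_indexed(text, singles, phrases):
    tokens = text.split()
    kept = []
    i = 0
    while i < len(tokens):
        # Check multi-word phrases that start with the current token.
        for words in phrases.get(tokens[i], ()):
            if tokens[i:i + len(words)] == words:
                i += len(words)  # skip the whole phrase
                break
        else:
            # No phrase matched here: keep the token unless it is a stopword.
            if tokens[i] not in singles:
                kept.append(tokens[i])
            i += 1
    return " ".join(kept)


singles, phrases = build_index(["foo", "baz ola"])
print(drop_stopwords_indexed("the quick brown foo jumped over the baz ola dog", singles, phrases))
# the quick brown jumped over the dog
```

Because this is one left-to-right pass, it can give different results from the loop-per-stopword version when stopwords overlap each other, so it is worth comparing outputs on real data before switching.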
You could also try simply calling text.replace iteratively in all cases, and see how that performs compared with the more complex split-based solution:
temp = ["foo", "baz ola"]

def drop_stopwords(text):
    for word in temp:
        text = text.replace(word, '')
    return ' '.join(text.split())

print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
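Another option to experiment with is a single precompiled regular expression, which does all the replacements in one pass and can be restricted to whole whitespace-delimited words. This is only a sketch: the helper names compile_stopwords and drop_stopwords_regex are made up here, and it assumes words in the text are separated by single spaces, as in the examples above:

```python
import re
import timeit


def compile_stopwords(stopwords):
    """Build one pattern matching any stopword as a whole whitespace-delimited unit."""
    # Longest first, so a longer phrase is preferred over a shorter one it contains.
    ordered = sorted(stopwords, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # (?<!\S) and (?!\S) require whitespace or the string edge on both sides.
    return re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")


def drop_stopwords_regex(text, pattern):
    # Remove the matches, then collapse the leftover runs of whitespace.
    return " ".join(pattern.sub("", text).split())


pattern = compile_stopwords(["foo", "baz ola"])
print(drop_stopwords_regex("the quick brown foo jumped over the baz ola dog", pattern))
# the quick brown jumped over the dog
print(drop_stopwords_regex("food for foo", pattern))
# food for

# Rough timing on a repeated sample; absolute numbers depend on the machine and data.
sample = "the quick brown foo jumped over the baz ola dog " * 100
print(timeit.timeit(lambda: drop_stopwords_regex(sample, pattern), number=200))
```

The second call shows the difference from plain text.replace, which would also cut "foo" out of "food". Whether the regex is faster than the other versions on a large stopword list is something to measure rather than assume.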
