How can I write this remove_stopwords function in Python so that it runs faster?

I have a remove_stopwords function like the one below. How can I make it run faster?

temp.reverse()

def drop_stopwords(text):
    
    for x in temp:
        if len(x.split()) > 1:
            # Multi-word stopword: compare it against each window of words.
            text_list = text.split()
            for y in range(len(text_list) - len(x.split())):
                if " ".join(text_list[y:y + len(x.split())]) == x:
                    del text_list[y:y + len(x.split())]
                    text = " ".join(text_list)
                    text = " ".join(text_list)
        
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)

    return text

Processing the text in my data takes 14 seconds. With the following trick, the time drops to 3 seconds:


temp.reverse()

def drop_stopwords(text):
    
    for x in temp:
        if len(x.split()) >2:
            if x in text:
                text = text.replace(x,'')

        elif len(x.split()) > 1:
            text_list = text.split()  
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y + len(x.split())]) == x:
                    del text_list[y:y + len(x.split())]
                    text = " ".join(text_list)
        
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)

    return text

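But I think this may go wrong in some places for my language. How can I rewrite this function in Python to make it faster? (In C and C++ I could easily solve it with the function above :(( )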

Answer from a uj5u.com community member:

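Your function does a lot of the same work over and over, in particular repeatedly splitting and joining the same text. Doing a single split, operating on the list, and then doing a single join at the end may be faster, and will certainly make the code simpler. Unfortunately, I don't have any of your sample data to test performance with, but hopefully this gives you something to experiment with: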

temp = ["foo", "baz ola"]


def drop_stopwords(text):
    text_list = text.split()
    text_len = len(text_list)
    for word in temp:
        word_list = word.split()
        word_len = len(word_list)
        for i in range(text_len + 1 - word_len):
            if text_list[i:i + word_len] == word_list:
                text_list[i:i + word_len] = [None] * word_len
    return ' '.join(t for t in text_list if t)


print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
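
Since no sample data is available here, one way to check whether a rewrite like this actually helps is to time it on your own texts. The following is only a minimal sketch: `time_variant`, `sample_texts`, and `baseline` are hypothetical names made up for illustration, and `baseline` is a stand-in that you would swap for each `drop_stopwords` variant you want to compare.

```python
import timeit


def time_variant(func, texts, repeat=5):
    # Apply func to every text once per run and keep the best of `repeat` runs.
    runs = timeit.repeat(lambda: [func(t) for t in texts], number=1, repeat=repeat)
    return min(runs)


# Placeholder data (assumption): replace with the real texts from your dataset.
sample_texts = ["the quick brown foo jumped over the baz ola dog"] * 1000


def baseline(text):
    # Stand-in function (assumption): only normalises whitespace.
    return " ".join(text.split())


print(f"baseline: {time_variant(baseline, sample_texts):.4f} s")
```

Taking the minimum over several runs tends to reduce noise from other processes on the machine.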

You could also try simply calling text.replace iteratively in all cases and see how it performs compared with the more complex split-based solution:

temp = ["foo", "baz ola"]


def drop_stopwords(text):
    for word in temp:
        text = text.replace(word, '')
    return ' '.join(text.split())


print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
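
One caveat with plain `text.replace` is that it matches substrings, so a stopword such as "foo" would also be cut out of a longer word such as "food". If that matters for your data, a single compiled regular expression that only matches whole space-separated phrases might be worth trying. This is a sketch under assumptions, not a tested solution for Vietnamese text: the `stopwords` list and the name `drop_stopwords_regex` are made up for illustration, and words are assumed to be separated by whitespace.

```python
import re

# Hypothetical stopword list (assumption): replace with your own phrases.
stopwords = ["foo", "baz ola"]

# Try longer phrases first so a multi-word stopword wins over a shorter prefix.
_alternatives = sorted(stopwords, key=len, reverse=True)

# (?<!\S) and (?!\S) require whitespace or a string boundary on both sides,
# so "foo" does not match inside "food".
_pattern = re.compile(
    r"(?<!\S)(?:" + "|".join(re.escape(s) for s in _alternatives) + r")(?!\S)"
)


def drop_stopwords_regex(text):
    # Collapse runs of whitespace so phrases match with single spaces.
    normalized = " ".join(text.split())
    # Remove all whole-phrase matches in one pass, then tidy the spacing again.
    return " ".join(_pattern.sub("", normalized).split())


print(drop_stopwords_regex("the quick brown foo jumped over the baz ola dog food"))
# the quick brown jumped over the dog food
```

Unlike the loop over `text.replace`, this scans the text once for all stopwords together, so removing one phrase never creates a new match for another. Whether it is faster on your data would need to be measured.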


Tags: Python, pandas, stopwords
