I have a Python list named tokens that contains several sublists of tokens. I want to stem each token so that the output matches stemmed_expected.
tokens = [['cooked', 'lovely','baked'],['hotel', 'going','liked'],['room','looking']]
stemmed_expected = [['cook', 'love','bake'],['hotel', 'go','like'],['room','look']]
The for loop I tried is as follows:
from nltk.stem import PorterStemmer
ps = PorterStemmer()
stemmed_actual = []
for m in tokens:
    for word in m:
        word = ps.stem(word)
        stemmed_actual.append(word)
But the output of this for loop is a single flat list:
stemmed_actual = ['cook', 'love', 'bake', 'hotel', 'go', 'like', 'room', 'look']
How can I modify the for loop so that the stemmed words stay grouped in sublists, as in stemmed_expected?
A uj5u.com user replied:
You can use a nested list comprehension:
from nltk.stem import PorterStemmer
tokens = [['cooked', 'lovely','baked'],['hotel', 'going','liked'],['room','looking']]
ps = PorterStemmer()
stemmed = [[ps.stem(word) for word in sublst] for sublst in tokens]
print(stemmed)
# [['cook', 'love', 'bake'], ['hotel', 'go', 'like'], ['room', 'look']]
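Since the question asks specifically about the for loop, here is a minimal sketch of the same idea written as explicit loops: build a fresh list for each sublist and append that whole list to the result, rather than appending each word directly. The names stem_nested and toy_stem are hypothetical and not from the original post; toy_stem is only a crude stand-in so the example runs without NLTK installed, and the sample tokens are a trimmed-down set chosen so that the stand-in behaves sensibly.

```python
def stem_nested(tokens, stem):
    """Apply `stem` to every word while keeping the sublist structure."""
    stemmed = []
    for sublist in tokens:
        stemmed_sublist = []  # start a fresh list for each sublist
        for word in sublist:
            stemmed_sublist.append(stem(word))
        stemmed.append(stemmed_sublist)  # append the whole sublist, not each word
    return stemmed


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips a trailing "ing" or "ed".
    for suffix in ("ing", "ed"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


tokens = [['cooked', 'looking'], ['hotel', 'going'], ['room']]
result = stem_nested(tokens, toy_stem)
print(result)
# [['cook', 'look'], ['hotel', 'go'], ['room']]
```

With NLTK, you would presumably pass the stemmer's method instead, for example stem_nested(tokens, ps.stem) with ps = PorterStemmer() as in the question.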
