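I've just started learning sentiment analysis with Python. Following a teacher's example, I typed in code for mechanical compression (removing repeated words). The code runs without errors, but it doesn't remove the repeated words either. I've spent a long time on it and still can't work out why, so I'd appreciate it if someone could take a look. I've pasted the code from before and after that step. The data is a column of product reviews imported from a CSV file.
The code may be hard to read here; it's also on my blog: https://blog.csdn.net/adamsww/article/details/106384090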
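import pandas as pd

# `data` is the DataFrame loaded from the CSV earlier (not shown); keep only unique reviews
data = pd.DataFrame(data['內容'].unique())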
def cutword(strs, reverse=False):
    s1 = []  # first candidate run of characters
    s2 = []  # second candidate run of characters
    s = []   # final result
    if reverse:
        strs = strs[::-1]
    s1.append(strs[0])
    for i in strs[1:]:
        if i == s1[0]:
            if len(s2) == 0:
                s2.append(i)
            else:
                if s1 == s2:
                    s2 = []
                    s2.append(i)
                else:
                    s = s + s1 + s2
                    s1 = []
                    s2 = []
                    s1.append(i)
        else:
            if s1 == s2 and len(s1) >= 2 and len(s2) >= 2:
                s = s + s1
                s1 = []
                s2 = []
                s1.append(i)
            else:
                if len(s2) == 0:
                    s1.append(i)
                else:
                    s2.append(i)
    if s1 == s2:
        s = s + s1
    else:
        s = s + s1 + s2
    if reverse:
        return ''.join(s[::-1])
    else:
        return ''.join(s)
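
One way to cross-check cutword is to run a few hand-made strings through a much shorter compressor and compare the results. The sketch below is not the teacher's algorithm: it swaps in a regular expression with a backreference, and the name compress_repeats is made up for illustration. It collapses any run of immediately repeated substrings to a single copy, so it may differ from cutword on some inputs.

```python
import re

# Hypothetical helper, not part of the original post.
# (.+?) lazily captures the shortest candidate unit; \1+ then requires
# one or more immediate repeats of that same unit.
_REPEAT = re.compile(r'(.+?)\1+')

def compress_repeats(text):
    """Collapse consecutive repeated substrings to a single copy."""
    return _REPEAT.sub(r'\1', text)

# Invented sample strings, not taken from the real review data.
samples = ['好好好好', '很好很好很好', '質量不錯']
for s in samples:
    print(s, '->', compress_repeats(s))
```

If cutword and this sketch disagree on a simple input such as a single character repeated several times, the indentation of cutword is probably the first thing to check.
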
# Mechanical compression (remove repeated words)
# Use apply instead of a for loop
data2 = data.iloc[:, 0].apply(cutword)
data2 = data2.apply(cutword, reverse=True)
print('After mechanical compression:')
print(len(data2))
print(type(data2))
print('---------------')
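
Note that len(data2) counts rows, and cutword shortens strings without dropping any rows, so that number stays the same whether or not the compression did anything. It may be that the code works and the printout simply cannot show it. A rough way to see the effect is to pair each original string with its compressed version and list the ones that differ. The sketch below uses plain lists with invented sample values, and changed_pairs is a hypothetical helper; with the real data, before would come from data.iloc[:, 0] and after from data2.

```python
def changed_pairs(before, after):
    """Return (original, compressed) pairs where the text actually changed."""
    return [(b, a) for b, a in zip(before, after) if b != a]

# Invented sample values standing in for the review column and for data2.
before = ['好好好好', '質量不錯', '很好很好很好']
after = ['好', '質量不錯', '很好']

# The number of reviews is the same before and after compression.
print(len(before) == len(after))

# Only the reviews whose text was shortened are listed.
for old, new in changed_pairs(before, after):
    print(repr(old), '->', repr(new))
```

With pandas, a similar comparison could probably be written as data2[data2 != data.iloc[:, 0]], assuming both Series share the same index.
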
