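I need to count how many times the strings from two groups occur in a sentence. However, whenever a string from group A is preceded by a negation, I want its count to be added to group B (the `d` DataFrame in the code) instead.
I have written code for this that works well. Let me first show you the DataFrame and the groups of strings: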
import re
import pandas as pd

# Dataframe
df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
                         "Hello, not number of negations, in this case we need to take care of the negation.",
                         "Hello world, don't number is another case in which where we need to consider negations."]})
# Groups of words to look for in the text
a = pd.DataFrame(['number', 'ciao', 'text', 'care'], columns=['A'])
d = pd.DataFrame(['need'], columns=['D'])
This is the code that does the job:
res0 = []
res1 = []
for i in range(len(df)):
    if df['X'][i].find('not') < df['X'][i].find('number') and df['X'][i].find('not') > 0 and abs(
            df['X'][i].find('not') - df['X'][i].find('number')) < 15:
        # 'number' is negated: count it towards group B (d) instead of group A
        pattern0 = '|'.join(a[a.A != 'number'].A)
        text = df['X'][i]
        count0 = len(re.findall(pattern0, text))
        res0.append(count0)
        # DataFrame.append was removed in pandas 2.0, so extend a plain list instead
        pattern1 = '|'.join(list(d.D) + ['number'])
        count1 = len(re.findall(pattern1, text))
        res1.append(count1)
    else:
        pattern2 = '|'.join(a.A)
        text = df['X'][i]
        count2 = len(re.findall(pattern2, text))
        res0.append(count2)
        pattern3 = '|'.join(d.D)
        count3 = len(re.findall(pattern3, text))
        res1.append(count3)
pd.Series(res0)  # [2,1,1]
pd.Series(res1)  # [0,2,1]
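One detail worth checking in the working version: `str.find` matches raw substrings and only reports the first hit. In the third sentence it finds 'not' inside 'another', while "don't" is never searched for at all, so that row falls through to the `else` branch. A small illustration follows; the word-boundary regex is just one possible alternative and is not part of the original code:

```python
import re

text = "Hello world, don't number is another case in which where we need to consider negations."

# str.find matches raw substrings, so 'not' is found inside 'another'
pos_find = text.find("not")

# \b word boundaries only accept 'not' as a standalone word
match = re.search(r"\bnot\b", text)

print(pos_find)  # 30, the position of 'not' inside 'another'
print(match)     # None: the sentence has no standalone 'not'
```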
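So what is the problem? The problem is that I only consider one negation ('not') and one word from A ('number'). What I want is to extend the code to loop over every negation in `neg` (see below) and every word in `a`. However, when I try this, I get wrong results. Here is my attempt: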
neg = ['not', 'dont', "wasnt"]
res0 = []
res1 = []
for i in range(len(df)):
    for j in range(len(neg)):
        for k in range(len(a)):
            if df['X'][i].find(neg[j]) < df['X'][i].find(a.A[k]) and df['X'][i].find(neg[j]) > 0 and abs(df['X'][i].find(neg[j]) - df['X'][i].find(a.A[k])) < 15:
                pattern0 = '|'.join(a[a.A != a.A[k]].A)
                text = df['X'][i]
                count0 = len(re.findall(pattern0, text))
                res0.append(count0)
                pattern1 = '|'.join(list(d.D) + [a.A[k]])
                count1 = len(re.findall(pattern1, text))
                res1.append(count1)
            else:
                pattern2 = '|'.join(a.A)
                text = df['X'][i]
                count2 = len(re.findall(pattern2, text))
                res0.append(count2)
                pattern3 = '|'.join(d.D)
                count3 = len(re.findall(pattern3, text))
                res1.append(count3)
pd.Series(res0)  # nonsense
pd.Series(res1)  # nonsense
# the results should still be 3x1 vectors (one value per row)
What exactly am I doing wrong?
Thanks for your help!
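The nested attempt returns far more than three values because `res0.append` and `res1.append` run inside the `j` and `k` loops, once per (negation, word) pair: 3 rows × 3 negations × 4 words = 36 values each. Below is a minimal sketch of one way to restructure it. It keeps the original 15-character window heuristic and assumes the negations are spelled with apostrophes, as they appear in the text (`"don't"`, `"wasn't"`). The idea is to first collect the negated A words of a row and then count once per row:

```python
import re
import pandas as pd

df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
                         "Hello, not number of negations, in this case we need to take care of the negation.",
                         "Hello world, don't number is another case in which where we need to consider negations."]})
a = pd.DataFrame(['number', 'ciao', 'text', 'care'], columns=['A'])
d = pd.DataFrame(['need'], columns=['D'])
# Assumption: negations spelled as they appear in the text (with apostrophes)
neg = ['not', "don't", "wasn't"]

res0, res1 = [], []
for text in df['X']:
    # Collect every A word that has a negation shortly before it
    negated = set()
    for n in neg:
        for w in a['A']:
            i_neg, i_w = text.find(n), text.find(w)  # -1 when absent
            if -1 < i_neg < i_w and i_w - i_neg < 15:
                negated.add(w)
    # Count once per row, after the inner loops have finished
    group_a = [w for w in a['A'] if w not in negated]
    group_d = list(d['D']) + sorted(negated)
    # Guard: '|'.join([]) is '', which would match at every position
    res0.append(len(re.findall('|'.join(group_a), text)) if group_a else 0)
    res1.append(len(re.findall('|'.join(group_d), text)))

print(pd.Series(res0).tolist())
print(pd.Series(res1).tolist())
```

With the sample data this gives `[2, 1, 0]` and `[0, 2, 2]`. Row 2 now differs from the single-negation version because "don't" is recognized as a negation. The `str.find` caveats (first occurrence only, substring matches) still apply.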
Answer:
You can use a positive lookbehind to check for negations:
pattern = r"(?:(?<=not)|(?<=don't)|(?<=wasn't))\s+(?:number|other|words)"
df['neg_count'] = df['X'].str.findall(pattern).str.len()
print(df)
# Output
X neg_count
0 Ciao, I would like to count the number of occu... 0
1 Hello, not number of negations, in this case w... 1
2 Hello world, don't number is another case in w... 1
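The lookbehind idea can also produce the two group counts the question asks for. Below is a sketch under a few assumptions: the word lists are the question's sample groups, and only an A word that directly follows a negation (separated by whitespace) counts as negated, not the 15-character window of the original loop. The number of negated A words is subtracted from group A and added to group B. Each negation gets its own lookbehind because Python's `re` rejects a single lookbehind whose alternatives have different lengths:

```python
import re
import pandas as pd

df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
                         "Hello, not number of negations, in this case we need to take care of the negation.",
                         "Hello world, don't number is another case in which where we need to consider negations."]})
a_words = ['number', 'ciao', 'text', 'care']
d_words = ['need']
neg = ['not', "don't", "wasn't"]

# One fixed-width lookbehind per negation, joined as alternatives
neg_lookbehind = '|'.join(f'(?<={re.escape(n)})' for n in neg)
a_alt = '|'.join(map(re.escape, a_words))
d_alt = '|'.join(map(re.escape, d_words))
negated_a = rf'(?:{neg_lookbehind})\s+(?:{a_alt})'

n_a = df['X'].str.count(a_alt)        # all A occurrences
n_neg = df['X'].str.count(negated_a)  # A occurrences right after a negation
n_d = df['X'].str.count(d_alt)        # all B (d) occurrences

# Move negated A occurrences from group A to group B
df['A_count'] = n_a - n_neg
df['D_count'] = n_d + n_neg
print(df[['A_count', 'D_count']])
```

For the sample data this yields A counts `[2, 1, 0]` and B counts `[0, 2, 2]`. Like `str.find`, the lookbehind matches substrings, so a word ending in "not" (e.g. "cannot") would also count as a negation.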
