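I have a list
a = ['apples', 'bananas', 'oranges', 'grapes']
and a dataframe with a column of phrases
| b | c |
|---|---|
| there are 5 apples | there are 5 |
| here are 3 pears | here are 3 pears |
| I want 2 grapes | I want 2 |
I want another column in my dataframe that holds each phrase with the words from list a removed (as in column c above). The words need to match exactly.
After searching for some regex, I came up with this, but it does not seem to work properly.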
regex = re.compile('|'.join(re.escape(x) for x in a), re.IGNORECASE)
removed = []
for i in df['b']:
    words = re.findall(regex, str(i))
    removed.append(words)
df['c'] = removed
df
I also got this error: unbalanced parenthesis at position
Reply from a uj5u.com user:
You don't actually need any regex, since these are exact matches.
You can do this:
import pandas as pd
a = ['apples', 'bananas', 'oranges', 'grapes']
df = pd.DataFrame({'b': ['there are 5 apples', 'here are 5 pears', 'I want 2 grapes']})
# for each row in `b` remove all words that are in `a`
df['c'] = df['b'].apply(lambda x: ' '.join([word for word in x.split() if word not in a]))
                    b                 c
0  there are 5 apples       there are 5
1    here are 5 pears  here are 5 pears
2     I want 2 grapes          I want 2
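The split-and-filter approach above compares words case-sensitively, while the regex in the question used re.IGNORECASE. A minimal sketch of a case-insensitive variant follows, assuming that is the behaviour wanted; the helper name `drop_words` and the capitalised sample row are made up for illustration.

```python
import pandas as pd

a = ['apples', 'bananas', 'oranges', 'grapes']
# Sample data (assumed): note the capitalised 'Apples' in the first row
df = pd.DataFrame({'b': ['there are 5 Apples', 'here are 3 pears', 'I want 2 grapes']})

# Lowercased set: fast membership tests and case-insensitive comparison
stop = {w.lower() for w in a}

def drop_words(text):
    # Keep only the whitespace-separated tokens whose lowercase form is not in `stop`
    return ' '.join(w for w in str(text).split() if w.lower() not in stop)

df['c'] = df['b'].apply(drop_words)
print(df)
```

Using a set rather than the list keeps each lookup cheap when a is long. Tokens are compared as whole words, so punctuation attached to a word (for example 'apples,') would not match.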
Reply from a uj5u.com user:
Use str.replace:
I slightly modified your regex:
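regex = re.compile(fr"\s*({'|'.join(re.escape(x) for x in a)})", re.IGNORECASE)
# regex=True is needed on recent pandas versions, where str.replace defaults to literal matching
df['c'] = df['b'].str.replace(regex, '', regex=True)
print(df)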
# Output
                    b                 c
0  there are 5 apples       there are 5
1    here are 3 pears  here are 3 pears
2     i want 2 grapes          i want 2
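Since the question asks for exact matches, note that an alternation without word boundaries also matches inside longer words (for example 'apples' inside 'pineapples'). Here is a rough sketch of the same str.replace idea with \b anchors added; the sample rows are assumptions chosen to show the difference.

```python
import re
import pandas as pd

a = ['apples', 'bananas', 'oranges', 'grapes']
# Sample data (assumed): 'pineapples' contains 'apples' but should be left alone
df = pd.DataFrame({'b': ['there are 5 apples', 'two pineapples here', 'I want 2 Grapes']})

# \b on both sides restricts matches to whole words;
# the leading \s* also removes the whitespace before a removed word
pattern = r"\s*\b(?:" + "|".join(re.escape(x) for x in a) + r")\b"

df['c'] = (
    df['b']
    .str.replace(pattern, '', flags=re.IGNORECASE, regex=True)
    .str.strip()  # tidy up when the removed word was at the start of the phrase
)
print(df)
```

The pattern is passed as a string here so that flags can be given to str.replace directly; a precompiled pattern works too, as long as flags is left unset.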
Reply from a uj5u.com user:
You can use reduce:
import re
from functools import reduce
a = ['apples', 'bananas', 'oranges', 'grapes']
sentences = ["there are 5 apples", "here are 3 pears", "i want 2 grapes"]
print([reduce(lambda x, p: re.sub(p, "", x), a, sentence).strip() for sentence in sentences])
Output
['there are 5', 'here are 3 pears', 'i want 2']
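One caveat with the reduce version: each word is handed to re.sub as a pattern, so a word containing a regex metacharacter is interpreted as regex syntax and can raise re.error (an unmatched ')' gives an 'unbalanced parenthesis' message). Below is a small sketch that escapes each word first; the entry 'oranges)' and the helper `remove_all` are hypothetical, added only to demonstrate the point.

```python
import re
from functools import reduce

# Hypothetical list: the second entry contains a regex metacharacter
a = ['apples', 'oranges)']
sentences = ["there are 5 apples", "i want 2 oranges)"]

def remove_all(sentence, words):
    # re.escape turns each word into a literal pattern before re.sub sees it
    return reduce(lambda s, w: re.sub(re.escape(w), "", s), words, sentence).strip()

cleaned = [remove_all(s, a) for s in sentences]
print(cleaned)
```

For purely literal removal, str.replace on each sentence would avoid regex altogether; re.escape is kept here to stay close to the answer above.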
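Reply from a uj5u.com user:
You can convert column b into lists of words, use explode and groupby to keep only the words that are not in a, and then join everything back together.
The code could be: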
# split column b into lists and explode it
words = df['b'].str.split().explode()
# remove words contained in a list
words = words[~ words.isin(a)]
# join everything back
df['c'] = words.groupby(level=0).agg(list).transform(' '.join)
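For completeness, a self-contained sketch of the explode/groupby idea. It also covers a case the snippet above leaves open: a row whose words are all removed drops out of the groupby result, so its value in c would be NaN. The third sample row and the reindex/fillna step are my additions, not part of the original answer.

```python
import pandas as pd

a = ['apples', 'bananas', 'oranges', 'grapes']
# Sample data (assumed): the last row consists only of a word to remove
df = pd.DataFrame({'b': ['there are 5 apples', 'here are 3 pears', 'grapes']})

# One row per word, keeping the original row index
words = df['b'].str.split().explode()
# Drop the words that appear in `a`
kept = words[~words.isin(a)]

# Rows whose words were all removed are missing from the groupby result;
# reindex restores them and fillna replaces the resulting NaN with ''
df['c'] = kept.groupby(level=0).agg(' '.join).reindex(df.index).fillna('')
print(df)
```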
