Matching words from a list in a DataFrame column

I have a list:
a = ['apples', 'bananas', 'oranges', 'grapes']

and a DataFrame with a column of phrases:

b                    c
there are 5 apples   there are 5
here are 3 pears     here are 3 pears
I want 2 grapes      I want 2

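I want another column in my DataFrame that holds the phrase with the words from list a removed (like column c above). The words have to match exactly.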

After searching around for regular expressions, I came up with this, but it doesn't seem to work properly:

import re

regex = re.compile('|'.join(re.escape(x) for x in a), re.IGNORECASE)

removed = []
for i in df['b']:
    # re.findall returns the matched words themselves, not the text with them removed
    words = re.findall(regex, str(i))
    removed.append(words)

df['c'] = removed
df

I also got this error: unbalanced parenthesis at position
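re.compile raises "unbalanced parenthesis" when a pattern contains an unmatched ( or ). The question does not show the input that triggered it, so the cause below is an assumption: joining the list entries without re.escape, which turns any parenthesis in an entry into regex syntax. A minimal sketch using a hypothetical entry 'pears)':

```python
import re

# Hypothetical list entry containing a regex metacharacter
words = ['apples', 'pears)']

# Joining the raw words turns ')' into regex syntax, so compiling fails
try:
    re.compile('|'.join(words))
    raw_ok = True
except re.error as exc:
    raw_ok = False
    print('raw pattern failed:', exc)

# re.escape makes every word a literal, so the combined pattern compiles
escaped = re.compile('|'.join(re.escape(w) for w in words))
print(repr(escaped.sub('', 'two pears) and apples')))
```

Because the question's code already wraps each word in re.escape, that line by itself should not raise this error.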

Answer:

You don't actually need a regex, because these are exact whole-word matches.

You can do this:

import pandas as pd
a = ['apples', 'bananas', 'oranges', 'grapes']

df = pd.DataFrame({'b': ['there are 5 apples', 'here are 5 pears', 'I want 2 grapes']})
# for each row in `b` remove all words that are in `a`
df['c'] = df['b'].apply(lambda x: ' '.join([word for word in x.split() if word not in a]))
print(df)

# Output
                    b                 c
0  there are 5 apples       there are 5
1    here are 5 pears  here are 5 pears
2     I want 2 grapes          I want 2
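The split-based check above is case-sensitive, while the question's regex used re.IGNORECASE. A minimal variant, not part of the original answer, assuming case-insensitive matching is wanted: lowercase the words into a set (constant-time membership) and compare each lowercased word against it. The capitalized 'Apples' in the sample data is made up to show the effect.

```python
import pandas as pd

a = ['apples', 'bananas', 'oranges', 'grapes']
# Set of lowercased words: fast membership checks, case-insensitive matching
to_remove = {w.lower() for w in a}

df = pd.DataFrame({'b': ['there are 5 Apples', 'here are 3 pears', 'I want 2 grapes']})

def drop_words(text):
    # Compare in lowercase but keep the original casing of surviving words
    return ' '.join(w for w in text.split() if w.lower() not in to_remove)

df['c'] = df['b'].apply(drop_words)
print(df)
```

Like the answer above, this splits on whitespace only, so a word with punctuation attached (for example "apples.") would not match.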

Answer:

Use str.replace.

I modified your regex slightly:

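import re
import pandas as pd

# \s* also consumes the whitespace in front of each matched word
regex = re.compile(fr"\s*({'|'.join(re.escape(x) for x in a)})", re.IGNORECASE)

# a compiled pattern needs regex=True (pandas 2.x defaults to regex=False)
df['c'] = df['b'].str.replace(regex, '', regex=True)
print(df)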

# Output
                    b                 c
0  there are 5 apples       there are 5
1    here are 3 pears  here are 3 pears
2     i want 2 grapes          i want 2
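The pattern above has no word boundaries, so a word from a also matches inside a longer word, which conflicts with the "exact match" requirement. A sketch, assuming whole-word matching is what's wanted, that compares the pattern with and without \b; the row 'i like pineapples' is made-up test data.

```python
import re
import pandas as pd

a = ['apples', 'bananas', 'oranges', 'grapes']
df = pd.DataFrame({'b': ['there are 5 apples', 'i like pineapples', 'i want 2 grapes']})

words = '|'.join(re.escape(x) for x in a)
# Without \b, 'apples' also matches inside 'pineapples'
loose = re.compile(fr"\s*({words})", re.IGNORECASE)
# \b on both sides restricts matches to whole words
strict = re.compile(fr"\s*\b({words})\b", re.IGNORECASE)

df['loose'] = df['b'].str.replace(loose, '', regex=True)
df['strict'] = df['b'].str.replace(strict, '', regex=True)
print(df)
```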

Answer:

You can use functools.reduce:

import re
from functools import reduce

a = ['apples', 'bananas', 'oranges', 'grapes']

sentences = ["there are 5 apples", "here are 3 pears", "i want 2 grapes"]

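# re.escape treats each word as literal text, so regex metacharacters cannot break the pattern
print([reduce(lambda x, p: re.sub(re.escape(p), "", x), a, sentence).strip() for sentence in sentences])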

Output:

['there are 5', 'here are 3 pears', 'i want 2']
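reduce(f, a, sentence) starts from the sentence and passes it through re.sub once per word in a, feeding each result into the next call. The same fold written as an explicit loop, which may be easier to read; the helper name remove_words_loop is hypothetical:

```python
import re
from functools import reduce

a = ['apples', 'bananas', 'oranges', 'grapes']
sentences = ["there are 5 apples", "here are 3 pears", "i want 2 grapes"]

def remove_words_loop(sentence, patterns):
    # Same as reduce(lambda x, p: re.sub(re.escape(p), "", x), patterns, sentence)
    result = sentence
    for p in patterns:
        result = re.sub(re.escape(p), "", result)
    return result.strip()

looped = [remove_words_loop(s, a) for s in sentences]
folded = [reduce(lambda x, p: re.sub(re.escape(p), "", x), a, s).strip() for s in sentences]
print(looped)
```

This makes one pass over the sentence per word in a; a single combined pattern (as in the str.replace answer) needs only one pass.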

Answer:

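You can split column b into lists of words, use explode to put one word per row, keep only the words that are not in a, and then use groupby to join everything back together.

The code could be: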

# split column b into lists and explode it
words = df['b'].str.split().explode()
# remove words contained in a list
words = words[~words.isin(a)]

# join the remaining words of each original row back into one string
df['c'] = words.groupby(level=0).agg(' '.join)
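One edge case in this pipeline: if every word of a row is in a, that row disappears after filtering, so it is missing from the groupby result and gets NaN when assigned back to df. A sketch with a made-up row 'apples grapes' that reindexes to the original rows and fills the gap with an empty string:

```python
import pandas as pd

a = ['apples', 'bananas', 'oranges', 'grapes']
# 'apples grapes' is hypothetical data in which every word is in a
df = pd.DataFrame({'b': ['there are 5 apples', 'here are 3 pears', 'apples grapes']})

# One word per row; the index still points at the original row
words = df['b'].str.split().explode()
words = words[~words.isin(a)]

# Row 2 has no words left; reindex restores it as NaN, fillna turns that into ''
df['c'] = words.groupby(level=0).agg(' '.join).reindex(df.index).fillna('')
print(df)
```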
