pandas: Replace values in a column based on a condition in another DataFrame, if the value is in the second DataFrame

I have two DataFrames as follows:

import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
               'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']]})

df2 = pd.DataFrame({'verbs':['go','open','close','share','divide'],
                   'new_verbs':['went','opened','closed','shared','divided']})

If a verb is found in df2.verbs, I want to replace that verb in df.text with its past-tense form from df2.new_verbs. So far I have done the following:

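# Split each sentence into a list of words
df['text'] = df['text'].str.split()
# One row per word; the original row index is repeated
new_df = df.apply(pd.Series.explode)
# 'new' holds the word itself for verbs and the POS tag otherwise
new_df = new_df.assign(new=lambda d: d['pos'].mask(d['pos'] == 'VERB', d['text']))
# Faulty step: df2.new_verbs is aligned by index label, not looked up by verb
new_df.text[new_df.new.isin(df2.verbs)] = df2.new_verbs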

But when I print the result, not all of the verbs are replaced correctly. My desired output is:

       text    pos    new
0       I   PRON   PRON
0    went   VERB     go
0      to    ADP    ADP
0  school   NOUN   NOUN
1  opened   VERB   open
1     the    DET    DET
1   green    ADJ    ADJ
1    door   NOUN   NOUN
2    went   VERB     go
2     out    ADP    ADP
2     and  CCONJ  CCONJ
2    play   VERB   play
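
The desired output above can be reproduced with a short sketch like the following. It assumes pandas 1.3 or later (for multi-column DataFrame.explode), and the names verbs_map, is_verb and words are introduced here for illustration. Unlike the attempt above, it looks each verb up by value in a dictionary instead of assigning df2.new_verbs, which pandas aligns by index label.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
                           ['VERB', 'DET', 'ADJ', 'NOUN'],
                           ['VERB', 'ADP', 'CCONJ', 'VERB']]})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

# One row per word; the original row index is repeated (0, 0, 0, 0, 1, ...)
df['text'] = df['text'].str.split()
new_df = df.explode(['text', 'pos'])  # multi-column explode needs pandas >= 1.3

# Hypothetical helper names: lookup dict plus plain NumPy arrays (no index alignment)
verbs_map = dict(zip(df2['verbs'], df2['new_verbs']))
is_verb = (new_df['pos'] == 'VERB').to_numpy()
words = new_df['text'].to_numpy()

# 'new' keeps the original word for verbs and the POS tag otherwise
new_df['new'] = np.where(is_verb, words, new_df['pos'].to_numpy())
# Look verbs up by value; verbs missing from df2 (e.g. 'play') stay unchanged
new_df['text'] = np.where(is_verb, new_df['text'].replace(verbs_map).to_numpy(), words)

print(new_df)
```

Restricting the replacement to rows tagged VERB means a non-verb that happens to be spelled like a key in df2 is left alone.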

Answer:

You can use a regular expression on the original (unsplit) text column for this:

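import re

# Escaped alternation of the verbs, wrapped in \b so only whole words match
regex = r'\b(?:' + '|'.join(map(re.escape, df2['verbs'])) + r')\b'
# Lookup Series: present-tense verb -> past-tense verb
s = df2.set_index('verbs')['new_verbs']

# Replace each match via the lookup, falling back to the matched text itself
df['text2'] = df['text'].str.replace(regex, lambda m: s.get(m.group(), m.group()),
                                     regex=True)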

Output (for clarity, the result is shown here as a second text column, text2):

                  text                       pos                  text2
0       I go to school   [PRON, VERB, ADP, NOUN]       I went to school
1  open the green door    [VERB, DET, ADJ, NOUN]  opened the green door
2      go out and play  [VERB, ADP, CCONJ, VERB]      went out and play
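
A self-contained version of the regex approach, as a sketch: it repeats the sample strings so it runs on its own, and it uses a plain dict (verbs_map) and a helper to_past, both hypothetical names, instead of the Series lookup. Note that it replaces every whole-word match and ignores the pos tags.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play']})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

# Lookup dict: present-tense verb -> past-tense verb
verbs_map = dict(zip(df2['verbs'], df2['new_verbs']))
# Escaped alternation of the dict keys, wrapped in \b so only whole words match
pattern = r'\b(?:' + '|'.join(map(re.escape, verbs_map)) + r')\b'

def to_past(m):
    # The pattern only matches dictionary keys, so a direct lookup is safe
    return verbs_map[m.group()]

df['text2'] = df['text'].str.replace(pattern, to_past, regex=True)
print(df)

# Words that merely contain a verb ('gone', 'reopen') are not touched
check = pd.Series(['gone to reopen it']).str.replace(pattern, to_past, regex=True)
print(check.iloc[0])
```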

Answer:

For smaller lists, you can use pandas' replace with a dictionary like this, applied to the exploded new_df from the question:

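# Mapping from present-tense verb to past-tense verb
verbs_map = dict(zip(df2.verbs, df2.new_verbs))
# replace() returns a new Series, so assign the result back
new_df['text'] = new_df['text'].replace(verbs_map)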

Basically, dict(zip(df2.verbs, df2.new_verbs)) creates a new dictionary that maps each old verb to its new (past-tense) form, e.g. {'go': 'went', 'close': 'closed', ...}.
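
The same dictionary also works with Series.map, another common way to express this lookup. In the sketch below the words Series is a made-up example; map() returns NaN for words that are not dictionary keys, so fillna() restores them, which gives the same result as replace().

```python
import pandas as pd

df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})
verbs_map = dict(zip(df2.verbs, df2.new_verbs))

# Hypothetical exploded word column, one word per row
words = pd.Series(['I', 'go', 'to', 'school', 'open', 'the', 'door'])

# map() turns words missing from the dict into NaN; fillna() puts the originals back
via_map = words.map(verbs_map).fillna(words)
via_replace = words.replace(verbs_map)
print(via_map.tolist())
```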

Source: https://www.uj5u.com/qukuanlian/468519.html
