Pandas merge question

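I am trying to merge one column of values from df2 into df1. df1.merge(df2, how='outer') seems to be what I need, but the result is not what I want because it contains duplicates. Using "on" introduces _x and _y columns, which I don't want either.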

In the example below, sub=site1 appears in both df1 and df2, so 'fred' from df2 should replace the 'own' value in df1.

# Pandas Merge test:

import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})

>>> df1
     sub   iss  rem    own
0  site1  enc1    1   andy
1  site2  enc2    3  brian
2  site3  enc3    5   cody

>>> df2
     sub  rem    own
0  data1    2  david
1  data2    4  edger
2  site1    6   fred

>>> df1.merge(df2, how='outer')
     sub   iss  rem    own
0  site1  enc1    1   andy
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
5  site1   NaN    6   fred

>>> df1.merge(df2, on='sub', how='outer')
     sub   iss  rem_x  own_x  rem_y  own_y
0  site1  enc1    1.0   andy    6.0   fred
1  site2  enc2    3.0  brian    NaN    NaN
2  site3  enc3    5.0   cody    NaN    NaN
3  data1   NaN    NaN    NaN    2.0  david
4  data2   NaN    NaN    NaN    4.0  edger

Expected output:

     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
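
As background on why the first attempt duplicates site1: when `on` is omitted, merge joins on every column name the two frames share (here sub, rem and own), so rows only match when all three values agree. A minimal sketch that checks this on the example data; the names `shared` and `plain` are chosen here purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})

# With no `on`, merge uses the intersection of the column names as join keys.
shared = df1.columns.intersection(df2.columns).tolist()
print(shared)

# site1 has different rem/own values in df1 and df2, so the two rows never
# match each other and both survive the outer merge.
plain = df1.merge(df2, how='outer')
print((plain['sub'] == 'site1').sum())
```

This is why the answers below match on sub alone and then decide, column by column, which frame's value to keep.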

uj5u.com user reply:

Here is one approach:


# Update df1['own'] with the matching values from df2, using map;
# rows with no match in df2 keep their original value.
df1['own'] = df1['sub'].map(df2.set_index('sub')['own']).fillna(df1['own'])


out = (pd.concat([df1, df2])               # concatenate the two DataFrames
       .drop_duplicates(subset=['sub'])    # keep the first row per 'sub' (the df1 row)
       .reset_index(drop=True))            # renumber rows, discarding the old index

out

     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger

Alternatively:

# concatenate the two DataFrames and drop the duplicates
out = (pd.concat([df1, df2])
       .drop_duplicates(subset=['sub'])
       .reset_index(drop=True))

# map the own in the resulting DF from concat
out['own'] = out['sub'].map(df2.set_index('sub')['own']).fillna(out['own'])
out

     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
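
The concat-then-map steps above could be wrapped in a small helper. This is only a sketch: the function name `concat_with_override` and its parameters are hypothetical and not part of pandas. It also assumes that, if the right-hand frame has repeated keys, taking the first occurrence is acceptable.

```python
import pandas as pd

def concat_with_override(left, right, key, override):
    """Append rows of `right` whose key is new, and take the `override`
    column from `right` wherever the key exists in both frames."""
    # map() needs a lookup with unique keys, so keep the first row per key.
    lookup = right.drop_duplicates(subset=[key]).set_index(key)[override]
    out = (pd.concat([left, right], ignore_index=True)
           .drop_duplicates(subset=[key], ignore_index=True))
    # Keys missing from `right` map to NaN and fall back to the existing value.
    out[override] = out[key].map(lookup).fillna(out[override])
    return out

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})

out = concat_with_override(df1, df2, key='sub', override='own')
print(out)
```

Unlike the first variant above, this leaves df1 itself unmodified, because the override is applied to the concatenated copy.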

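uj5u.com user reply:

A potentially somewhat simpler solution: use pd.concat, filtering df1 so that it contains only the records that are not present in df2, and then concatenate the two together.

# Set 'sub' as the index; filtering on the index is a bit simpler.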
df1 = df1.set_index('sub')
df2 = df2.set_index('sub')

Then pd.concat them together.

df3 = pd.concat([df1[~df1.index.isin(df2.index)],df2])

Output:

print(df3)
        iss  rem    own
sub                    
site2  enc2    3  brian
site3  enc3    5   cody
data1   NaN    2  david
data2   NaN    4  edger
site1   NaN    6   fred

Note that this does not set site1's rem and iss to the values from df1. If that is also needed, one possible solution is to add an extra loc statement, like this:

df3.loc[(df3.index.isin(df1.index.to_list())) & ~(df3['rem'].isin(df1['rem'].to_list())), ['iss','rem']] = df1[['iss','rem']]

Final output:

        iss  rem    own
sub                    
site2  enc2    3  brian
site3  enc3    5   cody
data1   NaN    2  david
data2   NaN    4  edger
site1  enc1    1   fred
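
A possible variation on the same index-based idea uses combine_first, which fills gaps in one frame with values from another after aligning on the index. The sketch below assumes the goal is the asker's expected output (iss and rem from df1, own from df2); the variable names `left`, `right` and `out` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})

left = df1.set_index('sub')
right = df2.set_index('sub')

# Prefer df1's values everywhere, filling in rows that only exist in df2.
out = left.combine_first(right)

# For 'own' the preference is reversed: df2 wins where it has a value.
out['own'] = right['own'].combine_first(left['own'])

# Restore df1's column order, since combine_first may reorder columns.
out = out[left.columns.tolist()]
print(out)
```

The row order of the result may differ from the outputs shown above, so look rows up by their sub label instead of relying on position.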

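uj5u.com user reply:

Edit: changed to use update instead of fillna, following @bkeesey's comment.

You need to merge on sub, then update the new columns and delete the old ones.

Try: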

import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})

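# Keep df1's column names unsuffixed; df2's overlapping columns get "_y".
dfm = df1.merge(df2, on='sub', how='outer', suffixes=["", "_y"])

# update() overwrites values with the non-NaN values of its argument, so pass
# df2's columns (renamed to match) to let them take precedence over df1's.
dfm.update(dfm[["own_y", "rem_y"]].rename(columns={"own_y": "own", "rem_y": "rem"}))

# Drop the helper columns that came from df2.
dfm = dfm.drop(columns=["own_y", "rem_y"])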

結果

     sub   iss  rem    own
0  site1  enc1  6.0   fred
1  site2  enc2  3.0  brian
2  site3  enc3  5.0   cody
3  data1   NaN  2.0  david
4  data2   NaN  4.0  edger
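
Note that the result above takes rem for site1 from df2 (6.0), whereas the expected output in the question keeps df1's value (1). If the per-column preference matters, one option is to state it explicitly after the merge. The following is a sketch only; the suffixes `_df1` and `_df2` and the variable names are chosen here for readability.

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})

merged = df1.merge(df2, on='sub', how='outer', suffixes=('_df1', '_df2'))

# own: prefer df2's value, fall back to df1's where df2 has none.
merged['own'] = merged['own_df2'].fillna(merged['own_df1'])
# rem: prefer df1's value, fall back to df2's for rows that only exist in df2.
merged['rem'] = merged['rem_df1'].fillna(merged['rem_df2'])

out = merged[['sub', 'iss', 'rem', 'own']]
print(out)
```

Because the outer merge introduces NaN into both rem columns, rem comes back as a float column; it can be cast back with astype(int) if every row ends up with a value.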

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/535600.html
