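I am trying to merge a column of values from df2 into df1. df1.merge(df2, how='outer') seems to be what I need, but the result is not what I want because it produces duplicate rows. Using "on" introduces _x and _y columns, which I do not want either.
In the example below, sub=site1 appears in both df1 and df2, so 'fred' from df2 should replace the 'own' value in df1.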
# Pandas Merge test:
import pandas as pd
df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})
>>> df1
     sub   iss  rem    own
0  site1  enc1    1   andy
1  site2  enc2    3  brian
2  site3  enc3    5   cody
>>> df2
     sub  rem    own
0  data1    2  david
1  data2    4  edger
2  site1    6   fred
>>> df1.merge(df2, how='outer')
     sub   iss  rem    own
0  site1  enc1    1   andy
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
5  site1   NaN    6   fred
>>> df1.merge(df2, on='sub', how='outer')
     sub   iss  rem_x  own_x  rem_y  own_y
0  site1  enc1    1.0   andy    6.0   fred
1  site2  enc2    3.0  brian    NaN    NaN
2  site3  enc3    5.0   cody    NaN    NaN
3  data1   NaN    NaN    NaN    2.0  david
4  data2   NaN    NaN    NaN    4.0  edger
Expected output:
     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
Reply from a uj5u.com user:
Here is one way to do it:
# update df1['own'] with the matching values from df2, using map;
# rows with no match in df2 keep their original value via fillna
df1['own'] = df1['sub'].map(df2.set_index('sub')['own']).fillna(df1['own'])
out = (pd.concat([df1, df2])             # concatenate the two DataFrames
       .drop_duplicates(subset=['sub'])  # keep the first row per sub (df1's row wins)
       .reset_index(drop=True))          # renumber the rows and discard the old index
out
     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
Alternatively:
# concatenate the two DataFrames and drop the duplicate sub rows
out = (pd.concat([df1, df2])
       .drop_duplicates(subset=['sub'])
       .reset_index(drop=True))
# then map own from df2 onto the concatenated result
out['own'] = out['sub'].map(df2.set_index('sub')['own']).fillna(out['own'])
out
     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
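The two steps above (overwrite own via map, then append the rows of df2 whose key is not yet present) can be wrapped into a single function. This is only a sketch: the name upsert_column and its parameters are made up for illustration and are not part of pandas, and it assumes the key values in the right-hand frame are unique.

```python
import pandas as pd

def upsert_column(left, right, key, col):
    """Hypothetical helper (not a pandas API): take `col` from `right`
    where `key` matches, then append rows of `right` missing from `left`."""
    lookup = right.set_index(key)[col]   # key -> new value; keys must be unique
    updated = left.copy()                # leave the caller's frame untouched
    updated[col] = updated[key].map(lookup).fillna(updated[col])
    new_rows = right[~right[key].isin(left[key])]
    return pd.concat([updated, new_rows], ignore_index=True)

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'],
                    'iss': ['enc1', 'enc2', 'enc3'],
                    'rem': [1, 3, 5],
                    'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'],
                    'rem': [2, 4, 6],
                    'own': ['david', 'edger', 'fred']})

out = upsert_column(df1, df2, key='sub', col='own')
print(out)
```

Unlike the first snippet above, this version does not modify df1 in place, which may be preferable if df1 is reused later.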
Reply from a uj5u.com user:
A potentially somewhat simpler solution: use loc-style boolean filtering to keep only the df1 records that do not exist in df2, then join the two with pd.concat.
# set sub as the index so the rows can be filtered by index label
df1 = df1.set_index('sub')
df2 = df2.set_index('sub')
Then pd.concat them together.
df3 = pd.concat([df1[~df1.index.isin(df2.index)], df2])
Output:
print(df3)
        iss  rem    own
sub
site2  enc2    3  brian
site3  enc3    5   cody
data1   NaN    2  david
data2   NaN    4  edger
site1   NaN    6   fred
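Note that this does not set the rem and iss values of site1 to the values from df1. If you need that as well, one possible solution is to add an extra loc statement, like this: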
df3.loc[(df3.index.isin(df1.index.to_list())) & ~(df3['rem'].isin(df1['rem'].to_list())), ['iss','rem']] = df1[['iss','rem']]
Final output:
        iss  rem    own
sub
site2  enc2    3  brian
site3  enc3    5   cody
data1   NaN    2  david
data2   NaN    4  edger
site1  enc1    1   fred
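The loc condition above relies on the rem values happening to differ between the two frames. A slightly more direct sketch, assuming both frames are indexed by sub as in this answer, is to take the index labels the two frames share and copy iss and rem from df1 for exactly those labels. The variable name common is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'],
                    'iss': ['enc1', 'enc2', 'enc3'],
                    'rem': [1, 3, 5],
                    'own': ['andy', 'brian', 'cody']}).set_index('sub')
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'],
                    'rem': [2, 4, 6],
                    'own': ['david', 'edger', 'fred']}).set_index('sub')

# rows of df1 that df2 does not replace, followed by all rows of df2
df3 = pd.concat([df1[~df1.index.isin(df2.index)], df2])

# labels present in both frames (only 'site1' in this example)
common = df1.index.intersection(df2.index)

# restore iss and rem from df1 for those labels; own keeps df2's value
df3.loc[common, ['iss', 'rem']] = df1.loc[common, ['iss', 'rem']]
print(df3)
```

Selecting by shared labels keeps working when a shared row happens to have the same rem in both frames, a case the rem-based condition would skip.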
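Reply from a uj5u.com user:
Edit: changed to use update instead of fillna, following @bkeesey's comment.
You need to merge on sub, then update the columns you want to keep and delete the leftover suffixed columns.
Try: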
import pandas as pd
df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})
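# df1's columns keep their names; df2's overlapping columns get the "_y" suffix
dfm = df1.merge(df2, on='sub', how='outer', suffixes=["", "_y"])
# overwrite rem/own with df2's values wherever df2 has one (update skips NaN)
dfm.update(dfm[["rem_y", "own_y"]].rename(columns={"rem_y": "rem", "own_y": "own"}))
dfm = dfm.drop(columns=["rem_y", "own_y"])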
Result (rem for site1 comes from df2 here, hence 6.0):
     sub   iss  rem    own
0  site1  enc1  6.0   fred
1  site2  enc2  3.0  brian
2  site3  enc3  5.0   cody
3  data1   NaN  2.0  david
4  data2   NaN  4.0  edger
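If the goal is exactly the expected output from the question (own taken from df2, but rem kept from df1 for site1), the fill direction can be chosen per column. The following is a sketch under that assumption, using Series.combine_first, which keeps the caller's values and fills its gaps from the argument. The suffixes _df1 and _df2 are arbitrary names chosen here for readability.

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'],
                    'iss': ['enc1', 'enc2', 'enc3'],
                    'rem': [1, 3, 5],
                    'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'],
                    'rem': [2, 4, 6],
                    'own': ['david', 'edger', 'fred']})

# overlapping columns become rem_df1/own_df1 and rem_df2/own_df2
dfm = df1.merge(df2, on='sub', how='outer', suffixes=('_df1', '_df2'))

# own: prefer df2's value, fall back to df1's
dfm['own'] = dfm['own_df2'].combine_first(dfm['own_df1'])
# rem: prefer df1's value, fall back to df2's
dfm['rem'] = dfm['rem_df1'].combine_first(dfm['rem_df2'])

# keep only the final columns, in the order used in the question
dfm = dfm[['sub', 'iss', 'rem', 'own']]
print(dfm)
```

Because the outer merge introduces NaN into the rem columns, rem comes out as float here; cast it with astype(int) afterwards if integers are needed.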
