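I have two datasets, one older than the other. I want to compare them and build a new dataset that records the differences. Both datasets have the same columns. For example, dataset 1: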
Cust1         Neigh_Cust2   Age  Net_Value
Mickey Mouse  Minnie        30   50000
Mickey Mouse  Snoopy        30   24000
Mickey Mouse  Duffy Duck    30   21000
Minnie        Mickey Mouse  25   30000
Minnie        Batman        25   12000
Dataset 2 (the newer one):
Cust1         Neigh_Cust2   Age  Net_Value
Mickey Mouse  Batman        30   23000
Mickey Mouse  Superman      30   24000
Mickey Mouse  Duffy Duck    30   21000
Minnie        Mickey Mouse  25   30000
Minnie        Batman        25   12000
Batman        Mickey Mouse  48   53000
The two datasets may differ in length. My expected output is:
Cust1         Neigh_Cust2   Age  Net_Value  New/Missing?
Mickey Mouse  Batman        30   23000      New
Mickey Mouse  Superman      30   24000      New
Batman        Mickey Mouse  48   53000      New
Mickey Mouse  Minnie        30   50000      Missing
Mickey Mouse  Snoopy        30   24000      Missing
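I considered using isin, but the rows are identified by two columns (Cust1 and Neigh_Cust2), and I want to see which observations are or are not present in the other dataset. So isin may not be the best way to approach the problem and get the expected output.
Reply from a uj5u.com user:
I think you can use merge with indicator=True: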
(df1.merge(df2, on=list(df1.columns), indicator=True, how='outer')
    .query('_merge != "both"')
    # convert the categorical indicator to plain strings before relabeling it
    .astype({'_merge': str})
    .replace({'_merge': {'left_only': 'Missing', 'right_only': 'New'}})
    .rename(columns={'_merge': 'New/Missing?'})
)
Note: I am comparing on all columns here, but you can change that with the on parameter.
Output:
   Cust1         Neigh_Cust2   Age  Net_Value  New/Missing?
0  Mickey Mouse  Minnie        30   50000      Missing
1  Mickey Mouse  Snoopy        30   24000      Missing
5  Mickey Mouse  Batman        30   23000      New
6  Mickey Mouse  Superman      30   24000      New
7  Batman        Mickey Mouse  48   53000      New
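For anyone who wants to compare on the two key columns only, here is a minimal sketch. Passing a subset to on while merging the full frames would produce suffixed Age_x/Age_y columns, so this version merges just the keys and looks up the other columns afterwards. The names keys, labels, flags and lookup are mine, not from the answer, and the sketch assumes each (Cust1, Neigh_Cust2) pair appears at most once per DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Cust1": ["Mickey Mouse", "Mickey Mouse", "Mickey Mouse", "Minnie", "Minnie"],
    "Neigh_Cust2": ["Minnie", "Snoopy", "Duffy Duck", "Mickey Mouse", "Batman"],
    "Age": [30, 30, 30, 25, 25],
    "Net_Value": [50000, 24000, 21000, 30000, 12000],
})
df2 = pd.DataFrame({
    "Cust1": ["Mickey Mouse", "Mickey Mouse", "Mickey Mouse", "Minnie", "Minnie", "Batman"],
    "Neigh_Cust2": ["Batman", "Superman", "Duffy Duck", "Mickey Mouse", "Batman", "Mickey Mouse"],
    "Age": [30, 30, 30, 25, 25, 48],
    "Net_Value": [23000, 24000, 21000, 30000, 12000, 53000],
})

keys = ["Cust1", "Neigh_Cust2"]  # assumed to identify a row uniquely
labels = {"left_only": "Missing", "right_only": "New"}

# Merge only the key columns, so no _x/_y suffixed columns appear.
flags = (
    df1[keys].merge(df2[keys], on=keys, how="outer", indicator=True)
    .query('_merge != "both"')
    .assign(**{"New/Missing?": lambda d: d["_merge"].astype(str).map(labels)})
    .drop(columns="_merge")
)

# Look up the remaining columns from whichever frame holds each key pair.
lookup = pd.concat([df1, df2]).drop_duplicates(keys)
result = flags.merge(lookup, on=keys, how="left")[list(df1.columns) + ["New/Missing?"]]
print(result)
```

The row order of an outer merge can differ between pandas versions, so sort the result explicitly if the order matters.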
Reply from a uj5u.com user:
To compare on only some of the columns, use:
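import pandas as pd

# Build a MultiIndex of the key columns for each DataFrame
a = df1.set_index(['Cust1','Neigh_Cust2']).index
b = df2.set_index(['Cust1','Neigh_Cust2']).index
# Rows of df2 whose keys are absent from df1 are new; the reverse are missing
df = pd.concat([df2[~b.isin(a)].assign(NewOrMissing='New'),
                df1[~a.isin(b)].assign(NewOrMissing='Missing')])
print(df)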
   Cust1         Neigh_Cust2   Age  Net_Value  NewOrMissing
0  Mickey Mouse  Batman        30   23000      New
1  Mickey Mouse  Superman      30   24000      New
5  Batman        Mickey Mouse  48   53000      New
0  Mickey Mouse  Minnie        30   50000      Missing
1  Mickey Mouse  Snoopy        30   24000      Missing
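The isin approach above can also be wrapped in a small reusable function. This is only a sketch: diff_by_keys and its parameters are hypothetical names I made up, and it builds the index with pd.MultiIndex.from_frame instead of set_index, which should behave the same for this purpose.

```python
import pandas as pd

def diff_by_keys(old, new, keys, label="NewOrMissing"):
    """Return rows whose key combination appears in only one of the two frames."""
    old_idx = pd.MultiIndex.from_frame(old[keys])
    new_idx = pd.MultiIndex.from_frame(new[keys])
    # isin returns a boolean array aligned with the rows of the frame.
    added = new[~new_idx.isin(old_idx)].assign(**{label: "New"})
    removed = old[~old_idx.isin(new_idx)].assign(**{label: "Missing"})
    return pd.concat([added, removed], ignore_index=True)

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Cust1": ["Mickey Mouse", "Mickey Mouse", "Mickey Mouse", "Minnie", "Minnie"],
    "Neigh_Cust2": ["Minnie", "Snoopy", "Duffy Duck", "Mickey Mouse", "Batman"],
    "Age": [30, 30, 30, 25, 25],
    "Net_Value": [50000, 24000, 21000, 30000, 12000],
})
df2 = pd.DataFrame({
    "Cust1": ["Mickey Mouse", "Mickey Mouse", "Mickey Mouse", "Minnie", "Minnie", "Batman"],
    "Neigh_Cust2": ["Batman", "Superman", "Duffy Duck", "Mickey Mouse", "Batman", "Mickey Mouse"],
    "Age": [30, 30, 30, 25, 25, 48],
    "Net_Value": [23000, 24000, 21000, 30000, 12000, 53000],
})

diff = diff_by_keys(df1, df2, ["Cust1", "Neigh_Cust2"])
print(diff)
```

Keep in mind that only the key columns are compared. A row whose keys match in both frames but whose Age or Net_Value changed is not reported by this sketch.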
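If neither original DataFrame contains duplicate (Cust1, Neigh_Cust2) pairs, duplicates can only be created by the concatenation, so you can concatenate and then drop every duplicated pair: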
df = (pd.concat([df1.assign(NewOrMissing='Missing'),
                 df2.assign(NewOrMissing='New')])
        .drop_duplicates(['Cust1','Neigh_Cust2'], keep=False))
print(df)
   Cust1         Neigh_Cust2   Age  Net_Value  NewOrMissing
0  Mickey Mouse  Minnie        30   50000      Missing
1  Mickey Mouse  Snoopy        30   24000      Missing
0  Mickey Mouse  Batman        30   23000      New
1  Mickey Mouse  Superman      30   24000      New
5  Batman        Mickey Mouse  48   53000      New
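To see why the no-duplicates condition matters, here is a small sketch with made-up data (the frames old and new below are hypothetical, not the question's data). A pair that is repeated inside one frame gets dropped by keep=False even though it never appears in the other frame. Checking with duplicated first, or de-duplicating each frame on the keys, is one possible way around that.

```python
import pandas as pd

keys = ["Cust1", "Neigh_Cust2"]

# Hypothetical data: (Minnie, Snoopy) is repeated inside `old` and absent from `new`.
old = pd.DataFrame({
    "Cust1": ["Minnie", "Minnie", "Mickey Mouse"],
    "Neigh_Cust2": ["Snoopy", "Snoopy", "Minnie"],
    "Age": [25, 25, 30],
    "Net_Value": [100, 100, 50000],
})
new = pd.DataFrame({
    "Cust1": ["Mickey Mouse"],
    "Neigh_Cust2": ["Minnie"],
    "Age": [30],
    "Net_Value": [50000],
})

# Check the precondition before relying on drop_duplicates(keep=False).
old_has_dupes = old.duplicated(keys).any()
new_has_dupes = new.duplicated(keys).any()

# Naive version: both copies of (Minnie, Snoopy) are dropped, so the
# missing pair is silently lost and the result is empty.
naive = (
    pd.concat([old.assign(NewOrMissing="Missing"), new.assign(NewOrMissing="New")])
    .drop_duplicates(keys, keep=False)
)

# Possible workaround: de-duplicate each frame on the keys first.
safe = (
    pd.concat([
        old.drop_duplicates(keys).assign(NewOrMissing="Missing"),
        new.drop_duplicates(keys).assign(NewOrMissing="New"),
    ])
    .drop_duplicates(keys, keep=False)
)
print(safe)
```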
