I want to implement what is described here: stackoverflow question, but using only standard pandas.
I have two data frames. The first:
first_employee target_employee relationship
0 Andy Claude 0
1 Andy Frida 20
2 Andy Georgia -10
3 Andy Joan 30
4 Andy Lee -10
5 Andy Pablo -10
6 Andy Vincent 20
7 Claude Frida 0
8 Claude Georgia 90
9 Claude Joan 0
10 Claude Lee 0
11 Claude Pablo 10
12 Claude Vincent 0
13 Frida Georgia 0
14 Frida Joan 0
15 Frida Lee 0
16 Frida Pablo 50
17 Frida Vincent 60
18 Georgia Joan 0
19 Georgia Lee 10
20 Georgia Pablo 0
21 Georgia Vincent 0
22 Joan Lee 70
23 Joan Pablo 0
24 Joan Vincent 10
25 Lee Pablo 0
26 Lee Vincent 0
27 Pablo Vincent -20
The second:
first_employee target_employee book_count
0 Vincent Frida 2
1 Vincent Pablo 1
2 Andy Claude 1
3 Andy Joan 1
4 Andy Pablo 1
5 Andy Lee 1
6 Andy Frida 1
7 Andy Georgia 1
8 Claude Georgia 3
9 Joan Lee 3
10 Pablo Frida 2
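I want to join the two data frames so that the final data frame is the same as the first one, but with an additional book_count column holding the corresponding value (or NaN if none is available).
I have written something like this: joined_df = first_df.merge(second_df, on = ['first_employee', 'target_employee'], how = 'outer') and I get: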
first_employee target_employee relationship book_count
0 Andy Claude 0.0 1.0
1 Andy Frida 20.0 1.0
2 Andy Georgia -10.0 1.0
3 Andy Joan 30.0 1.0
4 Andy Lee -10.0 1.0
5 Andy Pablo -10.0 1.0
6 Andy Vincent 20.0 NaN
7 Claude Frida 0.0 NaN
8 Claude Georgia 90.0 3.0
9 Claude Joan 0.0 NaN
10 Claude Lee 0.0 NaN
11 Claude Pablo 10.0 NaN
12 Claude Vincent 0.0 NaN
13 Frida Georgia 0.0 NaN
14 Frida Joan 0.0 NaN
15 Frida Lee 0.0 NaN
16 Frida Pablo 50.0 NaN
17 Frida Vincent 60.0 NaN
18 Georgia Joan 0.0 NaN
19 Georgia Lee 10.0 NaN
20 Georgia Pablo 0.0 NaN
21 Georgia Vincent 0.0 NaN
22 Joan Lee 70.0 3.0
23 Joan Pablo 0.0 NaN
24 Joan Vincent 10.0 NaN
25 Lee Pablo 0.0 NaN
26 Lee Vincent 0.0 NaN
27 Pablo Vincent -20.0 NaN
28 Vincent Frida NaN 2.0
29 Vincent Pablo NaN 1.0
30 Pablo Frida NaN 2.0
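This is close to what I want to achieve. However, the order of the values in first_employee and target_employee is irrelevant, so if the first data frame has (Frida, Vincent) and the second has (Vincent, Frida), the two should be merged together (what matters is the values, not which column they are in).
In my resulting data frame I get three extra rows: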
28 Vincent Frida NaN 2.0
29 Vincent Pablo NaN 1.0
30 Pablo Frida NaN 2.0
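This is the result of my merge, which treats the pairs as ordered when joining on the columns: these 3 extra rows should be merged into the already available pairs (Frida, Vincent), (Pablo, Vincent) and (Frida, Pablo).
Is there a way to do this using only standard pandas functions? (The question I referenced at the beginning uses sqldf.)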
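For reference, the mismatch can be reproduced on a small subset of the data. This is only a sketch: the three rows per frame are copied from the tables above, and the indicator=True flag is added here purely to show which side each row came from.

```python
import pandas as pd

# Small subset of the two frames from the question
first_df = pd.DataFrame({
    "first_employee": ["Frida", "Frida", "Pablo"],
    "target_employee": ["Pablo", "Vincent", "Vincent"],
    "relationship": [50, 60, -20],
})
second_df = pd.DataFrame({
    "first_employee": ["Vincent", "Vincent", "Pablo"],
    "target_employee": ["Frida", "Pablo", "Frida"],
    "book_count": [2, 1, 2],
})

keys = ["first_employee", "target_employee"]
joined_df = first_df.merge(second_df, on=keys, how="outer", indicator=True)

# merge compares first_employee with first_employee and target_employee
# with target_employee, so a reversed pair never counts as a match
print(joined_df["_merge"].value_counts())
```

Every row ends up as left_only or right_only, which is the same effect as the three extra rows in the full result.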
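uj5u.com user reply:
I believe this is what you are looking for. np.sort reorders the first two columns of each row so that the two names are in alphabetical order, which lets the merge work as intended.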
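import numpy as np
import pandas as pd
cols = ['first_employee', 'target_employee']
# Sort the two names within each row, e.g. (Vincent, Frida) -> (Frida, Vincent)
df[cols] = np.sort(df[cols].to_numpy(), axis=1)
df2[cols] = np.sort(df2[cols].to_numpy(), axis=1)
# A left merge keeps exactly the rows of the first frame
ndf = pd.merge(df, df2, on=cols, how='left')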
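A self-contained version of the same idea might look like the sketch below. The helper name normalize_pairs is made up for illustration, and the sample rows are a subset of the question's data. Unlike the snippet above, it works on copies, so the original frames are left unchanged.

```python
import numpy as np
import pandas as pd

def normalize_pairs(frame, cols):
    """Return a copy of `frame` with the values in `cols` sorted within each row."""
    out = frame.copy()
    out[cols] = np.sort(out[cols].to_numpy(), axis=1)
    return out

first_df = pd.DataFrame({
    "first_employee": ["Andy", "Frida", "Frida", "Pablo"],
    "target_employee": ["Vincent", "Pablo", "Vincent", "Vincent"],
    "relationship": [20, 50, 60, -20],
})
second_df = pd.DataFrame({
    "first_employee": ["Vincent", "Vincent", "Pablo"],
    "target_employee": ["Frida", "Pablo", "Frida"],
    "book_count": [2, 1, 2],
})

cols = ["first_employee", "target_employee"]
result = normalize_pairs(first_df, cols).merge(
    normalize_pairs(second_df, cols), on=cols, how="left"
)
print(result)
```

Because the merge is a left join, the result has one row per row of first_df, and relationship keeps its integer dtype. If a pair in the first frame were not already in alphabetical order, its two names would be swapped in the output.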
uj5u.com user reply:
Create a key as a sorted tuple of the first and target employee, then merge on it:
import pandas as pd

# Order-independent key: the two names of a row as a sorted tuple
create_key = lambda x: tuple(sorted([x['first_employee'], x['target_employee']]))
out = pd.merge(df1.assign(_key=df1.apply(create_key, axis=1)),
               df2.assign(_key=df2.apply(create_key, axis=1)),
               on='_key', suffixes=('', '_key'), how='outer') \
        .loc[:, lambda x: ~x.columns.str.endswith('_key')]  # drop _key and df2's duplicate name columns
print(out)
# Output:
first_employee target_employee relationship book_count
0 Andy Claude 0 1.0
1 Andy Frida 20 1.0
2 Andy Georgia -10 1.0
3 Andy Joan 30 1.0
4 Andy Lee -10 1.0
5 Andy Pablo -10 1.0
6 Andy Vincent 20 NaN
7 Claude Frida 0 NaN
8 Claude Georgia 90 3.0
9 Claude Joan 0 NaN
10 Claude Lee 0 NaN
11 Claude Pablo 10 NaN
12 Claude Vincent 0 NaN
13 Frida Georgia 0 NaN
14 Frida Joan 0 NaN
15 Frida Lee 0 NaN
16 Frida Pablo 50 2.0
17 Frida Vincent 60 2.0
18 Georgia Joan 0 NaN
19 Georgia Lee 10 NaN
20 Georgia Pablo 0 NaN
21 Georgia Vincent 0 NaN
22 Joan Lee 70 3.0
23 Joan Pablo 0 NaN
24 Joan Vincent 10 NaN
25 Lee Pablo 0 NaN
26 Lee Vincent 0 NaN
27 Pablo Vincent -20 1.0
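One possible variant, sketched under a few assumptions: since the goal was a result with the same rows as the first frame, how='left' may be a closer fit than how='outer'. With an outer merge, a pair that exists only in the second frame would come out with NaN names once the suffixed columns are dropped. The sketch also builds the key with np.sort instead of a row-wise apply, and leaves the first frame's names exactly as written. The helper with_pair_key and the column names _a and _b are hypothetical.

```python
import numpy as np
import pandas as pd

cols = ["first_employee", "target_employee"]
key_cols = ["_a", "_b"]  # hypothetical helper column names

def with_pair_key(frame):
    """Append two helper columns holding the row's pair in sorted order."""
    sorted_pairs = np.sort(frame[cols].to_numpy(), axis=1)
    return frame.assign(_a=sorted_pairs[:, 0], _b=sorted_pairs[:, 1])

df1 = pd.DataFrame({
    "first_employee": ["Vincent", "Frida", "Andy"],
    "target_employee": ["Pablo", "Pablo", "Vincent"],
    "relationship": [-20, 50, 20],
})
df2 = pd.DataFrame({
    "first_employee": ["Pablo", "Pablo"],
    "target_employee": ["Vincent", "Frida"],
    "book_count": [1, 2],
})

# Keep only the key and the value column from the second frame
right = with_pair_key(df2)[key_cols + ["book_count"]]
out = (
    with_pair_key(df1)
    .merge(right, on=key_cols, how="left", validate="m:1")
    .drop(columns=key_cols)
)
print(out)
```

The validate="m:1" argument makes pandas raise a MergeError if the second frame contains the same pair more than once (for example both (A, B) and (B, A)), which would otherwise silently duplicate rows.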
