pandas: joining tables on two columns regardless of the order of the values

I want to implement what is described here: stackoverflow question, but using only standard pandas.

I have two DataFrames. The first:

  first_employee target_employee  relationship
0            Andy          Claude             0
1            Andy           Frida            20
2            Andy         Georgia           -10
3            Andy            Joan            30
4            Andy             Lee           -10
5            Andy           Pablo           -10
6            Andy         Vincent            20
7          Claude           Frida             0
8          Claude         Georgia            90
9          Claude            Joan             0
10         Claude             Lee             0
11         Claude           Pablo            10
12         Claude         Vincent             0
13          Frida         Georgia             0
14          Frida            Joan             0
15          Frida             Lee             0
16          Frida           Pablo            50
17          Frida         Vincent            60
18        Georgia            Joan             0
19        Georgia             Lee            10
20        Georgia           Pablo             0
21        Georgia         Vincent             0
22           Joan             Lee            70
23           Joan           Pablo             0
24           Joan         Vincent            10
25            Lee           Pablo             0
26            Lee         Vincent             0
27          Pablo         Vincent           -20

The second:

   first_employee target_employee  book_count
0         Vincent           Frida           2
1         Vincent           Pablo           1
2            Andy          Claude           1
3            Andy            Joan           1
4            Andy           Pablo           1
5            Andy             Lee           1
6            Andy           Frida           1
7            Andy         Georgia           1
8          Claude         Georgia           3
9            Joan             Lee           3
10          Pablo           Frida           2

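I want to join the two DataFrames so that the final DataFrame is the same as the first one, but with an additional book_count column holding the corresponding value (or NaN if none is available).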

I wrote something like joined_df = first_df.merge(second_df, on=['first_employee', 'target_employee'], how='outer') and I get:

   first_employee target_employee  relationship  book_count
0            Andy          Claude           0.0         1.0
1            Andy           Frida          20.0         1.0
2            Andy         Georgia         -10.0         1.0
3            Andy            Joan          30.0         1.0
4            Andy             Lee         -10.0         1.0
5            Andy           Pablo         -10.0         1.0
6            Andy         Vincent          20.0         NaN
7          Claude           Frida           0.0         NaN
8          Claude         Georgia          90.0         3.0
9          Claude            Joan           0.0         NaN
10         Claude             Lee           0.0         NaN
11         Claude           Pablo          10.0         NaN
12         Claude         Vincent           0.0         NaN
13          Frida         Georgia           0.0         NaN
14          Frida            Joan           0.0         NaN
15          Frida             Lee           0.0         NaN
16          Frida           Pablo          50.0         NaN
17          Frida         Vincent          60.0         NaN
18        Georgia            Joan           0.0         NaN
19        Georgia             Lee          10.0         NaN
20        Georgia           Pablo           0.0         NaN
21        Georgia         Vincent           0.0         NaN
22           Joan             Lee          70.0         3.0
23           Joan           Pablo           0.0         NaN
24           Joan         Vincent          10.0         NaN
25            Lee           Pablo           0.0         NaN
26            Lee         Vincent           0.0         NaN
27          Pablo         Vincent         -20.0         NaN
28        Vincent           Frida           NaN         2.0
29        Vincent           Pablo           NaN         1.0
30          Pablo           Frida           NaN         2.0

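This is fairly close to what I want. However, the order of the values in first_employee and target_employee is irrelevant: if the first DataFrame has (Frida, Vincent) and the second has (Vincent, Frida), the two rows should be merged (what matters is the pair of values, not which column each one is in).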

In the resulting DataFrame I get three extra rows:

28        Vincent           Frida           NaN         2.0
29        Vincent           Pablo           NaN         1.0
30          Pablo           Frida           NaN         2.0

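They appear because my merge treats the values as ordered, matching column by column. These 3 extra rows should instead be merged into the existing pairs (Frida, Vincent), (Pablo, Vincent) and (Frida, Pablo).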
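To make this behavior concrete, here is a minimal sketch using a single pair from each frame. The variable names left, right and plain are illustrative only and do not come from the question. Because merge compares the key columns position by position, the reversed pair is not recognized as a match.

```python
import pandas as pd

# One pair from each frame in the question, with the names in opposite order.
left = pd.DataFrame({'first_employee': ['Frida'],
                     'target_employee': ['Vincent'],
                     'relationship': [60]})
right = pd.DataFrame({'first_employee': ['Vincent'],
                      'target_employee': ['Frida'],
                      'book_count': [2]})

# A plain outer merge keeps both rows, because (Frida, Vincent) and
# (Vincent, Frida) are different keys when compared column by column.
plain = left.merge(right, on=['first_employee', 'target_employee'], how='outer')
print(plain)
```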

Is there a way to do this using only standard pandas functions? (The question I referenced at the beginning uses sqldf.)

uj5u.com user reply:

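I believe this is what you are looking for. np.sort reorders the two name columns within each row so that they are in alphabetical order, which lets the merge match the pairs correctly.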

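import numpy as np
import pandas as pd

cols = ['first_employee', 'target_employee']
# Sort each row's pair alphabetically, e.g. (Vincent, Frida) -> (Frida, Vincent)
df[cols] = np.sort(df[cols].to_numpy(), axis=1)
df2[cols] = np.sort(df2[cols].to_numpy(), axis=1)
ndf = pd.merge(df, df2, on=cols, how='left')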
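For a self-contained illustration of the np.sort approach above, here is a minimal sketch on a small subset of the question's data. The names first_df, second_df and the helper normalize_pairs are hypothetical, chosen for this example. Unlike the snippet above, it works on copies, so the input frames are left unchanged.

```python
import numpy as np
import pandas as pd

cols = ['first_employee', 'target_employee']

# Small subset of the question's data (illustrative only).
first_df = pd.DataFrame({
    'first_employee': ['Andy', 'Frida', 'Frida', 'Pablo'],
    'target_employee': ['Vincent', 'Pablo', 'Vincent', 'Vincent'],
    'relationship': [20, 50, 60, -20],
})
second_df = pd.DataFrame({
    'first_employee': ['Vincent', 'Vincent', 'Pablo'],
    'target_employee': ['Frida', 'Pablo', 'Frida'],
    'book_count': [2, 1, 2],
})

def normalize_pairs(frame):
    """Return a copy whose two name columns are sorted alphabetically per row."""
    out = frame.copy()
    out[cols] = np.sort(out[cols].to_numpy(), axis=1)
    return out

# A left merge keeps exactly the rows of the first frame, in their original order.
result = normalize_pairs(first_df).merge(normalize_pairs(second_df),
                                         on=cols, how='left')
print(result)
```

With how='left', pairs that appear only in the second frame would be dropped rather than appended, which matches the goal of keeping the result shaped like the first frame.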

uj5u.com user reply:

Create a key from the first and target employee as a sorted tuple, then merge on it:

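import pandas as pd

# Order-insensitive key: the two names as an alphabetically sorted tuple
create_key = lambda x: tuple(sorted([x['first_employee'], x['target_employee']]))
# Merge on the key, then drop the key and the duplicated name columns from df2
out = pd.merge(df1.assign(_key=df1.apply(create_key, axis=1)),
               df2.assign(_key=df2.apply(create_key, axis=1)),
               on='_key', suffixes=('', '_key'), how='outer') \
        .loc[:, lambda x: ~x.columns.str.endswith('_key')]
print(out)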

# Output:
   first_employee target_employee  relationship  book_count
0            Andy          Claude             0         1.0
1            Andy           Frida            20         1.0
2            Andy         Georgia           -10         1.0
3            Andy            Joan            30         1.0
4            Andy             Lee           -10         1.0
5            Andy           Pablo           -10         1.0
6            Andy         Vincent            20         NaN
7          Claude           Frida             0         NaN
8          Claude         Georgia            90         3.0
9          Claude            Joan             0         NaN
10         Claude             Lee             0         NaN
11         Claude           Pablo            10         NaN
12         Claude         Vincent             0         NaN
13          Frida         Georgia             0         NaN
14          Frida            Joan             0         NaN
15          Frida             Lee             0         NaN
16          Frida           Pablo            50         2.0
17          Frida         Vincent            60         2.0
18        Georgia            Joan             0         NaN
19        Georgia             Lee            10         NaN
20        Georgia           Pablo             0         NaN
21        Georgia         Vincent             0         NaN
22           Joan             Lee            70         3.0
23           Joan           Pablo             0         NaN
24           Joan         Vincent            10         NaN
25            Lee           Pablo             0         NaN
26            Lee         Vincent             0         NaN
27          Pablo         Vincent           -20         1.0
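One way to confirm that no row of the second frame was left unmatched is the indicator parameter of pd.merge, which adds a _merge column marking each row as left_only, right_only or both. The sketch below is an assumption-laden illustration, not part of the original answer: it uses a small subset of the question's data, and the helper pair_key and the variables checked and unmatched are hypothetical names.

```python
import pandas as pd

# Small subset of the question's data (illustrative only).
df1 = pd.DataFrame({
    'first_employee': ['Andy', 'Frida', 'Frida', 'Pablo'],
    'target_employee': ['Vincent', 'Pablo', 'Vincent', 'Vincent'],
    'relationship': [20, 50, 60, -20],
})
df2 = pd.DataFrame({
    'first_employee': ['Vincent', 'Vincent', 'Pablo'],
    'target_employee': ['Frida', 'Pablo', 'Frida'],
    'book_count': [2, 1, 2],
})

def pair_key(frame):
    """Build an order-insensitive key: each pair of names as a sorted tuple."""
    keys = [tuple(sorted(pair))
            for pair in zip(frame['first_employee'], frame['target_employee'])]
    return pd.Series(keys, index=frame.index)

# Keep only book_count from df2 so the name columns are not duplicated.
checked = pd.merge(df1.assign(_key=pair_key(df1)),
                   df2[['book_count']].assign(_key=pair_key(df2)),
                   on='_key', how='outer', indicator=True)

# Rows that exist only in df2 would show up here; an empty frame means
# every pair in df2 found its counterpart in df1.
unmatched = checked[checked['_merge'] == 'right_only']
print(checked['_merge'].value_counts())
```

If unmatched is empty, the outer merge and a left merge give the same rows, so either choice of how keeps the result aligned with the first frame.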

Tags: Python, pandas, dataframe, join, merge