Compare the values of 2 dataframes and create a new dataframe based on matching and non-matching values

I have a dataframe df1.

import pandas as pd

df1 = pd.DataFrame([["A","X",5,4,1],["A","Y",3,1,3],["B","X",4,7,4],["B","W",3,9,3],["C","Z",7,4,5],["C","Y",1,0,6],["D","P",8,4,7],["D","Q",7,2,2]], columns=['col1', 'col2', 'col3', 'col4','col5'])

  col1 col2  col3  col4  col5
0    A    X     5     4     1
1    A    Y     3     1     3
2    B    X     4     7     4
3    B    W     3     9     3
4    C    Z     7     4     5
5    C    Y     1     0     6
6    D    P     8     4     7
7    D    Q     7     2     2

I have another dataframe df2.

df2 = pd.DataFrame([["B","W",3,7,3],["B","X",4,7,5],["C","Z",8,4,6],["C","Y",1,0,6]], columns=['col1', 'col2', 'col3', 'col4','col5'])

Not all rows present in df1 are present in df2, and the rows are in a different order.

  col1 col2  col3  col4  col5
0    B    W     3     7     3
1    B    X     4     7     5
2    C    Z     8     4     6
3    C    Y     1     0     6
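
To check which (col1, col2) pairs appear in both dataframes, one possible sketch uses `merge` with `indicator=True`. The name `presence` is only illustrative and is not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame([["A","X",5,4,1],["A","Y",3,1,3],["B","X",4,7,4],["B","W",3,9,3],
                    ["C","Z",7,4,5],["C","Y",1,0,6],["D","P",8,4,7],["D","Q",7,2,2]],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])
df2 = pd.DataFrame([["B","W",3,7,3],["B","X",4,7,5],["C","Z",8,4,6],["C","Y",1,0,6]],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])

cols = ['col1', 'col2']

# Outer merge on the key columns only; the _merge column reports
# whether each key pair was found in df1 only, df2 only, or both.
presence = df1[cols].merge(df2[cols], on=cols, how='outer', indicator=True)
print(presence['_merge'].value_counts())
```

With the sample data, four key pairs are in both dataframes and four are only in df1.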

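I want to compare the values of specific rows of the 2 dataframes (the rows that have the same col1 and col2 in both). Where a value is the same in both dataframes, set it to True; otherwise set it to False.

Expected output: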

Out = pd.DataFrame([["B","W",True,False,True],["B","X",True,True,False],["C","Y",True,True,True],["C","Z",False,True,False]], columns=['col1', 'col2', 'col3', 'col4','col5'])

  col1 col2   col3   col4   col5
0    B    W   True  False   True
1    B    X   True   True  False
2    C    Y   True   True   True
3    C    Z  False   True  False

How can I do this?

Answer from a uj5u.com user:

IIUC, you can do:

# set reference columns
cols = ['col1', 'col2']

# set references as index to align the data and compare
Out = df1.set_index(cols).eq(df2.set_index(cols))

# keep only rows where there is at least one True
# and restore the references as columns
Out = Out[Out.any(axis=1)].reset_index()

Output:

  col1 col2   col3   col4   col5
0    B    W   True  False   True
1    B    X   True   True  False
2    C    Y   True   True   True
3    C    Z  False   True  False
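
Note that the `Out.any(axis=1)` filter in this answer would also drop a key pair that exists in both dataframes but has no matching value at all. If that matters, a minimal sketch of an alternative is to restrict both frames to their shared keys before comparing. The helper name `compare_on_keys` is hypothetical, and the sketch assumes each key pair occurs at most once per dataframe.

```python
import pandas as pd

def compare_on_keys(left, right, keys):
    """Hypothetical helper: compare value columns for keys present in both frames."""
    a = left.set_index(keys)
    b = right.set_index(keys)
    # keys of `right` that also exist in `left`, kept in `right`'s row order
    common = b.index[b.index.isin(a.index)]
    # both sides now share an identical index, so eq() compares row by row
    return a.reindex(common).eq(b.reindex(common)).reset_index()

df1 = pd.DataFrame([["A","X",5,4,1],["A","Y",3,1,3],["B","X",4,7,4],["B","W",3,9,3],
                    ["C","Z",7,4,5],["C","Y",1,0,6],["D","P",8,4,7],["D","Q",7,2,2]],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])
df2 = pd.DataFrame([["B","W",3,7,3],["B","X",4,7,5],["C","Z",8,4,6],["C","Y",1,0,6]],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])

Out = compare_on_keys(df1, df2, ['col1', 'col2'])
print(Out)
```

Here the rows come back in df2's order (B W, B X, C Z, C Y) rather than sorted by key.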

Answer from a uj5u.com user:

You can also do it like this:

cols = ['col1', 'col2']

# concat both dataframes and creating a new unique index
c_df = pd.concat([df1, df2], ignore_index=True)

# Described after this snippet
Out = c_df[cols].join(~c_df.groupby(cols).diff().dropna().astype(bool), how='inner')

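You group the rows by the reference columns and compute the difference between the rows within each group.
diff returns NaN for the first row of each group, so a single-row group (a key found in only one dataframe) yields only NaN. You do not want those rows, which is why you drop them.
The remaining values are numeric. A diff of zero means the column has the same value in both rows with that key.
If you cast the diff to boolean, you get False for equal values, which is why you need the negation (~).
There you go! To match your output, inner join the new columns onto the reference columns of the concatenated data, and that's it.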
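
The steps described above can be made visible by splitting the one-liner into intermediate variables. This is only a sketch: the names `diffs`, `paired` and `equal` are illustrative, the final `reset_index(drop=True)` is an addition to get a 0-based index, and it assumes numeric value columns and at most one row per key pair in each dataframe.

```python
import pandas as pd

df1 = pd.DataFrame([["A","X",5,4,1],["A","Y",3,1,3],["B","X",4,7,4],["B","W",3,9,3],
                    ["C","Z",7,4,5],["C","Y",1,0,6],["D","P",8,4,7],["D","Q",7,2,2]],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])
df2 = pd.DataFrame([["B","W",3,7,3],["B","X",4,7,5],["C","Z",8,4,6],["C","Y",1,0,6]],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])

cols = ['col1', 'col2']
c_df = pd.concat([df1, df2], ignore_index=True)  # df1 rows get 0..7, df2 rows 8..11

# Step 1: per-group difference; the first row of every group is NaN
diffs = c_df.groupby(cols).diff()

# Step 2: drop the NaN rows, leaving only df2 rows that have a partner in df1
paired = diffs.dropna()

# Step 3: zero difference -> False after astype(bool), so negate to get True for equal
equal = ~paired.astype(bool)

# Step 4: attach the key columns back; the inner join keeps only the paired rows
Out = c_df[cols].join(equal, how='inner').reset_index(drop=True)
print(Out)
```

The result lists the rows in df2's order: B W, B X, C Z, C Y.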


Tags: Python python-3.x pandas dataframe
