I have the following DataFrame:
TXN_DATE_TIME TX_ID CUST_ID STATE_1 STATE_2 STATE_3
01-06-2020 00:00 1 123 Maharashtra Maharashtra Maharashtra
01-06-2020 00:00 2 345 Pune Chennai Gujarat
01-06-2020 00:00 3 222 Chennai Gujarat Chennai
01-06-2020 00:00 4 1356 Gujarat Chennai Delhi
01-06-2020 00:00 5 2345 Punjab Punjab Delhi
01-06-2020 00:00 6 1111 Haryana Delhi Punjab
01-06-2020 00:00 7 5678 Delhi Maharashtra Haryana
01-06-2020 00:00 8 9999 Kerela Assam Assam
01-06-2020 00:00 9 2345 Assam Assam Assam
01-06-2020 00:00 10 6666 Tripura Tripura Tripura
01-06-2020 00:00 11 7896 Kolkatta Kolkatta Kolkatta
I want to create a new column MATCH in the df, with the value 1 for a match and 0 for no match, based on the following condition:
If STATE_1 == STATE_2 == STATE_3 then MATCH = 1
Else MATCH = 0
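So the expected df is the original df plus a MATCH column that is 1 where all three states are equal and 0 otherwise.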

I tried using np.where on the pandas df like this:
df['MATCH']=np.where(df['STATE_1']==df['STATE_2']==df['STATE_3'],1,0)
But it gives me the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Is there a faster way than np.where to get the expected result? If not, how can I avoid this error?
Reply from a uj5u.com user:
Use:
df['MATCH']=(df['STATE_1']==df['STATE_2'])&(df['STATE_2']==df['STATE_3'])
df['MATCH'] = df['MATCH'].astype(int)
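For context, the error in the question comes from Python expanding `a == b == c` into `(a == b) and (b == c)`; the `and` asks for the truth value of a whole Series, which pandas refuses to guess. The following is a minimal sketch, assuming a small made-up frame that reuses the question's column names, showing that np.where also works once the two comparisons are combined with `&`:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (a made-up subset of the question's rows)
df = pd.DataFrame({
    "STATE_1": ["Maharashtra", "Pune", "Kerela", "Assam"],
    "STATE_2": ["Maharashtra", "Chennai", "Assam", "Assam"],
    "STATE_3": ["Maharashtra", "Gujarat", "Assam", "Assam"],
})

# The chained comparison reproduces the ValueError from the question
try:
    df["STATE_1"] == df["STATE_2"] == df["STATE_3"]
    chained_failed = False
except ValueError:
    chained_failed = True

# Element-wise comparisons combined with & give one boolean Series
all_equal = (df["STATE_1"] == df["STATE_2"]) & (df["STATE_2"] == df["STATE_3"])

# np.where accepts that boolean Series and maps it to 1 / 0
df["MATCH"] = np.where(all_equal, 1, 0)
print(df["MATCH"].tolist())  # [1, 0, 0, 1]
```

Both np.where and `.astype(int)` are vectorized here, so any speed difference between them is probably small; the `&` form just skips one step.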
Reply from a uj5u.com user:
We can also use eq and all:
df['MATCH'] = df[['STATE_2', 'STATE_3']].eq(df['STATE_1'], axis=0).all(axis=1).astype(int)
If you have more than 3 "STATE" columns, we can use filter:
df['MATCH'] = df.filter(like='STATE').eq(df['STATE_1'], axis=0).all(axis=1).astype(int)
Output:
TXN_DATE_TIME TX_ID CUST_ID STATE_1 STATE_2 STATE_3 MATCH
0 01-06-2020 00:00 1 123 Maharashtra Maharashtra Maharashtra 1
1 01-06-2020 00:00 2 345 Pune Chennai Gujarat 0
2 01-06-2020 00:00 3 222 Chennai Gujarat Chennai 0
3 01-06-2020 00:00 4 1356 Gujarat Chennai Delhi 0
4 01-06-2020 00:00 5 2345 Punjab Punjab Delhi 0
5 01-06-2020 00:00 6 1111 Haryana Delhi Punjab 0
6 01-06-2020 00:00 7 5678 Delhi Maharashtra Haryana 0
7 01-06-2020 00:00 8 9999 Kerela Assam Assam 0
8 01-06-2020 00:00 9 2345 Assam Assam Assam 1
9 01-06-2020 00:00 10 6666 Tripura Tripura Tripura 1
10 01-06-2020 00:00 11 7896 Kolkatta Kolkatta Kolkatta 1
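To try the filter approach without the full data, here is a self-contained sketch on a made-up four-row subset of the question's frame. The variable names `states`, `match_eq` and `match_nunique` are illustrative only, and the nunique variant is an alternative that does not appear in the replies above:

```python
import pandas as pd

# Made-up subset of the question's rows, same column names
df = pd.DataFrame({
    "TX_ID": [1, 2, 8, 9],
    "STATE_1": ["Maharashtra", "Pune", "Kerela", "Assam"],
    "STATE_2": ["Maharashtra", "Chennai", "Assam", "Assam"],
    "STATE_3": ["Maharashtra", "Gujarat", "Assam", "Assam"],
})

# Select every column whose name contains "STATE" (TX_ID is excluded)
states = df.filter(like="STATE")

# Option A: compare each state column with the first one, row by row
match_eq = states.eq(states.iloc[:, 0], axis=0).all(axis=1).astype(int)

# Option B (alternative): a row matches when it holds one distinct value
match_nunique = states.nunique(axis=1).eq(1).astype(int)

df["MATCH"] = match_eq
print(df["MATCH"].tolist())  # [1, 0, 0, 1]
```

The two options agree on this data, but they treat missing values differently: nunique ignores NaN by default, while eq treats NaN as unequal to everything. Option B is also likely to be slower on large frames, so Option A is probably the safer default.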
Tags: Python, python-3.x, pandas, dataframe, numpy
