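I'm trying to set two columns based on a condition on a third column. I can set one column based on a condition on another column, and I can set two columns from a single condition value, but when I try to set two columns based on a condition on a column, it fails.
Here is the code sample: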
import pandas as pd
import numpy as np
AAA = {"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]}
df = pd.DataFrame(AAA)
If I call:
df["column B"], df["column C"] = np.where(True ,['4', '8'],['NaN', 'NaN'])
I get:
df
column A column B column C
0 1 4 8
1 1 4 8
2 1 4 8
3 2 4 8
4 2 4 8
5 2 4 8
6 3 4 8
7 3 4 8
8 3 4 8
So I know I can fill two columns based on a condition.
If I call:
df["column B"] = np.where( df["column A"] == 2 ,['4'],['NaN'])
I get:
column A column B column C
0 1 NaN 8
1 1 NaN 8
2 1 NaN 8
3 2 4 8
4 2 4 8
5 2 4 8
6 3 NaN 8
7 3 NaN 8
8 3 NaN 8
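So I know I can fill a column based on a condition on another column. (I assume the condition is treated as a boolean array.)
However, if I call: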
df["column B"], df["column C"] = np.where( df["column A"] == 2 ,['4', '8'],['NaN', 'NaN'])
I expect to get:
column A column B column C
0 1 NaN NaN
1 1 NaN NaN
2 1 NaN NaN
3 2 4 8
4 2 4 8
5 2 4 8
6 3 NaN NaN
7 3 NaN NaN
8 3 NaN NaN
But what I actually get is:
Traceback (most recent call last):
... pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<string>", line 2, in <module>
File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (9,) (2,) (2,)
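Is there a way to do what I want? I'd rather not use two separate calls, because the DataFrame I need this for is very large.
A uj5u.com user replied:
Use the loc indexer and assign: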
df.loc[df['column A'] == 2, ['column B', 'column C']] = [4, 8]
Output (df):
column A column B column C
0 1 NaN NaN
1 1 NaN NaN
2 1 NaN NaN
3 2 4.0 8.0
4 2 4.0 8.0
5 2 4.0 8.0
6 3 NaN NaN
7 3 NaN NaN
8 3 NaN NaN
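For completeness, here is a minimal self-contained sketch of the same loc assignment. Creating "column B" and "column C" up front is my own assumption, in case your pandas version does not add missing columns during a loc assignment; the name mask is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Assumption: create both target columns first, filled with real missing values.
df["column B"] = np.nan
df["column C"] = np.nan

# One boolean mask, one assignment: every matching row gets 4 and 8.
mask = df["column A"] == 2
df.loc[mask, ["column B", "column C"]] = [4, 8]

print(df)
```

Because the columns hold np.nan, they end up as floats, which is why the output above shows 4.0 and 8.0.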
A uj5u.com user replied:
Maybe you can loop outside np.where:
df["column B"], df["column C"] = [np.where( df["column A"] == 2 ,true_val,'NaN') for true_val in ['4','8']]
print(df)
# column A column B column C
# 0 1 NaN NaN
# 1 1 NaN NaN
# 2 1 NaN NaN
# 3 2 4 8
# 4 2 4 8
# 5 2 4 8
# 6 3 NaN NaN
# 7 3 NaN NaN
# 8 3 NaN NaN
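One thing to watch with string fill values such as '4' and 'NaN': the result holds text, so 'NaN' is not a real missing value. A small sketch of the difference, assuming you actually want numbers (the names as_text and as_number are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]})
mask = df["column A"] == 2

# String fill values: the result contains the text 'NaN', not a missing value.
as_text = pd.Series(np.where(mask, "4", "NaN"))

# Numeric fill values: np.nan is a real missing value and the dtype is float.
as_number = pd.Series(np.where(mask, 4, np.nan))

text_missing = int(as_text.isna().sum())      # 0 values detected as missing
number_missing = int(as_number.isna().sum())  # 6 values detected as missing
print(text_missing, number_missing)
```

If later code relies on isna(), dropna() or arithmetic, the numeric version is probably what you want.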
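A uj5u.com user replied:
You're almost there! This is just a "broadcasting" problem.
You can use any of the approaches the others have suggested. Or keep the same idea, but reshape the inputs slightly.
Like this: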
# Reshape the condition, then transpose the output.
df["column B"], df["column C"] = np.where( np.array(df["column A"] == 2).reshape(-1,1) ,['4', '8'],['NaN', 'NaN']).T
Or like this:
# Or just reshape the lists
df["column B"], df["column C"] = np.where( df["column A"] == 2 ,np.array(['4', '8']).reshape(-1,1),np.array(['NaN', 'NaN']).reshape(-1,1))
Output:
column A column B column C
0 1 NaN NaN
1 1 NaN NaN
2 1 NaN NaN
3 2 4 8
4 2 4 8
5 2 4 8
6 3 NaN NaN
7 3 NaN NaN
8 3 NaN NaN
You can read the NumPy documentation on broadcasting to get the idea: https://numpy.org/doc/stable/user/basics.broadcasting.html
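To make the shape arithmetic concrete, here is a small NumPy-only sketch. The variable names are mine, and the condition is rebuilt from the values in the question. It shows why (9,) against (2,) fails and how adding a length-1 axis on either side fixes it.

```python
import numpy as np

cond = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3]) == 2  # shape (9,)
values = np.array(["4", "8"])                       # shape (2,)
fallback = np.array(["NaN", "NaN"])                 # shape (2,)

# (9,) against (2,): the trailing dimensions 9 and 2 differ and neither is 1.
try:
    np.where(cond, values, fallback)
    failed = False
except ValueError:
    failed = True

# Extra axis on the condition: (9, 1) against (2,) broadcasts to (9, 2).
by_row = np.where(cond[:, None], values, fallback)

# Extra axis on the values instead: (9,) against (2, 1) broadcasts to (2, 9).
by_col = np.where(cond, values[:, None], fallback[:, None])

print(failed, by_row.shape, by_col.shape)
```

Unpacking into two columns needs the (2, 9) layout, which is why the first variant above ends with .T and the second does not.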
A uj5u.com user replied:
Here is one way to do it. It isn't the most elegant code, but it should help you understand what is needed.
import pandas as pd
import numpy as np
AAA={"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]}
df = pd.DataFrame(AAA)
col_length = len(df['column A'])
fours = np.repeat(4, col_length, axis =0)
eights = np.repeat(8, col_length, axis =0)
empties = np.repeat(np.nan, col_length, axis =0)
df["column B"], df["column C"] = np.where( df["column A"] == 2 ,[fours, eights], [empties, empties])
print(df)
Output:
column A column B column C
0 1 NaN NaN
1 1 NaN NaN
2 1 NaN NaN
3 2 4.0 8.0
4 2 4.0 8.0
5 2 4.0 8.0
6 3 NaN NaN
7 3 NaN NaN
8 3 NaN NaN
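The full-length helper arrays are not strictly required. As a rough sketch of the same idea, assuming numeric values are wanted, broadcasting can stretch [4, 8] and a scalar np.nan for you (cond and filled are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Condition as a (9, 1) column so it broadcasts against the two values.
cond = (df["column A"] == 2).to_numpy()[:, None]

# The scalar np.nan broadcasts as well; the result has shape (9, 2), dtype float.
filled = np.where(cond, [4, 8], np.nan)

# Transpose to (2, 9) and unpack one row per new column.
df["column B"], df["column C"] = filled.T

print(df)
```

This still calls np.where only once, without building a length-9 array for each fill value.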
