I have code like this:
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)

def changeDF(df):
    # Adds a constant column to the DataFrame in place; returns nothing.
    df['Signal'] = 0

changeDF(df1)
changeDF(df2)
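When I run the code above, the changeDF function adds a column named "Signal" with value 0 to both df1 and df2. But when I run changeDF through multiprocessing as shown below, instead of calling it directly, neither DataFrame is changed.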
import multiprocessing

s = [df1, df2]
with multiprocessing.Pool(processes=2) as pool:
    res = pool.map(changeDF, s)
What is wrong with my code?
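Reply from a uj5u.com user:
Serializing df1 and df2 for multiprocessing means you are making copies. Each worker process modifies its own copy, so the originals in the parent process stay unchanged.
Return the DataFrame from the function and it will work as expected.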
def changeDF(df):
    df['Signal'] = 0
    return df

with multiprocessing.Pool(processes=2) as pool:
    df1, df2 = pool.map(changeDF, [df1, df2])
Be warned that the serialization cost of doing this will certainly outweigh any benefit you get from multiprocessing.
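As a rough illustration of the copying this answer describes, the sketch below round-trips an object through pickle, which is how multiprocessing.Pool hands arguments to its workers. A plain dict stands in for the DataFrame so that only the standard library is needed; the names change, original, and worker_copy are made up for this example.

```python
import pickle


def change(record):
    # Mutates whatever object it is handed, like changeDF does.
    record["Signal"] = 0


original = {"A": ["A0", "A1"]}

# Roughly what happens to each argument of pool.map: it is pickled in the
# parent process and unpickled in the worker, which yields a new object.
worker_copy = pickle.loads(pickle.dumps(original))
change(worker_copy)

print("Signal" in worker_copy)  # the copy was modified
print("Signal" in original)     # the original was not
```

For a large DataFrame, this round trip (plus a second one for the returned value) is the serialization cost the answer warns about.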
Reply from a uj5u.com user:
Change your changeDF function to look like this:
def changeDF(df):
    df['Signal'] = 0
    return df
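Putting the pieces together, a complete script might look like the sketch below. The smaller DataFrames, the main function, and the __main__ guard are additions for this example and are not part of the original answers. The guard matters on platforms where multiprocessing starts workers with the spawn method (the default on Windows and macOS, for example), because each worker re-imports the main module.

```python
import multiprocessing

import pandas as pd


def changeDF(df):
    # Runs in a worker process on a pickled copy of the DataFrame.
    df["Signal"] = 0
    # Returning the copy sends it (pickled again) back to the parent.
    return df


def main():
    # Shortened stand-ins for the df1 and df2 from the question.
    df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
    df2 = pd.DataFrame({"A": ["A4", "A5"], "B": ["B4", "B5"]})

    with multiprocessing.Pool(processes=2) as pool:
        # Rebind the names to the modified copies returned by the workers.
        df1, df2 = pool.map(changeDF, [df1, df2])

    print(df1.columns.tolist())
    print(df2.columns.tolist())


if __name__ == "__main__":
    # Keeps worker processes from re-running this block when the module
    # is re-imported under the "spawn" start method.
    main()
```

If the only goal is to add a constant column, calling changeDF(df1) and changeDF(df2) directly in the parent process avoids the pickling entirely.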
Tags: Python, pandas, list, dataframe, python-multiprocessing
