I have a simple pandas DataFrame and I need to add a new column that shows, for each row, how many times the value in Current_Price occurs among the month columns listed in pricemonths:
import pandas as pd
import numpy as np
# my data
data = {
    'Item': ['Bananas', 'Apples', 'Pears', 'Avocados', 'Grapes', 'Melons'],
    'Jan': [1, 0.5, 1.1, 0.6, 2, 4],
    'Feb': [0.9, 0.5, 1, 0.6, 2, 5],
    'Mar': [1, 0.6, 1, 0.6, 2.1, 6],
    'Apr': [1, 0.6, 1, 0.6, 2, 5],
    'May': [1, 0.5, 1.1, 0.6, 2, 5],
    'Current_Price': [1, 0.6, 1, 0.6, 2, 4],
}
# build the DataFrame
df = pd.DataFrame(data)
pricemonths = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
So my final DataFrame would contain one more column ('times_found') with these values:
'times_found'
4
2
3
5
4
1
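Answer from a uj5u.com user:
One approach is to transpose the price columns of df, then use eq to compare them against 'Current_Price', aligned on the index. This creates a boolean DataFrame that is True where a price matches and False otherwise. Summing over the month rows (axis=0) then gives one count per item: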
df['times_found'] = df['Current_Price'].eq(df.loc[:,'Jan':'May'].T).sum(axis=0)
Or use numpy broadcasting:
df['times_found'] = (df.loc[:,'Jan':'May'].to_numpy() == df[['Current_Price']].to_numpy()).sum(axis=1)
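To make the broadcasting step easier to follow, here is a minimal sketch on plain arrays, using only the first two rows of the data above. The variable names prices, current, mask and counts are illustrative and not part of the original answer. The point is that the double brackets in df[['Current_Price']] keep the column two-dimensional, so a (rows, 1) array is compared against a (rows, months) array:

```python
import numpy as np

# First two rows of the month columns (Bananas, Apples): shape (2, 5)
prices = np.array([
    [1.0, 0.9, 1.0, 1.0, 1.0],
    [0.5, 0.5, 0.6, 0.6, 0.5],
])

# Matching Current_Price values kept as a column: shape (2, 1)
current = np.array([[1.0], [0.6]])

# The (2, 1) column is stretched across the 5 month columns,
# producing a boolean array of shape (2, 5)
mask = prices == current

# Count the True values in each row
counts = mask.sum(axis=1)
print(counts)
```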
An excellent suggestion from @HenryEcker: calling DataFrame.eq with axis=0 may be faster than transposing, especially for larger DataFrames:
df['times_found'] = df.loc[:, 'Jan':'May'].eq(df['Current_Price'], axis=0).sum(axis=1)
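The answers select the month columns with the label slice 'Jan':'May', but the question already defines a pricemonths list. A self-contained sketch of the same eq approach using that list might look like this (the intermediate name matches is only for illustration). Selecting by list can be handy if the month columns are not contiguous in the DataFrame:

```python
import pandas as pd

data = {
    'Item': ['Bananas', 'Apples', 'Pears', 'Avocados', 'Grapes', 'Melons'],
    'Jan': [1, 0.5, 1.1, 0.6, 2, 4],
    'Feb': [0.9, 0.5, 1, 0.6, 2, 5],
    'Mar': [1, 0.6, 1, 0.6, 2.1, 6],
    'Apr': [1, 0.6, 1, 0.6, 2, 5],
    'May': [1, 0.5, 1.1, 0.6, 2, 5],
    'Current_Price': [1, 0.6, 1, 0.6, 2, 4],
}
df = pd.DataFrame(data)
pricemonths = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

# Boolean DataFrame: True where a month's price equals that row's Current_Price
matches = df[pricemonths].eq(df['Current_Price'], axis=0)

# Each True counts as 1, so the row-wise sum is the number of matches
df['times_found'] = matches.sum(axis=1)
print(df[['Item', 'times_found']])
```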
Output:
       Item  Jan  Feb  Mar  Apr  May  Current_Price  times_found
0   Bananas  1.0  0.9  1.0  1.0  1.0            1.0            4
1    Apples  0.5  0.5  0.6  0.6  0.5            0.6            2
2     Pears  1.1  1.0  1.0  1.0  1.1            1.0            3
3  Avocados  0.6  0.6  0.6  0.6  0.6            0.6            5
4    Grapes  2.0  2.0  2.1  2.0  2.0            2.0            4
5    Melons  4.0  5.0  6.0  5.0  5.0            4.0            1
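One caveat that the thread does not raise: all three variants compare floats with exact equality. That works here because the prices are typed in as literals, but prices produced by arithmetic may differ in the last bits. If that applies to your data, a tolerant comparison with np.isclose is one possible substitute. The sketch below uses made-up values purely to show the difference:

```python
import numpy as np

# Hypothetical row: the first price is computed rather than typed in
prices = np.array([[0.1 + 0.2, 0.3, 0.5]])
current = np.array([[0.3]])

# Exact equality misses the computed value (0.1 + 0.2 is not exactly 0.3)
exact = (prices == current).sum(axis=1)

# np.isclose treats values within a small tolerance as equal
tolerant = np.isclose(prices, current).sum(axis=1)

print(exact, tolerant)
```

The default tolerances of np.isclose may be too loose or too tight for some price data, so rtol and atol would need to be chosen to fit the values involved.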
