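I have a dataset of car attributes with missing values in some columns. For example, the Distance column has missing values, and I want to replace them with the mean. However, there is a second column, Car Type, that shows whether the car is brand new or used. Brand-new cars will not have as much mileage as used ones, so I want to replace the NaN values in Distance with the mean of the Distance values where Car Type == 'Brand New'. (In the sample data below, the column is named 'Car type' and the value is 'New'.)
Minimal setup: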
import numpy as np
import pandas as pd
df = pd.DataFrame({'Car type': ['New','Used','New','New','New','Used','New','New'],
'Distance':[20,2222,34,np.nan,np.nan,np.nan,50,10]})
print(df)
Car type Distance
0 New 20.0
1 Used 2222.0
2 New 34.0
3 New NaN
4 New NaN
5 Used NaN
6 New 50.0
7 New 10.0
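A uj5u.com user replied:
Compute the mean of Distance for each Car type and broadcast it to every row with transform, then use fillna to replace each NaN with the mean of its own group: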
df['Distance'] = (df['Distance'].fillna(df.groupby('Car type')['Distance']
.transform('mean')))
print(df)
# Output
Car type Distance
0 New 20.0
1 Used 2222.0
2 New 34.0
3 New 28.5 # mean of New car
4 New 28.5 # mean of New car
5 Used 2222.0 # mean of Used car
6 New 50.0
7 New 10.0
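To see why this works, it can help to look at the intermediate result. The sketch below rebuilds the sample frame from the question and prints what transform('mean') returns before fillna uses it. The variable names group_mean and filled are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Car type': ['New', 'Used', 'New', 'New', 'New', 'Used', 'New', 'New'],
    'Distance': [20, 2222, 34, np.nan, np.nan, np.nan, 50, 10],
})

# Per-group mean returned with the same index as df, one value per row.
# NaN entries are skipped when the mean is computed.
group_mean = df.groupby('Car type')['Distance'].transform('mean')

# fillna aligns on the index, so each NaN receives the mean of its own group;
# rows that already have a value are left untouched.
filled = df['Distance'].fillna(group_mean)

print(group_mean.tolist())
print(filled.tolist())
```

Because transform returns a Series aligned to the original rows, no merge or manual lookup is needed before calling fillna.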
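A uj5u.com user replied:
df['Distance'] = df['Distance'].fillna(df['Distance'].mean())
The code above does a simple replacement: every NaN gets the mean of the whole column, regardless of car type. The result is assigned back to the column, so the original DataFrame is modified. (Calling fillna with inplace=True on a selected column may trigger a chained-assignment warning in recent pandas versions, so plain assignment is used here.)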
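For comparison, here is a small sketch of how far the column-wide mean is from the per-type means on the question's sample data. The names overall_mean and per_type_mean are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Car type': ['New', 'Used', 'New', 'New', 'New', 'Used', 'New', 'New'],
    'Distance': [20, 2222, 34, np.nan, np.nan, np.nan, 50, 10],
})

# Mean over every observed Distance value, ignoring car type.
overall_mean = df['Distance'].mean()

# Mean of the observed Distance values within each car type.
per_type_mean = df.groupby('Car type')['Distance'].mean()

print(overall_mean)
print(per_type_mean)
```

On this sample the overall mean is 467.2, pulled up by the single used car, while the mean for new cars is 28.5. That gap is the reason the question asks for a per-type fill.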
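import numpy as np
new_mean = df.loc[df['Car type'] == 'New', 'Distance'].mean()
df['Distance'] = np.where((df['Car type'] == 'New') & df['Distance'].isna(), new_mean, df['Distance'])
This uses numpy, so be sure to import numpy as np. It fills only the missing Distance values of new cars with the mean of new cars; without the isna() check, the Distance of every new car would be overwritten, including the observed ones. It is concise and fast.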
