假設我有以下資料框
df.Consumption
0 16.208
1 11.193
2 9.845
3 9.348
4 9.091
...
19611 0.000
19612 0.000
19613 0.000
19614 0.000
19615 0.000
Name: Consumption, Length: 19616, dtype: float64
我想用不是 0.00 的前 10 個和下一個值的平均值替換 0 值
有什么好的方法可以做到?我正在考慮使用替換和插值方法,但我看不到如何有效地撰寫它
uj5u.com熱心網友回復:
這應該讓你很接近。它利用了未計入平均值的空值,因此您可以用 nan 替換零,然后回圈遍歷。
我不確定沒有按行應用的更好方法。
有些東西告訴我,做一個實際的回圈,每次迭代更新 df 會給你稍微不同的結果,因為你會在你進行時輸入空值,這將使之前的 10 個結果總是有一個值。
import pandas as pd
df = pd.DataFrame({'Consumption':[1,1,1,1,1,1,1,1,1,0,2,2,2,2,2,2,2,2,2,2]})
df.replace(0,np.nan, inplace=True)
df.update(df.apply(lambda x:np.mean(df.Consumption.iloc[max(x.name-10,0):]), axis=1).to_frame('Consumption'),overwrite=False)
輸出
Consumption
0 1.000000
1 1.000000
2 1.000000
3 1.000000
4 1.000000
5 1.000000
6 1.000000
7 1.000000
8 1.000000
9 1.526316
10 2.000000
11 2.000000
12 2.000000
13 2.000000
14 2.000000
15 2.000000
16 2.000000
17 2.000000
18 2.000000
19 2.000000
uj5u.com熱心網友回復:
您可以Series.rolling()與 withcenter=True一起使用Rolling.mean()以獲取上一個和下一個值的平均值。
如果要從均值計算中排除,請替換0為。NaN0
設定center=True以便滾動視窗查找上一個和下一個條目。
最后,使用 將這些條目的值設定0為平均值.loc,如下所示:
n = 10 # check previous and next 10 entries
# rolling window size is (2n 1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
.rolling(n * 2 1, min_periods=1, center=True)
.mean())
df.loc[df['Consumption'] == 0, 'Consumption'] = Consumption_mean
演示
使用較小的視窗大小n = 3來演示:
df
Consumption
0 16.208
1 11.193
2 9.845
3 9.348
4 9.091
5 8.010
6 0.000 <==== target entry
7 7.100
8 0.000 <==== target entry
9 6.800
10 6.500
11 6.300
12 5.900
13 5.800
14 5.600
#n = 10 # check previous and next 10 entries
n = 3 # smaller window size for demo
# rolling window size is (2n 1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
.rolling(n * 2 1, min_periods=1, center=True)
.mean())
# Update into a new column `Consumption_New` for demo purpose
df['Consumption_New'] = df['Consumption']
df.loc[df['Consumption'] == 0, 'Consumption_New'] = Consumption_mean
演示結果:
print(df)
Consumption Consumption_New
0 16.208 16.2080
1 11.193 11.1930
2 9.845 9.8450
3 9.348 9.3480
4 9.091 9.0910
5 8.010 8.0100
6 0.000 8.0698 # 8.0698 = (9.348 9.091 8.01 7.1 6.8) / 5 with skipping 0.000 between 7.100 and 6.800
7 7.100 7.1000
8 0.000 6.9420 # 6.942 = (8.01 7.1 6.8 6.5 6.3) / 5 with skipping 0.000 between 8.010 and 7.100
9 6.800 6.8000
10 6.500 6.5000
11 6.300 6.3000
12 5.900 5.9000
13 5.800 5.8000
14 5.600 5.6000
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/336094.html
