I have a DataFrame with two identifier columns, Date and ID, and I am trying to compute a three-day rolling mean of Score for each ID over time.
Date ID Score
2022-01-02 1 1
2022-01-03 1 2
2022-01-04 1 1
2022-01-05 1 3
2022-01-02 2 5
2022-01-03 2 6
2022-01-04 2 7
2022-01-05 2 3
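So far, I only know how to create a rolling mean on one specific column without taking the second identifier, ID, into account:
df['RollingMean3'] = df['Score'].rolling(3).mean()
I am trying to get: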
Date ID Score ScoreRollingMean3
2022-01-02 1 1 NaN
2022-01-03 1 2 NaN
2022-01-04 1 1 1.33
2022-01-05 1 3 2
2022-01-02 2 5 NaN
2022-01-03 2 6 NaN
2022-01-04 2 7 6
2022-01-05 2 3 5.33
For reproducibility:
import pandas as pd

df = pd.DataFrame({
'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
'ID':[1, 1, 1, 1, 2, 2, 2, 2],
'Score':[1, 2, 1, 3, 5, 6, 7, 3]})
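Thank you very much.
A uj5u.com user replied:
If the dates are consecutive (one row per ID per day), a window of 3 rows covers three days, so use DataFrame.groupby with Series.droplevel: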
df['RollingMean3']=df.groupby('ID')['Score'].rolling(3).mean().droplevel(0)
print (df)
Date ID Score RollingMean3
0 2022-01-02 1 1 NaN
1 2022-01-03 1 2 NaN
2 2022-01-04 1 1 1.333333
3 2022-01-05 1 3 2.000000
4 2022-01-02 2 5 NaN
5 2022-01-03 2 6 NaN
6 2022-01-04 2 7 6.000000
7 2022-01-05 2 3 5.333333
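As an alternative to dropping the group level by hand, the same per-ID rolling mean can be sketched with groupby followed by transform, which returns values already aligned to the original rows. This is only a minimal sketch on the sample data from the question; it assumes, like the answer above, that each ID has one row per consecutive day.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05',
             '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3],
})

# Apply a 3-row rolling mean inside each ID group; transform keeps the
# original row order and index, so no droplevel step is needed.
df['RollingMean3'] = (
    df.groupby('ID')['Score']
      .transform(lambda s: s.rolling(3).mean())
)
print(df)
```

The window never crosses from one ID into the next, so the first two rows of each ID stay NaN.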
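A general solution with a 3D (three-day) rolling window is possible with a DatetimeIndex. A time-based window defaults to min_periods=1, so the first rows of each ID get partial-window means rather than NaN: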
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df['RollingMean3']=df.groupby('ID')['Score'].rolling('3D').mean().droplevel(0)
print (df)
ID Score RollingMean3
Date
2022-01-02 1 1 1.000000
2022-01-03 1 2 1.500000
2022-01-04 1 1 1.333333
2022-01-05 1 3 2.000000
2022-01-02 2 5 5.000000
2022-01-03 2 6 5.500000
2022-01-04 2 7 6.000000
2022-01-05 2 3 5.333333
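The output above fills the first two days of each ID with partial means, while the question asked for NaN there. One possible way to get that with the time-based window is to pass min_periods=3; this parameter is an addition not used in the answer above. The sketch also assumes the rows are already ordered by ID and then by date, as in the sample, because the assignment relies on the result index matching the DataFrame index row for row.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05',
             '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3],
})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

# Three-day time window per ID; min_periods=3 (an added assumption about
# the desired behaviour) yields NaN until three observations fall inside
# the window.
df['RollingMean3'] = (
    df.groupby('ID')['Score']
      .rolling('3D', min_periods=3)
      .mean()
      .droplevel(0)
)
print(df)
```

If an ID has missing days, the '3D' window then returns NaN for any date whose window holds fewer than three observations, which may or may not be what is wanted.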
