How to calculate the rolling average of a column over a time period using Pandas and groupby?

I have the following dataframe:

Date	Jockey ID	Position
23-12-2018	4340	1
25-11-2018	4340	5
19-12-2018	4340	10
01-01-2019	4340	3
18-10-2017	8443	1
18-02-2018	8443	6
12-05-2018	8443	7
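To experiment with the answers below, here is a minimal sketch that rebuilds the sample frame. It assumes the Date strings are day-first (DD-MM-YYYY), as values such as 23-12-2018 suggest. The explicit format makes 12-05-2018 parse as 12 May 2018 rather than 5 December.

```python
import pandas as pd

# Sample data from the question; the Date strings are assumed to be DD-MM-YYYY
df = pd.DataFrame({
    "Date": ["23-12-2018", "25-11-2018", "19-12-2018", "01-01-2019",
             "18-10-2017", "18-02-2018", "12-05-2018"],
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})

# An explicit format removes the day/month ambiguity of strings like "12-05-2018"
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
print(df)
```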

For each Jockey ID, I want to calculate the rolling average finishing position over the past 1000 days. I'm looking for something like this:

Date	Jockey ID	Position	Avg Position
23-12-2018	4340	1	1 (1/1)
25-11-2018	4340	5	3 (1+5)/2
19-12-2018	4340	10	5.33 (1+5+10)/3
01-01-2019	4340	3	4.75 (1+5+10+3)/4
18-10-2017	8443	1	1 (1/1)
18-02-2018	8443	6	3.5 (1+6)/2
12-05-2018	8443	7	4.67 (1+6+7)/3

Any ideas on how to do this?

Answer from a uj5u.com user:

Use:

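# The dates are day-first; depending on the pandas version, the default parser may
# read '12-05-2018' as 2018-12-05 (as in the outputs below). Passing
# format='%d-%m-%Y' makes the parsing unambiguous.
df['Date'] = pd.to_datetime(df['Date'])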

# expanding() has no freq parameter: in the pandas version used here it raises
# no error, but it is silently ignored
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='1000D')
               .mean()
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667
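A note on the .to_numpy() assignment above: it copies values by position, and it only lines up because groupby(..., sort=False).expanding() returns the rows grouped by jockey and each jockey's rows are already contiguous in df. A minimal sketch of an index-aligned alternative (not part of the original answer) uses transform, which returns a Series indexed like df; the cumsum/cumcount line is an equivalent formula included as a cross-check.

```python
import pandas as pd

# Sample data from the question, dates parsed as DD-MM-YYYY
df = pd.DataFrame({
    "Date": pd.to_datetime(["23-12-2018", "25-11-2018", "19-12-2018", "01-01-2019",
                            "18-10-2017", "18-02-2018", "12-05-2018"], format="%d-%m-%Y"),
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})

g = df.groupby("Jockey ID")["Position"]

# Expanding (cumulative) mean per jockey in the current row order; transform
# keeps df's index, so the assignment aligns even if jockeys are interleaved
df["new"] = g.transform(lambda s: s.expanding().mean())

# Same quantity via running sum / running count
df["new_alt"] = g.cumsum() / (g.cumcount() + 1)
print(df)
```

Like the answer's code, this follows row order, which is what the expected output in the question uses.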

# Any freq value gives the same output
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='30D')
               .mean()
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667

# Without freq the output is identical, which confirms freq was ignored
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding()
               .mean()
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667

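A possible workaround uses Grouper and GroupBy.transform. Note that pd.Grouper(freq='1000D') splits the dates into consecutive, non-overlapping 1000-day bins and the expanding mean restarts in each bin, so this is not a trailing 1000-day window; in this sample all dates fall into a single bin, so the result equals the plain expanding mean: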

df['new'] = (df.set_index('Date')
               .groupby(['Jockey ID', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667
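Neither expanding() nor the Grouper workaround computes a trailing 1000-day window. A minimal sketch of one, assuming a recent pandas version, uses an offset string ('1000D') as the rolling window on a Date index, which pandas supports for time-based rolling. This is a different technique from the answer above. It orders each jockey's races by date, so for jockey 4340 the values differ from the expected output in the question, which follows row order. Because every jockey's races in the sample span less than 1000 days, no race is dropped from any window here.

```python
import pandas as pd

# Sample data from the question, dates parsed as DD-MM-YYYY
df = pd.DataFrame({
    "Date": pd.to_datetime(["23-12-2018", "25-11-2018", "19-12-2018", "01-01-2019",
                            "18-10-2017", "18-02-2018", "12-05-2018"], format="%d-%m-%Y"),
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})

# Offset-based windows need each jockey's dates in increasing order
df = df.sort_values(["Jockey ID", "Date"])

# "1000D": for each race, average that jockey's positions whose Date lies in
# (Date - 1000 days, Date]; the window is defined by time, not by row count
rolled = (df.set_index("Date")
            .groupby("Jockey ID")["Position"]
            .rolling("1000D")
            .mean())

# The result is indexed by (Jockey ID, Date) in the same order as the sorted
# frame, so it can be assigned positionally
df["avg_1000d"] = rolled.to_numpy()

# Restore the original row order
df = df.sort_index()
print(df)
```

With longer race histories, races more than 1000 days before the current one fall out of the window, whereas an expanding mean keeps them.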

When reposting, please credit the source. Article link: https://www.uj5u.com/ruanti/357482.html

Tags: Python, pandas
