I have the following DataFrame:
| Date | Jockey ID | Position |
|---|---|---|
| 23-12-2018 | 4340 | 1 |
| 25-11-2018 | 4340 | 5 |
| 19-12-2018 | 4340 | 10 |
| 01-01-2019 | 4340 | 3 |
| 18-10-2017 | 8443 | 1 |
| 18-02-2018 | 8443 | 6 |
| 12-05-2018 | 8443 | 7 |
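For anyone who wants to reproduce this, the table can be rebuilt roughly as follows. This is only a sketch: the English column names `Date`, `Jockey ID` and `Position` are assumed (they match the names used in the answer's code), and the dates are kept as day-first strings exactly as shown.

```python
import pandas as pd

# Sample data copied from the table above; dates are dd-mm-yyyy strings.
# The English column names are an assumption made for illustration.
df = pd.DataFrame({
    "Date": ["23-12-2018", "25-11-2018", "19-12-2018", "01-01-2019",
             "18-10-2017", "18-02-2018", "12-05-2018"],
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})
print(df)
```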
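I want to compute, for each Jockey ID, the rolling average finishing position over the past 1000 days. I am looking for something like this:
| Date | Jockey ID | Position | Average position |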
|---|---|---|---|
| 23-12-2018 | 4340 | 1 | 1 (1/1) |
| 25-11-2018 | 4340 | 5 | 3 (1+5)/2 |
| 19-12-2018 | 4340 | 10 | 5.33 (1+5+10)/3 |
| 01-01-2019 | 4340 | 3 | 4.75 (1+5+10+3)/4 |
| 18-10-2017 | 8443 | 1 | 1 (1/1) |
| 18-02-2018 | 8443 | 6 | 3.5 (1+6)/2 |
| 12-05-2018 | 8443 | 7 | 4.66 (1+6+7)/3 |
Any ideas on how to do this?
Answer from a uj5u.com user:
Use:
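import pandas as pd

# Caution: no format is given, so an ambiguous string such as 12-05-2018 can be
# parsed month-first (it appears as 2018-12-05 in the output below); pass
# dayfirst=True or format='%d-%m-%Y' to force day-first parsing.
df['Date'] = pd.to_datetime(df['Date'])
# freq raises no error here, but it also has no effect on the result.
# It is not a documented expanding() argument, so other pandas versions may reject it.
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='1000D')
               .mean()
               .to_numpy())
print(df)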
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667
# The output is the same for any freq value
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='30D')
               .mean()
               .to_numpy())
print(df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667
# Without freq the output is again identical, which confirms that freq is ignored here
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding()
               .mean()
               .to_numpy())
print(df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667
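The identical outputs are what one would expect from an expanding window: it always covers every earlier row of the group, so no time limit is applied at all. A small self-contained sketch that checks this by hand, using the sample positions with assumed column names:

```python
import pandas as pd

# Sample data from the question (column names assumed for illustration)
df = pd.DataFrame({
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})

grouped = df.groupby("Jockey ID", sort=False)["Position"]

# Expanding mean per jockey, in row order
expanding_mean = grouped.expanding().mean().to_numpy()

# The same numbers by hand: running sum divided by running count
manual_mean = (grouped.cumsum() / (grouped.cumcount() + 1)).to_numpy()

print(expanding_mean)
print(manual_mean)
```

Both arrays agree, which shows that the result depends only on row order within each jockey and not on the dates.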
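A possible workaround uses Grouper with GroupBy.transform, which splits each jockey's races into fixed 1000-day bins and takes the expanding mean inside each bin: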
df['new'] = (df.set_index('Date')
               .groupby(['Jockey ID', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print(df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-12-05       8443         7  4.666667
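Note that pd.Grouper(freq='1000D') cuts the timeline into fixed 1000-day blocks, which is not the same as looking back 1000 days from each race. If a true trailing window is wanted, one option is a time-based rolling window per jockey. The following is only a sketch under assumptions: the sample data and column names are rebuilt for illustration, `avg_1000d` is a made-up column name, dates are parsed day-first, and rows are sorted chronologically, so the averages follow date order rather than the original row order.

```python
import pandas as pd

# Sample data from the question (column names assumed for illustration)
df = pd.DataFrame({
    "Date": ["23-12-2018", "25-11-2018", "19-12-2018", "01-01-2019",
             "18-10-2017", "18-02-2018", "12-05-2018"],
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Time-based windows need dates in increasing order within each jockey
df = df.sort_values(["Jockey ID", "Date"]).reset_index(drop=True)

# Mean of every race in the 1000 days up to and including each race date
rolled = (df.set_index("Date")
            .groupby("Jockey ID")["Position"]
            .rolling("1000D")
            .mean())

# Rows are already ordered by jockey and date, so the values line up with df
df["avg_1000d"] = rolled.to_numpy()
print(df)
```

With this sample every race lies within 1000 days of the same jockey's other races, so the last value per jockey equals the plain mean of all that jockey's positions (4.75 and about 4.67). Races only start dropping out of the window once a jockey's history spans more than 1000 days.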
