I have the following problem: I have a Pandas DataFrame (df) with values at 15-minute time steps, like this:
                           value
2018-12-28 01:00:00+01:00      5
2018-12-28 01:15:00+01:00      4
2018-12-28 01:30:00+01:00      2
2018-12-28 01:45:00+01:00      1
2018-12-28 02:00:00+01:00      2
...
2021-12-07 23:45:00+01:00      4
2021-12-08 00:00:00+01:00      3
2021-12-08 00:15:00+01:00      1
2021-12-08 00:30:00+01:00      2
2021-12-08 00:45:00+01:00      2
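I want to add an extra column to this DataFrame that shows the mean of the column "value" at the same time of day over the previous week. In other words, for the time step "2021-12-08 00:15:00+01:00", I want this column to show the mean of all values in the column "value" at 00:15 between 2021-12-01 and 2021-12-07. What is the most efficient way to model this?
Thanks a lot!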
Answer from a uj5u.com user:
This is not the prettiest or most Pythonic way, but it works:
import random
import pandas as pd

# Create a sample df: 30 days of random values on a 15-minute grid
df = pd.DataFrame(
    data=[random.randint(0, 5) for _ in range(2880)],
    index=pd.date_range('2021-11-08 01:00:00', '2021-12-08 00:45:00', freq='15min'),
    columns=['value'],
)

# Add extra columns, separating the index into date and time of day
df['time'] = df.index.time
df['date'] = df.index.normalize()  # midnight Timestamps, so they compare cleanly with the offsets below

# Create the result by slicing the dataframe on the date range and the time of day
df['result'] = df.apply(
    lambda row: df.loc[
        df['date'].between(
            row['date'] - pd.DateOffset(weeks=1),  # start = 1 week back
            row['date'] - pd.DateOffset(days=1),   # end = 1 day back
        )
        & (df['time'] == row['time'])  # same time of day
    ]['value'].mean(),  # mean of value
    axis=1,
)
Answer from a uj5u.com user:
Here is a faster solution that relies only on the datetime index:
def get_mean(x):
    # x.name is the row's timestamp; mask the 7 days before it
    date_mask = (df.index >= (x.name - pd.Timedelta('7 days'))) & (df.index < x.name)
    past_week_data = df.loc[date_mask]  # filter df by mask
    times = past_week_data.at_time(x.name.time())  # keep rows at the same time of day
    return times['value'].mean()  # return mean

df['mean'] = df.apply(get_mean, axis=1)
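Both answers above rebuild a mask over the whole frame for every row, which can get slow on three years of 15-minute data. One possible alternative is to let pandas do the windowing. The following is only a minimal sketch, not a benchmarked solution: it groups the rows by time of day and takes a time-based rolling mean over the preceding 7 days, excluding the current row. The function name same_time_weekly_mean, the column name weekly_mean and the seeded sample data are my own assumptions for illustration. It also assumes a sorted DatetimeIndex and treats "7 days" as a fixed 168 hours.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 30 days of random values on a 15-minute grid
rng = np.random.default_rng(0)
index = pd.date_range("2021-11-08 01:00:00", "2021-12-08 00:45:00", freq="15min")
df = pd.DataFrame({"value": rng.integers(0, 6, size=len(index))}, index=index)


def same_time_weekly_mean(frame, column="value", window="7D"):
    """Mean of `column` at the same time of day over the preceding `window`."""
    # One group per time of day (00:00, 00:15, ...); each group holds one row per day
    grouped = frame[column].groupby(frame.index.time)
    # closed="left" makes the window [t - 7 days, t), so the current row is excluded
    return grouped.transform(lambda s: s.rolling(window, closed="left").mean())


df["weekly_mean"] = same_time_weekly_mean(df)
```

For 2021-12-08 00:15 this averages the 00:15 values from 2021-12-01 through 2021-12-07. The earliest row for each time of day gets NaN because it has no history yet.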
