Question:
I have a DataFrame df that looks like this:
                                  value  msg_type
date
2022-03-15 08:15:10+00:00           122  None
2022-03-15 08:25:10+00:00           125  None
2022-03-15 08:30:10+00:00           126  None
2022-03-15 08:30:26.542134+00:00    127  ANNOUNCEMENT
2022-03-15 08:35:10+00:00           128  None
2022-03-15 08:40:10+00:00           122  None
2022-03-15 08:45:09+00:00           127  None
2022-03-15 08:50:09+00:00           133  None
2022-03-15 08:55:09+00:00           134  None
....
2022-03-16 09:30:09+00:00           132  None
2022-03-16 09:30:13.234425+00:00    135  ANNOUNCEMENT
2022-03-16 09:35:09+00:00           130  None
2022-03-16 09:40:09+00:00           134  None
2022-03-16 09:45:09+00:00           135  None
2022-03-16 09:50:09+00:00           134  None
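Values arrive at roughly 5-minute intervals, but messages can appear at any time. I am trying to plot one line of values per day, where the x-axis ranges from t=-2 hours to t=+8 hours and the ANNOUNCEMENT occurs at t=0 (see image below).
So, for example, if an ANNOUNCEMENT occurs at 8:30 AM on 3/15 and again at 9:30 AM on 3/16, there should be two lines:
- one line for 3/15 that plots data from 6:30 AM to 4:30 PM, and
- one line for 3/16 that plots data from 7:30 AM to 5:30 PM,
both sharing the same x-axis ranging from -2 to 8, with the ANNOUNCEMENT at t=0.
What I've tried: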
I am able to do this currently by finding the index position of an announcement (e.g. say it occurs at row 298 -> announcement_index = 298), generating an array of 120 numbers from -24 to 96 (representing 10 hours at 5 minutes per number -> x = np.arange(-24, 96, 1)), then plotting
sns.lineplot(x=x, y=df['value'].iloc[announcement_index-24:announcement_index+96])
While this does mostly work (see image below), I suspect it's not the correct way to go about it. Specifically, adding more information to the plot (like a different set of 'value' markers) at specific times is difficult because I would need to convert each timestamp into this arbitrary -24 to 96 position range.
How can I make this same plot but by utilizing the datetime index instead? Thank you so much!
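
For the extra markers mentioned above, the conversion from a timestamp to the plot's relative axis does not need row positions: subtracting the announcement time gives a timedelta, which can be expressed in hours. A minimal sketch, assuming tz-aware timestamps; `hours_since` and the sample marker times are hypothetical, not part of the original code:

```python
import pandas as pd

def hours_since(timestamps, announcement):
    """Return each timestamp's offset from `announcement`, in hours."""
    # Parse every entry on its own so mixed string formats are not an issue.
    stamps = pd.DatetimeIndex([pd.Timestamp(t) for t in timestamps])
    return (stamps - pd.Timestamp(announcement)).total_seconds() / 3600

announcement = pd.Timestamp("2022-03-15 08:30:26.542134+00:00")
# Hypothetical marker times: one hour before and 1.5 hours after the announcement.
marker_times = [
    "2022-03-15 07:30:26.542134+00:00",
    "2022-03-15 10:00:26.542134+00:00",
]
offsets = hours_since(marker_times, announcement)
```

The resulting offsets can be used directly as x positions on an axis measured in hours since the announcement.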

Answer from a uj5u.com user:
Assuming the index has already been converted to datetime.
Full code:
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = '''
date,value,msg_type
2022-03-15 08:15:10+00:00,122,None
2022-03-15 08:25:10+00:00,125,None
2022-03-15 08:30:10+00:00,126,None
2022-03-15 08:30:26.542134+00:00,127,ANNOUNCEMENT
2022-03-15 08:35:10+00:00,128,None
2022-03-15 08:40:10+00:00,122,None
2022-03-15 08:45:09+00:00,127,None
2022-03-15 08:50:09+00:00,133,None
2022-03-15 08:55:09+00:00,134,None
2022-03-16 09:30:09+00:00,132,None
2022-03-16 09:30:13.234425+00:00,135,ANNOUNCEMENT
2022-03-16 09:35:09+00:00,130,None
2022-03-16 09:40:09+00:00,134,None
2022-03-16 09:45:09+00:00,135,None
2022-03-16 09:50:09+00:00,134,None
'''
df = pd.read_csv(io.StringIO(s), index_col=0)
# parse each timestamp on its own, since only some rows carry fractional seconds
df.index = pd.DatetimeIndex([pd.Timestamp(t) for t in df.index], name='date')

# create intervals from -1H to +2H of the index
dl, dr = -1, 2
left = df.index + pd.Timedelta(hours=dl)
right = df.index + pd.Timedelta(hours=dr)
df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)

# plot each announcement's interval.left to interval.right
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right].copy()  # extract interval.left to interval.right
    window.index -= ann.Index  # compute time since announcement
    window.index = window.index.total_seconds()  # convert to seconds since announcement
    window.plot(ax=ax, y='value', label=ann.Index.date())

deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas)  # set tick labels to hours since announcement
ax.grid()
ax.legend()
plt.show()
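
Since the question uses seaborn, the same windows can also be stacked into one long-form frame and drawn with `hue`. This is a sketch under assumptions, not the answer's method: `relative_windows`, the `hours` and `day` column names, and the five-row sample are hypothetical, and the default window is the -2 to +8 hours from the question.

```python
import pandas as pd

def relative_windows(df, before_hours=2, after_hours=8):
    """Stack each announcement's window into one long-form frame."""
    frames = []
    announcements = df.loc[df["msg_type"] == "ANNOUNCEMENT"].index
    for ann in announcements:
        start = ann - pd.Timedelta(hours=before_hours)
        stop = ann + pd.Timedelta(hours=after_hours)
        # Label-based slice on the sorted datetime index.
        window = df.loc[start:stop, ["value"]].copy()
        # Hours since this announcement; negative means before it.
        window["hours"] = (window.index - ann).total_seconds() / 3600
        window["day"] = str(ann.date())
        frames.append(window.reset_index(drop=True))
    return pd.concat(frames, ignore_index=True)

# Hypothetical sample: a few value rows plus one announcement row per day.
index = pd.DatetimeIndex(
    [pd.Timestamp(t) for t in [
        "2022-03-15 08:25:10+00:00",
        "2022-03-15 08:30:26+00:00",
        "2022-03-15 08:35:10+00:00",
        "2022-03-16 09:30:13+00:00",
        "2022-03-16 09:35:09+00:00",
    ]],
    name="date",
)
sample = pd.DataFrame(
    {
        "value": [125, 127, 128, 135, 130],
        "msg_type": [None, "ANNOUNCEMENT", None, "ANNOUNCEMENT", None],
    },
    index=index,
)
tidy = relative_windows(sample)
```

With seaborn installed, `sns.lineplot(data=tidy, x='hours', y='value', hue='day')` should then draw one line per announcement day on a shared hours axis.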
