I have the following DataFrame:
                     Site
Date
2021-07-01 08:00:00    54
2021-07-01 09:00:00    23
2021-07-01 10:00:00    13
2021-07-01 11:00:00    23
2021-07-01 15:00:00   345
2021-07-01 16:00:00   313
2021-07-05 08:00:00     3
2021-07-05 09:00:00    31
2021-07-13 08:00:00    76
2021-07-13 09:00:00    34
2021-07-13 10:00:00    94
2021-07-13 11:00:00    55
2021-07-13 12:00:00    43
2021-07-13 13:00:00   423
2021-07-13 14:00:00   231
2021-07-13 15:00:00    23
2021-07-13 16:00:00   563
2021-07-13 17:00:00   424
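To make the question reproducible, the frame can be rebuilt roughly as follows. This is a sketch that assumes `Date` is the index name, `Site` is an integer column, and the index is parsed to datetimes up front; the `rows` list is just the values copied from the table.

```python
import pandas as pd

# (timestamp, Site) pairs copied from the table above
rows = [
    ("2021-07-01 08:00:00", 54), ("2021-07-01 09:00:00", 23),
    ("2021-07-01 10:00:00", 13), ("2021-07-01 11:00:00", 23),
    ("2021-07-01 15:00:00", 345), ("2021-07-01 16:00:00", 313),
    ("2021-07-05 08:00:00", 3), ("2021-07-05 09:00:00", 31),
    ("2021-07-13 08:00:00", 76), ("2021-07-13 09:00:00", 34),
    ("2021-07-13 10:00:00", 94), ("2021-07-13 11:00:00", 55),
    ("2021-07-13 12:00:00", 43), ("2021-07-13 13:00:00", 423),
    ("2021-07-13 14:00:00", 231), ("2021-07-13 15:00:00", 23),
    ("2021-07-13 16:00:00", 563), ("2021-07-13 17:00:00", 424),
]

# Build the frame with Date as the index (assumption: Date is the index name)
df = pd.DataFrame(rows, columns=["Date", "Site"]).set_index("Date")
# Parse the string index into a DatetimeIndex
df.index = pd.to_datetime(df.index)
print(df)
```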
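I am trying to get the date, start time, and end time of each event. The conditions are:
- If the hourly sequence is unbroken from 08:00:00 to 17:00:00 (as on 2021-07-13), it is a full-day event.
- If the sequence is interrupted, so the day is not continuous the way 2021-07-13 is, each stretch is an incomplete-day event.
The expected result looks like this: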
               Start       End      Result
Date
2021-07-01  08:00:00  11:00:00  Incomplete
2021-07-01  15:00:00  16:00:00  Incomplete
2021-07-05  08:00:00  09:00:00  Incomplete
2021-07-13  08:00:00  17:00:00        Full
Is there a simple way to do this in Pandas?
A uj5u.com user replied:
Use:
import numpy as np
import pandas as pd

# if necessary, convert the index to a DatetimeIndex
df.index = pd.to_datetime(df.index)
# move the index into a regular column named Date
df = df.reset_index()
# flag rows that do not follow the previous row by exactly one hour
df['g'] = df['Date'].diff().dt.total_seconds().div(3600).ne(1)
date = df['Date'].dt.date
# number the consecutive runs within each date
df['g'] = df.groupby(date)['g'].cumsum()
# get the first and last timestamp of each run
df1 = (df.groupby([date, 'g'])
         .agg(Start=('Date', 'min'), End=('Date', 'max'))
         .reset_index(level=1, drop=True))
# convert to HH:MM:SS strings
df1['Start'] = df1['Start'].dt.strftime('%H:%M:%S')
df1['End'] = df1['End'].dt.strftime('%H:%M:%S')
# a run is Full only if it spans 08:00:00 through 17:00:00
df1['Result'] = np.where(df1['Start'].eq('08:00:00') &
                         df1['End'].eq('17:00:00'), 'Full', 'Incomplete')
print(df1)
               Start       End      Result
Date
2021-07-01  08:00:00  11:00:00  Incomplete
2021-07-01  15:00:00  16:00:00  Incomplete
2021-07-05  08:00:00  09:00:00  Incomplete
2021-07-13  08:00:00  17:00:00        Full
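The step that does the real work is the `diff` plus per-date `cumsum` combination, which labels each run of consecutive hours. A small sketch on made-up timestamps may make it easier to see (the series `ts` and the names `new_run` and `run_id` are illustrative only, and the gap is compared as a `Timedelta` rather than as a number of hours):

```python
import pandas as pd

# Made-up timestamps: one gap on the first day, none on the second
ts = pd.Series(pd.to_datetime([
    "2021-07-01 08:00", "2021-07-01 09:00", "2021-07-01 11:00",
    "2021-07-02 08:00", "2021-07-02 09:00",
]))

# 1 where the gap to the previous row is not exactly one hour
# (the first row has no previous row, so it also counts as a run start)
new_run = ts.diff().ne(pd.Timedelta(hours=1)).astype(int)

# Running count of run starts, restarted for each calendar date
run_id = new_run.groupby(ts.dt.date).cumsum()

print(run_id.tolist())  # [1, 1, 2, 1, 1]
```

Rows that share the same date and the same run number belong to one uninterrupted stretch, which is why grouping by both yields one Start/End pair per event.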
An alternative that uses time objects instead of strings:
from datetime import time

import numpy as np
import pandas as pd

df.index = pd.to_datetime(df.index)
df = df.reset_index()
# flag rows that do not follow the previous row by exactly one hour
df['g'] = df['Date'].diff().dt.total_seconds().div(3600).ne(1)
date = df['Date'].dt.date
# number the consecutive runs within each date
df['g'] = df.groupby(date)['g'].cumsum()
df1 = (df.groupby([date, 'g'])
         .agg(Start=('Date', 'min'), End=('Date', 'max'))
         .reset_index(level=1, drop=True))
# keep only the time-of-day part as datetime.time objects
df1['Start'] = df1['Start'].dt.time
df1['End'] = df1['End'].dt.time
df1['Result'] = np.where(df1['Start'].eq(time(8, 0, 0)) &
                         df1['End'].eq(time(17, 0, 0)), 'Full', 'Incomplete')
print(df1)
               Start       End      Result
Date
2021-07-01  08:00:00  11:00:00  Incomplete
2021-07-01  15:00:00  16:00:00  Incomplete
2021-07-05  08:00:00  09:00:00  Incomplete
2021-07-13  08:00:00  17:00:00        Full
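If this is needed in more than one place, the same steps could be wrapped in a function. The following is only a sketch: `classify_events`, its `day_start` and `day_end` parameters, and the small demo frame are hypothetical names introduced here, not part of the answer above, and it assumes the timestamps live in the index.

```python
import numpy as np
import pandas as pd

def classify_events(df, day_start="08:00:00", day_end="17:00:00"):
    """Hypothetical helper: one row per run of consecutive hourly timestamps."""
    # Sorted timestamps taken from the index (assumption: index holds the times)
    ts = pd.Series(pd.to_datetime(df.index)).sort_values(ignore_index=True)
    tmp = pd.DataFrame({"ts": ts, "Date": ts.dt.date})
    # 1 where a row does not follow the previous one by exactly one hour
    new_run = tmp["ts"].diff().ne(pd.Timedelta(hours=1)).astype(int)
    # Number the runs within each date
    tmp["run"] = new_run.groupby(tmp["Date"]).cumsum()
    # First and last timestamp of each run
    out = (tmp.groupby(["Date", "run"])["ts"]
              .agg(["min", "max"])
              .rename(columns={"min": "Start", "max": "End"})
              .reset_index(level="run", drop=True))
    out["Start"] = out["Start"].dt.strftime("%H:%M:%S")
    out["End"] = out["End"].dt.strftime("%H:%M:%S")
    # Full only when a single run covers the whole working day
    full = out["Start"].eq(day_start) & out["End"].eq(day_end)
    out["Result"] = np.where(full, "Full", "Incomplete")
    return out

# Demo on made-up data: a broken day followed by a complete one
idx = ["2021-07-01 08:00:00", "2021-07-01 09:00:00", "2021-07-01 11:00:00"]
idx += [f"2021-07-13 {h:02d}:00:00" for h in range(8, 18)]
demo = pd.DataFrame({"Site": range(len(idx))}, index=idx)
print(classify_events(demo))
```

Checking only Start and End is sufficient because every run is consecutive by construction, so a run that starts at 08:00:00 and ends at 17:00:00 cannot contain a gap.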
