I have a table like this, containing the start time and end time of a trip.
| start_time | end_time |
|---|---|
| 2019-07-01 11:25:00 | 2019-07-01 11:40:00 |
| 2019-07-01 21:40:00 | 2019-07-01 22:10:00 |
| 2019-07-03 22:00:00 | 2019-07-04 22:00:00 |
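For every hour between start_time and end_time, I want to get the number of minutes that fall within that hour. In other words, I want to know how many minutes the trip was running in the hour that ends at each end_hour.
For example, the first row would return the following, because 15 minutes of the trip elapsed before the hour ending at 12:00.
| end_hour | total_minutes |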
|---|---|
| 2019-07-01 12:00:00 | 15 |
Similarly, for the second row the output would be
| end_hour | total_minutes |
|---|---|
| 2019-07-01 22:00:00 | 20 |
| 2019-07-01 23:00:00 | 10 |
For the last row, the output would be
| end_hour | total_minutes |
|---|---|
| 2019-07-03 23:00:00 | 60 |
| 2019-07-04 00:00:00 | 60 |
| 2019-07-04 01:00:00 | 60 |
| ... | ... |
| 2019-07-04 22:00:00 | 60 |
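How can I achieve this in Python?
A uj5u.com user replied:
You can use pandas' built-in to_datetime function to convert the strings to datetimes, then subtract start from end to get each trip's total duration in minutes: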
import pandas as pd
df = pd.DataFrame([['2019-07-01 11:25:00','2019-07-01 11:40:00'], ['2019-07-01 21:40:00', '2019-07-01 22:10:00'], ['2019-07-03 22:00:00', '2019-07-04 22:00:00']], columns=['start_time', 'end_time'])
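# .dt.total_seconds() / 60 yields float minutes; recent pandas versions reject .astype('timedelta64[m]')
df['total_minutes'] = (pd.to_datetime(df['end_time']) - pd.to_datetime(df['start_time'])).dt.total_seconds() / 60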
>>> df
start_time end_time total_minutes
0 2019-07-01 11:25:00 2019-07-01 11:40:00 15.0
1 2019-07-01 21:40:00 2019-07-01 22:10:00 30.0
2 2019-07-03 22:00:00 2019-07-04 22:00:00 1440.0
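Note that this gives one total per trip rather than the per-hour breakdown the question asks for. As a rough sketch only (it is not part of the original answer), each trip could be cut at clock-hour boundaries and the overlap with every hour measured. The helper name `minutes_per_hour` is hypothetical, and the sketch assumes end_time is never earlier than start_time and that whole-minute precision is enough.

```python
import pandas as pd

def minutes_per_hour(start, end):
    """Split one trip into minutes per clock hour (hypothetical helper)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Hour boundaries covering the trip: floor of start up to ceil of end.
    edges = pd.date_range(start.floor("h"), end.ceil("h"), freq="h")
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Overlap between the trip and the clock hour [lo, hi).
        overlap = min(end, hi) - max(start, lo)
        minutes = int(overlap.total_seconds() // 60)
        if minutes > 0:
            rows.append({"end_hour": hi, "total_minutes": minutes})
    return pd.DataFrame(rows, columns=["end_hour", "total_minutes"])

df = pd.DataFrame(
    {"start_time": ["2019-07-01 11:25:00", "2019-07-01 21:40:00", "2019-07-03 22:00:00"],
     "end_time": ["2019-07-01 11:40:00", "2019-07-01 22:10:00", "2019-07-04 22:00:00"]}
)

# One small frame per trip, stacked into a single result.
per_hour = pd.concat(
    [minutes_per_hour(s, e) for s, e in zip(df["start_time"], df["end_time"])],
    ignore_index=True,
)
print(per_hour.head(3))
```

If several trips can fall into the same hour, grouping `per_hour` by `end_hour` and summing `total_minutes` would merge them into one row per hour.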
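A uj5u.com user replied:
The durations have minute precision, so let's upsample to that frequency and count, for each hour, the number of minutes that fall inside one of the start_time to end_time intervals.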
import pandas as pd
df = pd.DataFrame(
    {"start_time": ["2019-07-01 11:25:00", "2019-07-01 21:40:00", "2019-07-03 22:00:00"],
     "end_time": ["2019-07-01 11:40:00", "2019-07-01 22:10:00", "2019-07-04 22:00:00"]}
)
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
df['minutes'] = (df['end_time'] - df['start_time']).dt.total_seconds()/60
# create an IntervalIndex which we can set as the axis (needed for re-indexing).
# subtract one minute from end_time so that the minute of the termination is excluded.
iv_idx = pd.IntervalIndex.from_arrays(df['start_time'],
                                      df['end_time'] - pd.Timedelta(minutes=1),
                                      closed='both')
# create a new index with the extended frequency:
new_idx = pd.date_range(df['start_time'].min(), df['end_time'].max(), freq='min')
# set the new index to get the extended frequency;
# all minutes will have the value of the whole interval
result = df['minutes'].set_axis(iv_idx).reindex(new_idx)
# we can now calculate the duration per hour by resampling and summing the
# boolean representation of the duration (1/0):
# ('h' is the hourly alias; the uppercase 'H' is deprecated as of pandas 2.2)
result = result.fillna(0).astype(int).astype(bool).resample('h').sum()
result.index.name = 'start_hour'
The result is now anchored to start_hour (you can easily switch to the end hour by shifting the index forward by one hour):
print(result.loc["2019-07-01 11:00:00":"2019-07-01 12:00:00"])
# start_hour
# 2019-07-01 11:00:00 15
# 2019-07-01 12:00:00 0
# Freq: H, Name: minutes, dtype: int64
print(result.loc["2019-07-01 20:00:00":"2019-07-01 23:00:00"])
# start_hour
# 2019-07-01 20:00:00 0
# 2019-07-01 21:00:00 20
# 2019-07-01 22:00:00 10
# 2019-07-01 23:00:00 0
# Freq: H, Name: minutes, dtype: int64
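To get the end_hour / total_minutes layout from the question, the index shift mentioned above could look roughly like this. It is only a sketch: the small `result` series here is a stand-in built from the values printed above so that the snippet runs on its own, and the names `by_end_hour` and `out` are made up for illustration.

```python
import pandas as pd

# Stand-in for the `result` series computed above (values copied from the printed output).
result = pd.Series(
    [0, 20, 10, 0],
    index=pd.date_range("2019-07-01 20:00:00", periods=4, freq="h", name="start_hour"),
    name="minutes",
)

# Shift every label forward by one hour so it marks the end of the hour.
by_end_hour = result.copy()
by_end_hour.index = by_end_hour.index + pd.Timedelta(hours=1)
by_end_hour.index.name = "end_hour"

# Drop hours without any activity and return a two-column frame.
out = by_end_hour[by_end_hour > 0].rename("total_minutes").reset_index()
print(out)
```

Filtering out the zero rows is optional; keep them if you need a continuous hourly series.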
