Given the following pandas DataFrame in Python:
| ID | date |
|--------------|---------------------------------------|
| 2 | 2022-03-02 07:24:19+01:00 |
| 2 | 2022-03-02 07:24:19+01:00 |
| 0 | 2022-03-02 08:00:00+01:00 |
| 0 | 2022-03-02 08:08:30+01:00 |
| 1 | 2022-03-02 09:11:50+01:00 |
| 1 | 2022-03-02 10:19:11+01:00 |
| 1 | 2022-03-02 10:12:11+01:00 |
| 3 | 2022-03-03 08:33:22+01:00 |
| 3 | 2022-03-03 09:23:22+01:00 |
| 3 | 2022-03-03 12:13:22+01:00 |
| 3 | 2022-03-03 12:35:22+01:00 |
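I need to create a new DataFrame that contains, for each day, the total number of rows falling in each time interval, where the interval length is given by a parameter. For this example, assume 1 hour. Here is the DataFrame I want to get: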
| date | start_interval | end_interval | total_rows |
|-----------------------|-------------------|-------------------|------------|
| 2022-03-02 | 00:00:00 | 01:00:00 | 0 |
| 2022-03-02 | 01:00:00 | 02:00:00 | 0 |
| 2022-03-02 | 02:00:00 | 03:00:00 | 0 |
| 2022-03-02 | 03:00:00 | 04:00:00 | 0 |
| 2022-03-02 | 04:00:00 | 05:00:00 | 0 |
| 2022-03-02 | 05:00:00 | 06:00:00 | 0 |
| 2022-03-02 | 06:00:00 | 07:00:00 | 0 |
| 2022-03-02 | 07:00:00 | 08:00:00 | 2 |
| 2022-03-02 | 08:00:00 | 09:00:00 | 2 |
| 2022-03-02 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-02 | 10:00:00 | 11:00:00 | 2 |
| 2022-03-02 | 11:00:00 | 12:00:00 | 0 |
| 2022-03-02 | 12:00:00 | 13:00:00 | 0 |
| 2022-03-02 | 13:00:00 | 14:00:00 | 0 |
| 2022-03-02 | 14:00:00 | 15:00:00 | 0 |
| 2022-03-02 | 15:00:00 | 16:00:00 | 0 |
| 2022-03-02 | 16:00:00 | 17:00:00 | 0 |
| 2022-03-02 | 17:00:00 | 18:00:00 | 0 |
| 2022-03-02 | 18:00:00 | 19:00:00 | 0 |
| 2022-03-02 | 19:00:00 | 20:00:00 | 0 |
| 2022-03-02 | 20:00:00 | 21:00:00 | 0 |
| 2022-03-02 | 21:00:00 | 22:00:00 | 0 |
| 2022-03-02 | 22:00:00 | 23:00:00 | 0 |
| 2022-03-02 | 23:00:00 | 00:00:00 | 0 |
| 2022-03-03 | 00:00:00 | 01:00:00 | 0 |
| 2022-03-03 | 01:00:00 | 02:00:00 | 0 |
| 2022-03-03 | 02:00:00 | 03:00:00 | 0 |
| 2022-03-03 | 03:00:00 | 04:00:00 | 0 |
| 2022-03-03 | 04:00:00 | 05:00:00 | 0 |
| 2022-03-03 | 05:00:00 | 06:00:00 | 0 |
| 2022-03-03 | 06:00:00 | 07:00:00 | 0 |
| 2022-03-03 | 07:00:00 | 08:00:00 | 0 |
| 2022-03-03 | 08:00:00 | 09:00:00 | 1 |
| 2022-03-03 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-03 | 10:00:00 | 11:00:00 | 0 |
| 2022-03-03 | 11:00:00 | 12:00:00 | 0 |
| 2022-03-03 | 12:00:00 | 13:00:00 | 2 |
| 2022-03-03 | 13:00:00 | 14:00:00 | 0 |
| 2022-03-03 | 14:00:00 | 15:00:00 | 0 |
| 2022-03-03 | 15:00:00 | 16:00:00 | 0 |
| 2022-03-03 | 16:00:00 | 17:00:00 | 0 |
| 2022-03-03 | 17:00:00 | 18:00:00 | 0 |
| 2022-03-03 | 18:00:00 | 19:00:00 | 0 |
| 2022-03-03 | 19:00:00 | 20:00:00 | 0 |
| 2022-03-03 | 20:00:00 | 21:00:00 | 0 |
| 2022-03-03 | 21:00:00 | 22:00:00 | 0 |
| 2022-03-03 | 22:00:00 | 23:00:00 | 0 |
| 2022-03-03 | 23:00:00 | 00:00:00 | 0 |
My idea is to then drop all rows that have 0 in the total_rows column:
df = df[df['total_rows'] != 0]
| date | start_interval | end_interval | total_rows |
|-----------------------|-------------------|-------------------|------------|
| 2022-03-02 | 07:00:00 | 08:00:00 | 2 |
| 2022-03-02 | 08:00:00 | 09:00:00 | 2 |
| 2022-03-02 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-02 | 10:00:00 | 11:00:00 | 2 |
| 2022-03-03 | 08:00:00 | 09:00:00 | 1 |
| 2022-03-03 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-03 | 12:00:00 | 13:00:00 | 2 |
How can I get this result? Thanks in advance for your help.
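Answer from a uj5u.com user:
Floor your date column to the hour, then count the occurrences:
import pandas as pd

# df['date'] must already be a datetime column (use pd.to_datetime if it holds strings).
# 'h' is the hourly alias; the uppercase 'H' is deprecated in recent pandas.
s = df['date'].groupby(df['date'].dt.floor('h')).count()
out = pd.DataFrame({'date': s.index.date, 'start_interval': s.index.time,
                    'end_interval': (s.index + pd.DateOffset(hours=1)).time,
                    'total_rows': s.to_numpy()})
print(out)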
# Output
date start_interval end_interval total_rows
0 2022-03-02 07:00:00 08:00:00 2
1 2022-03-02 08:00:00 09:00:00 2
2 2022-03-02 09:00:00 10:00:00 1
3 2022-03-02 10:00:00 11:00:00 2
4 2022-03-03 08:00:00 09:00:00 1
5 2022-03-03 09:00:00 10:00:00 1
6 2022-03-03 12:00:00 13:00:00 2
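The question asks for the interval length to be a parameter. As a minimal sketch of the same floor-and-count idea, the snippet below rebuilds the sample data and wraps the logic in a helper. The name `count_per_interval` and its `interval` argument are hypothetical, not part of the original answer, and the sketch assumes the interval divides a day evenly.

```python
import pandas as pd

# Sample data copied from the question; the +01:00 offset makes the column tz-aware.
df = pd.DataFrame({
    "ID": [2, 2, 0, 0, 1, 1, 1, 3, 3, 3, 3],
    "date": pd.to_datetime([
        "2022-03-02 07:24:19+01:00", "2022-03-02 07:24:19+01:00",
        "2022-03-02 08:00:00+01:00", "2022-03-02 08:08:30+01:00",
        "2022-03-02 09:11:50+01:00", "2022-03-02 10:19:11+01:00",
        "2022-03-02 10:12:11+01:00", "2022-03-03 08:33:22+01:00",
        "2022-03-03 09:23:22+01:00", "2022-03-03 12:13:22+01:00",
        "2022-03-03 12:35:22+01:00",
    ]),
})


def count_per_interval(frame, interval="1h"):
    """Hypothetical helper: count rows in each non-empty interval."""
    step = pd.Timedelta(interval)
    # Round every timestamp down to the start of its interval.
    start = frame["date"].dt.floor(interval)
    # Number of rows sharing each interval start, sorted chronologically.
    counts = frame.groupby(start).size()
    return pd.DataFrame({
        "date": counts.index.date,
        "start_interval": counts.index.time,
        "end_interval": (counts.index + step).time,
        "total_rows": counts.to_numpy(),
    })


out = count_per_interval(df, "1h")
print(out)
```

Calling `count_per_interval(df, "30min")` would bucket the same rows into half-hour intervals instead. Intervals with no rows never appear, so no filtering on `total_rows` is needed afterwards.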
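Answer from a uj5u.com user:
This is a good job for pd.Grouper:
import pandas as pd

# Group rows into 1-hour bins on the 'date' column and count the rows in each bin.
# Empty bins between the first and last timestamp are kept with a count of 0,
# so filter with out[out['total_rows'] != 0] if you only want non-empty intervals.
z = df.groupby(
    pd.Grouper(freq='1h', key='date')
).size().to_frame('total_rows').reset_index()
out = z.assign(
    start_interval=z['date'].dt.time,
    end_interval=(z['date'] + pd.Timedelta(hours=1)).dt.time,
    date=z['date'].dt.normalize(),
)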
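If the intermediate table from the question is wanted, with every interval of every day listed from 00:00:00 even when it is empty, one option is to reindex the counts onto a full grid of interval starts. This is only a sketch under assumptions: the variable names are made up for illustration, the interval is assumed to divide a day evenly, and the sample data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Timestamps copied from the question (tz-aware, fixed +01:00 offset).
df = pd.DataFrame({"date": pd.to_datetime([
    "2022-03-02 07:24:19+01:00", "2022-03-02 07:24:19+01:00",
    "2022-03-02 08:00:00+01:00", "2022-03-02 08:08:30+01:00",
    "2022-03-02 09:11:50+01:00", "2022-03-02 10:19:11+01:00",
    "2022-03-02 10:12:11+01:00", "2022-03-03 08:33:22+01:00",
    "2022-03-03 09:23:22+01:00", "2022-03-03 12:13:22+01:00",
    "2022-03-03 12:35:22+01:00",
])})

interval = "1h"                      # interval length, the "parameter" from the question
step = pd.Timedelta(interval)

# Rows per non-empty interval, indexed by interval start.
counts = df.groupby(df["date"].dt.floor(interval)).size()

# Every interval start from midnight of the first day to the last interval of the last day.
first_day = df["date"].min().normalize()
last_day = df["date"].max().normalize()
grid = pd.date_range(first_day, last_day + pd.Timedelta("1D") - step, freq=interval)

# Align the counts to the full grid; intervals without rows get 0.
full = counts.reindex(grid, fill_value=0)

out = pd.DataFrame({
    "date": full.index.date,
    "start_interval": full.index.time,
    "end_interval": (full.index + step).time,
    "total_rows": full.to_numpy(),
})

# The final filtering step described in the question.
nonzero = out[out["total_rows"] != 0]
print(nonzero)
```

With the sample data, `out` has one row per hour for both days, and `nonzero` keeps only the intervals that contain at least one row.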
