Given the following pandas DataFrame in Python:
| ID  | date                      |
|-----|---------------------------|
| ESP | 2022-03-02 07:24:19+01:00 |
| ESP | 2022-03-02 07:24:19+01:00 |
| ESP | 2022-03-02 08:00:00+01:00 |
| UK  | 2022-03-02 08:08:30+01:00 |
| ESP | 2022-03-02 09:11:50+01:00 |
| USA | 2022-03-02 10:19:11+01:00 |
| UK  | 2022-03-02 10:12:11+01:00 |
| USA | 2022-03-03 08:33:22+01:00 |
| USA | 2022-03-03 09:23:22+01:00 |
| UK  | 2022-03-03 12:13:22+01:00 |
| UK  | 2022-03-03 12:35:22+01:00 |
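For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the table above as a DataFrame. It assumes the trailing 01:00 in the date column is a +01:00 UTC offset and that the column should hold timezone-aware timestamps; the variable name `rows` is only illustrative.

```python
import pandas as pd

# Rows copied from the table above; the offset is assumed to be +01:00.
rows = [
    ("ESP", "2022-03-02 07:24:19+01:00"),
    ("ESP", "2022-03-02 07:24:19+01:00"),
    ("ESP", "2022-03-02 08:00:00+01:00"),
    ("UK", "2022-03-02 08:08:30+01:00"),
    ("ESP", "2022-03-02 09:11:50+01:00"),
    ("USA", "2022-03-02 10:19:11+01:00"),
    ("UK", "2022-03-02 10:12:11+01:00"),
    ("USA", "2022-03-03 08:33:22+01:00"),
    ("USA", "2022-03-03 09:23:22+01:00"),
    ("UK", "2022-03-03 12:13:22+01:00"),
    ("UK", "2022-03-03 12:35:22+01:00"),
]

df = pd.DataFrame(rows, columns=["ID", "date"])
# Parse the strings into timezone-aware timestamps.
df["date"] = pd.to_datetime(df["date"])
print(df)
```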
Running the following code in Python gives me the DataFrame shown below it:
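    import pandas as pd

    def create_dataframe(df):
        # Drop the UTC offset (everything from '+' on) and parse the rest as naive timestamps.
        df['date'] = pd.to_datetime(df['date'].astype(str).str.split('+').str[0])
        # Count rows per hour; lowercase 'h' is the alias recent pandas versions prefer over 'H'.
        string = df['date'].groupby(df['date'].dt.floor('h')).count()
        df = pd.DataFrame({'date': string.index.date, 'start_interval': string.index.time,
                           'end_interval': (string.index + pd.DateOffset(hours=1)).time,
                           'total_rows': string.to_numpy()})
        return df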
| date | start_interval | end_interval | total_rows |
|-----------------------|-------------------|-------------------|------------|
| 2022-03-02 | 07:00:00 | 08:00:00 | 2 |
| 2022-03-02 | 08:00:00 | 09:00:00 | 2 |
| 2022-03-02 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-02 | 10:00:00 | 11:00:00 | 2 |
| 2022-03-03 | 08:00:00 | 09:00:00 | 1 |
| 2022-03-03 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-03 | 12:00:00 | 13:00:00 | 2 |
I would like to add the information from the "ID" column to this table, i.e. obtain the following DataFrame:
| ID | date | start_interval | end_interval | total_rows |
|--------|-----------------------|-------------------|-------------------|------------|
| ESP | 2022-03-02 | 07:00:00 | 08:00:00 | 2 |
| ESP | 2022-03-02 | 08:00:00 | 09:00:00 | 1 |
| UK | 2022-03-02 | 08:00:00 | 09:00:00 | 1 |
| ESP | 2022-03-02 | 09:00:00 | 10:00:00 | 1 |
| USA | 2022-03-02 | 10:00:00 | 11:00:00 | 1 |
| UK | 2022-03-02 | 10:00:00 | 11:00:00 | 1 |
| USA | 2022-03-03 | 08:00:00 | 09:00:00 | 1 |
| USA | 2022-03-03 | 09:00:00 | 10:00:00 | 1 |
| UK | 2022-03-03 | 12:00:00 | 13:00:00 | 2 |
How can I modify the code above to get this result table? Thanks in advance for your help.
A helpful uj5u.com user replied:
Does this produce what you are looking for?
    # Assumes df['date'] already holds datetime values (e.g. after pd.to_datetime).
    result = (
        df
        .groupby(['ID', df['date'].dt.floor('h')]).agg(total_rows=('date', 'count'))
        .reset_index()
        .assign(
            start_interval=lambda df: df['date'].dt.time,
            end_interval=lambda df: (df['date'] + pd.Timedelta(hours=1)).dt.time,
            date=lambda df: df['date'].dt.date
        )
    )
Result:
        ID        date  total_rows start_interval end_interval
    0  ESP  2022-03-02           2       07:00:00     08:00:00
    1  ESP  2022-03-02           1       08:00:00     09:00:00
    2  ESP  2022-03-02           1       09:00:00     10:00:00
    3   UK  2022-03-02           1       08:00:00     09:00:00
    4   UK  2022-03-02           1       10:00:00     11:00:00
    5   UK  2022-03-03           2       12:00:00     13:00:00
    6  USA  2022-03-02           1       10:00:00     11:00:00
    7  USA  2022-03-03           1       08:00:00     09:00:00
    8  USA  2022-03-03           1       09:00:00     10:00:00
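The result above is sorted by ID and places total_rows before the interval columns, whereas the table requested in the question is ordered by time with total_rows last. The following is a minimal sketch of one way to reproduce that layout, not part of the original answer. It assumes the sample data from the question with 01:00 read as a +01:00 offset, and the helper name `count_rows_per_id_and_hour` is hypothetical.

```python
import pandas as pd


def count_rows_per_id_and_hour(df):
    """Count rows per (ID, hour) and return them in the requested column order."""
    # Hour bucket each row falls into; the resulting Series is still named 'date'.
    hour = df["date"].dt.floor("h")
    # sort=False keeps the groups in order of first appearance in the input.
    counts = (
        df.groupby(["ID", hour], sort=False)
        .size()
        .reset_index(name="total_rows")
    )
    start = counts["date"]
    return pd.DataFrame({
        "ID": counts["ID"],
        "date": start.dt.date,
        "start_interval": start.dt.time,
        "end_interval": (start + pd.Timedelta(hours=1)).dt.time,
        "total_rows": counts["total_rows"],
    })


# Sample rows from the question; the +01:00 offset is an assumption.
rows = [
    ("ESP", "2022-03-02 07:24:19+01:00"),
    ("ESP", "2022-03-02 07:24:19+01:00"),
    ("ESP", "2022-03-02 08:00:00+01:00"),
    ("UK", "2022-03-02 08:08:30+01:00"),
    ("ESP", "2022-03-02 09:11:50+01:00"),
    ("USA", "2022-03-02 10:19:11+01:00"),
    ("UK", "2022-03-02 10:12:11+01:00"),
    ("USA", "2022-03-03 08:33:22+01:00"),
    ("USA", "2022-03-03 09:23:22+01:00"),
    ("UK", "2022-03-03 12:13:22+01:00"),
    ("UK", "2022-03-03 12:35:22+01:00"),
]
df = pd.DataFrame(rows, columns=["ID", "date"])
df["date"] = pd.to_datetime(df["date"])

result = count_rows_per_id_and_hour(df)
print(result)
```

Passing `sort=False` works here only because the input rows already arrive in hourly order; for unordered input, sorting the result by date and start_interval afterwards would be the safer choice.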
