Starting from the following pandas DataFrame:
country_ID date ID visit_ENG visit_FRA visit_ESP visit_time
ENG 2022-02-04 16:30:21 3 1 0 0 0 days 01:00:00
ENG 2022-02-04 16:40:21 3 1 0 0 0 days 00:05:00
ENG 2022-02-06 16:35:21 3 1 0 0 1 days 19:55:00
ENG 2022-02-04 10:10:21 2 1 0 0 NaT
ESP 2022-02-04 15:10:21 2 1 1 1 NaT
ENG 2022-02-04 12:35:21 1 1 0 0 NaT
ENG 2022-02-04 16:10:21 1 0 0 0 0 days 03:35:00
ESP 2022-02-04 14:23:21 0 0 0 1 NaT
ESP 2022-02-04 15:27:21 0 1 0 0 0 days 01:04:00
FRA 2022-02-04 15:35:21 0 0 0 0 NaT
ENG 2022-02-04 16:35:21 0 0 0 0 0 days 00:35:00
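To make the examples reproducible, the table above can be rebuilt roughly as follows. This is only a sketch: the dtypes (datetime for `date`, timedelta for `visit_time`) are assumed from how the values print, and the `rows` and `columns` names are introduced here for illustration.

```python
import pandas as pd

# Rows copied from the table above; visit_time is None where the table shows NaT.
rows = [
    ("ENG", "2022-02-04 16:30:21", 3, 1, 0, 0, "0 days 01:00:00"),
    ("ENG", "2022-02-04 16:40:21", 3, 1, 0, 0, "0 days 00:05:00"),
    ("ENG", "2022-02-06 16:35:21", 3, 1, 0, 0, "1 days 19:55:00"),
    ("ENG", "2022-02-04 10:10:21", 2, 1, 0, 0, None),
    ("ESP", "2022-02-04 15:10:21", 2, 1, 1, 1, None),
    ("ENG", "2022-02-04 12:35:21", 1, 1, 0, 0, None),
    ("ENG", "2022-02-04 16:10:21", 1, 0, 0, 0, "0 days 03:35:00"),
    ("ESP", "2022-02-04 14:23:21", 0, 0, 0, 1, None),
    ("ESP", "2022-02-04 15:27:21", 0, 1, 0, 0, "0 days 01:04:00"),
    ("FRA", "2022-02-04 15:35:21", 0, 0, 0, 0, None),
    ("ENG", "2022-02-04 16:35:21", 0, 0, 0, 0, "0 days 00:35:00"),
]
columns = ["country_ID", "date", "ID", "visit_ENG", "visit_FRA", "visit_ESP", "visit_time"]

df = pd.DataFrame(rows, columns=columns)
# Assumed dtypes: parse the strings into datetime and timedelta values.
df["date"] = pd.to_datetime(df["date"])
df["visit_time"] = pd.to_timedelta(df["visit_time"])
print(df)
```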
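I extracted the following information into separate subsets, using the code shown under each item:
- The sum of the visit columns for each ID.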
visit_df = pd.DataFrame()
visit_df['visit_ENG'] = df.groupby('ID')['visit_ENG'].sum()
visit_df['visit_FRA'] = df.groupby('ID')['visit_FRA'].sum()
visit_df['visit_ESP'] = df.groupby('ID')['visit_ESP'].sum()
print(visit_df)
visit_ENG visit_FRA visit_ESP
ID
0 1 0 1
1 1 0 0
2 2 1 1
3 3 0 0
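- The mean visit_time for each country_ID group within each ID.
# Average visit_time (computed in seconds) per ID and country_ID, one column per country
subset_avg = (
    pd.to_timedelta(subset_out['visit_time'].dt.total_seconds()
                    .groupby([subset_out['ID'], subset_out['country_ID']])
                    .mean(), unit='s')
    .fillna(pd.Timedelta(days=0)).unstack()
    .add_prefix('avg_visit_')
)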
subset_avg.columns.names = ['']
print(subset_avg)
avg_visit_ESP avg_visit_ENG
ID
0 0 days 01:04:00 0 days 00:35:00
1 NaT 0 days 03:35:00
3 NaT 0 days 15:00:00
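- The standard deviation of visit_time for each country_ID group within each ID.
# Sample standard deviation (computed in seconds); single-row groups give NaN, filled with 0 days
subset_std = (
    pd.to_timedelta(df['visit_time'].dt.total_seconds()
                    .groupby([df['ID'], df['country_ID']])
                    .std(), unit='s')
    .fillna(pd.Timedelta(days=0)).unstack()
    .add_prefix('std_visit_')
)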
subset_std.columns.names = ['']
print(subset_std)
std_visit_ESP std_visit_ENG
ID
0 0 days 00:00:00 0 days 00:00:00
1 NaT 0 days 00:00:00
3 NaT 1 days 01:02:48.342559903
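As a side note, the mean and the standard deviation can probably be computed in a single groupby pass instead of two. The following is a minimal sketch on an excerpt of the data (only the rows with a non-missing visit_time); the names `seconds`, `stats`, and `wide` are introduced here for illustration and are not part of the original code.

```python
import pandas as pd

# Excerpt of the data above: only ID, country_ID and the non-missing visit_time values.
df = pd.DataFrame({
    "ID": [3, 3, 3, 1, 0, 0],
    "country_ID": ["ENG", "ENG", "ENG", "ENG", "ESP", "ENG"],
    "visit_time": pd.to_timedelta(
        ["0 days 01:00:00", "0 days 00:05:00", "1 days 19:55:00",
         "0 days 03:35:00", "0 days 01:04:00", "0 days 00:35:00"]
    ),
})

# Work in seconds, as the original code does, and aggregate both statistics at once.
seconds = df["visit_time"].dt.total_seconds()
stats = seconds.groupby([df["ID"], df["country_ID"]]).agg(avg="mean", std="std")

# A single-row group has no sample standard deviation (NaN); treat it as 0 like the original.
stats["std"] = stats["std"].fillna(0)

# Convert back to timedeltas, then pivot country_ID into columns.
stats = stats.apply(lambda col: pd.to_timedelta(col, unit="s"))
wide = stats.unstack("country_ID")
wide.columns = [f"{stat}_visit_{country}" for stat, country in wide.columns]
print(wide)
```

Combinations that never occur, such as ID 1 with ESP, come out of `unstack` as NaT, which matches the tables above.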
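I want to build one combined table from all the extracted data. I need to join the visit_df, subset_avg, and subset_std tables above, with NaT wherever a value is missing, as shown below: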
| ID | visit_ENG | visit_FRA | visit_ESP | avg_visit_ESP | avg_visit_ENG | std_visit_ESP | std_visit_ENG |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 days 01:04:00 | 0 days 00:35:00 | 0 days 00:00:00 | 0 days 00:00:00 |
| 1 | 1 | 0 | 0 | NaT | 0 days 03:35:00 | NaT | 0 days 00:00:00 |
| 2 | 2 | 1 | 1 | NaT | NaT | NaT | NaT |
| 3 | 3 | 0 | 0 | NaT | 0 days 15:00:00 | NaT | 1 days 01:02:48.342559903 |
If there is a way to build what I am asking for with other, more efficient Python commands, please let me know.
Answer from a uj5u.com user:
You are looking for pd.concat:
>>> pd.concat([visit_df, subset_avg, subset_std], axis=1).reset_index()
ID visit_ENG visit_FRA visit_ESP avg_visit_ESP avg_visit_ENG std_visit_ESP std_visit_ENG
0 0 1 0 1 0 days 01:04:00 0 days 00:35:00 0 days 00:00:00 0 days 00:00:00
1 1 1 0 0 NaT 0 days 03:35:00 NaT 0 days 00:00:00
2 2 2 1 1 NaN NaN NaN NaN
3 3 3 0 0 NaT 0 days 15:00:00 NaT 1 days 01:02:48.342559903
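The reason this works is that `pd.concat` with `axis=1` matches rows on the index (here ID) and keeps the union of all index values by default. A minimal sketch with hypothetical one-column versions of two of the tables:

```python
import pandas as pd

# Hypothetical miniature versions of two of the tables, both indexed by ID.
visit_df = pd.DataFrame(
    {"visit_ENG": [1, 1, 2, 3]},
    index=pd.Index([0, 1, 2, 3], name="ID"),
)
subset_avg = pd.DataFrame(
    {"avg_visit_ENG": pd.to_timedelta(["00:35:00", "03:35:00", "15:00:00"])},
    index=pd.Index([0, 1, 3], name="ID"),
)

# axis=1 places the frames side by side; rows are aligned on the index (outer join by default).
combined = pd.concat([visit_df, subset_avg], axis=1)

# ID 2 exists only in visit_df, so its avg_visit_ENG cell is left missing.
print(combined.reset_index())
```

Because the alignment is by index label and not by position, the three tables do not need the same number of rows or the same row order.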
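A tip for building visit_df:
# Select the counter columns first; a bare .sum() would also try to aggregate country_ID, date and visit_time
visit_df = df.groupby('ID')[['visit_ENG', 'visit_FRA', 'visit_ESP']].sum()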
print(visit_df)
# Output
visit_ENG visit_FRA visit_ESP
ID
0 1 0 1
1 1 0 0
2 2 1 1
3 3 0 0
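The single groupby can be checked against the column-by-column version from the question. A minimal sketch on a hypothetical excerpt of the data; selecting the counter columns explicitly (the `visit_cols` list is an assumption made here) keeps country_ID and other non-counter columns out of the sum:

```python
import pandas as pd

# Hypothetical excerpt: two IDs, with a non-numeric column alongside the counters.
df = pd.DataFrame({
    "country_ID": ["ENG", "ESP", "ENG"],
    "ID": [2, 2, 1],
    "visit_ENG": [1, 1, 1],
    "visit_FRA": [0, 1, 0],
    "visit_ESP": [0, 1, 0],
})
visit_cols = ["visit_ENG", "visit_FRA", "visit_ESP"]

# One groupby over the selected columns...
one_pass = df.groupby("ID")[visit_cols].sum()

# ...versus one groupby per column, as in the question.
per_column = pd.DataFrame({col: df.groupby("ID")[col].sum() for col in visit_cols})

# Compare the two results.
print(one_pass.equals(per_column))
```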
