I have a DataFrame that looks like this:
S.No  date        origin  dest  journeytype
1     2021-10-21  FKG     HYM   OP
2     2021-10-21  FKG     HYM   PK
3     2021-10-21  HYM     LDS   OP
4     2021-10-22  FKG     HYM   OP
5     2021-10-22  FKG     HYM   PK
6     2021-10-22  HYM     LDS   OP
7     2021-10-23  FKG     HYM   OP
8     2021-10-24  AVM     BLA   OP
9     2021-10-24  AVM     DBL   OP
10    2021-10-27  AVM     BLA   OP
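I need to collapse the rows for each distinct combination of origin, destination and journey type into a single row with separate start-date and end-date columns.
For the input above, the output DataFrame should look like this: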
start_date  end_date    origin  dest  journeytype
2021-10-21  2021-10-23  FKG     HYM   OP
2021-10-21  2021-10-22  FKG     HYM   PK
2021-10-21  2021-10-22  HYM     LDS   OP
2021-10-24  2021-10-24  AVM     BLA   OP
2021-10-24  2021-10-24  AVM     DBL   OP
2021-10-27  2021-10-27  AVM     BLA   OP
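In addition, if the dates within any group are not consecutive, each consecutive run of dates needs to appear as a separate record in the result.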
Reply from a uj5u.com user:
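If necessary, convert the column to datetime, then aggregate min and max using named aggregation in GroupBy.agg, and finally reorder the columns with a list: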
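import pandas as pd

# Convert the date column to datetime if it is not already
df['date'] = pd.to_datetime(df['date'])
# Named aggregation: earliest and latest date per (origin, dest, journeytype)
df = (df.groupby(['origin', 'dest', 'journeytype'], sort=False)['date']
        .agg(start_date='min', end_date='max')
        .reset_index())
# Reorder the columns to match the desired layout
df = df[['start_date', 'end_date', 'origin', 'dest', 'journeytype']]
print(df)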
  start_date   end_date origin dest journeytype
0 2021-10-21 2021-10-23    FKG  HYM          OP
1 2021-10-21 2021-10-22    FKG  HYM          PK
2 2021-10-21 2021-10-22    HYM  LDS          OP
3 2021-10-24 2021-10-27    AVM  BLA          OP
4 2021-10-24 2021-10-24    AVM  DBL          OP
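Note that a plain min/max per group spans every row of that group, so on its own it does not split a group whose dates have gaps, which the question also asks for. The following is a minimal sketch of one way to handle that, not part of the original reply. It assumes "consecutive" means a step of at most one calendar day between successive dates of the same group. The helper columns `new_run` and `run` and the variable `result` are names made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["2021-10-21", "2021-10-21", "2021-10-21", "2021-10-22", "2021-10-22",
             "2021-10-22", "2021-10-23", "2021-10-24", "2021-10-24", "2021-10-27"],
    "origin": ["FKG", "FKG", "HYM", "FKG", "FKG", "HYM", "FKG", "AVM", "AVM", "AVM"],
    "dest": ["HYM", "HYM", "LDS", "HYM", "HYM", "LDS", "HYM", "BLA", "DBL", "BLA"],
    "journeytype": ["OP", "PK", "OP", "OP", "PK", "OP", "OP", "OP", "OP", "OP"],
})
df["date"] = pd.to_datetime(df["date"])

keys = ["origin", "dest", "journeytype"]

# Dates must be ascending within each group; mergesort is stable,
# so rows sharing a date keep their original relative order.
df = df.sort_values("date", kind="mergesort")

# Flag rows whose date is more than one day after the previous date
# of the same group (the first row of a group yields NaT, i.e. no flag).
df["new_run"] = (
    df.groupby(keys)["date"].diff().gt(pd.Timedelta(days=1)).astype(int)
)
# Running count of gaps per group = id of the consecutive run (hypothetical helper column)
df["run"] = df.groupby(keys)["new_run"].cumsum()

result = (
    df.groupby(keys + ["run"], sort=False)["date"]
      .agg(start_date="min", end_date="max")
      .reset_index()
      .sort_values("start_date", kind="mergesort")
      .reset_index(drop=True)
)
result = result[["start_date", "end_date"] + keys]
print(result)
```

With the sample data, the AVM/BLA/OP group is split into two records (2021-10-24 and 2021-10-27), because the three-day step between them exceeds the one-day threshold. If a different notion of "consecutive" is needed (for example, skipping weekends), the `pd.Timedelta(days=1)` comparison is the part to adjust.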
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/331192.html
