I have a sample DataFrame df:
date_code item_code vstore_code
1 2022-03-26 11111 N01
2 2022-03-27 11111 N01
3 2022-03-28 11111 N01
4 2022-03-29 11111 N01
5 2022-03-30 11111 N01
6 2022-03-31 11111 N01
7 2022-04-01 11111 N01
8 2022-04-08 11111 N01
9 2022-04-15 11111 N01
10 2022-04-17 11111 N01
11 2022-04-18 11111 N01
12 2022-04-19 11111 N01
13 2022-04-21 11111 N01
14 2022-04-22 11111 N01
15 2022-04-26 11111 N01
16 2022-02-01 22222 N02
17 2022-02-02 22222 N02
18 2022-02-03 22222 N02
19 2022-02-10 22222 N02
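There are many items and stores.
I want to create a separate DataFrame that contains, for each item in each store, the periods covered by runs of consecutive dates.
Expected output: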
item_code store_code start_period end_period
11111 N01 2022-03-26 2022-04-02
11111 N01 2022-04-08 2022-04-09
11111 N01 2022-04-15 2022-04-16
11111 N01 2022-04-17 2022-04-20
11111 N01 2022-04-21 2022-04-23
11111 N01 2022-04-26 2022-04-27
22222 N02 2022-02-01 2022-02-04
22222 N02 2022-02-10 2022-02-11
Answer from a uj5u.com user:
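You can aggregate over runs of consecutive dates. Compute the row-to-row differences with Series.diff, flag every row whose difference is not equal to 1 day, and turn those flags into run labels with Series.cumsum. Pass the labels to groupby together with item_code and vstore_code, aggregate with min and max, and finally add 1 day to the end_period column: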
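import pandas as pd

# Parse the date strings so that date arithmetic works
df['date_code'] = pd.to_datetime(df['date_code'])
# A new run starts wherever the gap to the previous row is not exactly 1 day;
# the cumulative sum of those flags labels each run of consecutive dates
g = df['date_code'].diff().dt.days.ne(1).cumsum()
df = (df.groupby(['item_code','vstore_code',g])
        .agg(start_period=('date_code','min'),
             end_period=('date_code','max'))
        .droplevel(-1)   # drop the helper run-label level
        .reset_index()
        # shift end_period to one day after the last date in the run
        .assign(end_period = lambda x: x['end_period'] + pd.Timedelta('1 day'))
     )
print (df)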
   item_code vstore_code start_period end_period
0      11111         N01   2022-03-26 2022-04-02
1      11111         N01   2022-04-08 2022-04-09
2      11111         N01   2022-04-15 2022-04-16
3      11111         N01   2022-04-17 2022-04-20
4      11111         N01   2022-04-21 2022-04-23
5      11111         N01   2022-04-26 2022-04-27
6      22222         N02   2022-02-01 2022-02-04
7      22222         N02   2022-02-10 2022-02-11
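To see why the diff / ne / cumsum chain labels each run of consecutive dates, it may help to look at the intermediate values. The following is only a minimal sketch on a five-row subset of the question's data. The names gap, new_run, run_id, df_sorted and periods are introduced here for illustration and are not part of the original answer. The last part is an optional variant, an assumption on my side rather than the answerer's method: it sorts the rows and takes the difference inside each item/store group, which should also cope with input that is not already ordered by date.

```python
import pandas as pd

# Five-row subset of the question's data: two runs for item 11111, one for 22222
df = pd.DataFrame({
    "date_code": ["2022-03-31", "2022-04-01", "2022-04-08",
                  "2022-02-01", "2022-02-02"],
    "item_code": [11111, 11111, 11111, 22222, 22222],
    "vstore_code": ["N01", "N01", "N01", "N02", "N02"],
})
df["date_code"] = pd.to_datetime(df["date_code"])

# Step 1: gap in days to the previous row (NaN for the very first row)
gap = df["date_code"].diff().dt.days
# Step 2: True where a new run starts, i.e. the gap is not exactly 1 day
# (NaN compares as "not equal", so the first row always starts a run)
new_run = gap.ne(1)
# Step 3: the running count of run starts gives one label per run
run_id = new_run.cumsum()
print(df.assign(gap=gap, new_run=new_run, run_id=run_id))

# Optional variant (illustrative assumption): sort, then diff within each group
df_sorted = df.sort_values(["item_code", "vstore_code", "date_code"])
keys = ["item_code", "vstore_code"]
run_id_sorted = df_sorted.groupby(keys)["date_code"].diff().dt.days.ne(1).cumsum()

# Same aggregation as in the answer, applied to the sorted frame
periods = (df_sorted.groupby(keys + [run_id_sorted])
           .agg(start_period=("date_code", "min"),
                end_period=("date_code", "max"))
           .droplevel(-1)
           .reset_index()
           .assign(end_period=lambda x: x["end_period"] + pd.Timedelta("1 day")))
print(periods)
```

On this subset the run labels come out as 1, 1, 2, 3, 3, so the aggregation yields three periods: two for item 11111 and one for item 22222.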
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/472633.html
