I have a DataFrame:
import numpy as np
import pandas as pd

data = {'process': ['buying', 'selling', 'searching', 'repaired', 'prepare', 'selling', 'buying', 'search', 'selling', 'search'],
        'type': ['in_progress', 'in_progress', 'end', 'in_progress', 'end', 'in_progress', 'in_progress', 'end', 'in_progress', 'end'],
        'country': ['usa', np.nan, 'usa', 'ghana', np.nan, 'end', 'portugal', np.nan, np.nan, 'England'],
        # Ten entries, so that every column has the same length.
        'id': ['022', '022', '022', '011', '011', '011', '011', '011', '011', '011'],
        'lag': ['00:00:10.042721', '00:00:00.042721', '00:00:05.042721', '00:10:00.042721', '00:00:00.042721', '00:00:00.042721', '00:00:50.042721', '00:00:00.042721', '00:00:00.042721', '00:00:00.042721'],
        'created': ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04', '2021-07-05', '2021-07-06', '2021-07-06', '2021-07-08', '2021-07-09', '2021-07-10'],
        'next_created': ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04', '2021-07-05', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-10']
        }
df = pd.DataFrame(data, columns=['process', 'type', 'country', 'id', 'lag', 'created', 'next_created'])
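Within each id group, I need to concatenate the process values of consecutive rows whose lag is less than one second, keeping the created value of the first row and the next_created value of the last row.
Can anyone see what the problem is? I don't understand how to use groupby in this case.
I think I need to use cumsum(), but I don't know what to put in place of the ??:
df['lag'].shift(1).??.cumsum()
Expected output: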
Answer from a uj5u.com user:
You could try the following:
# Convert `lag` to timedelta so it can be compared against one second.
df['lag'] = pd.to_timedelta(df['lag'])
# Start a new group whenever `lag` changes by more than one second from the
# previous row; the running count is kept separately for each `id`.
group = df['lag'].diff().abs().gt(np.timedelta64(1, 's')).groupby(df['id']).cumsum()
# Group by `id` and the newly created grouping, then aggregate.
(df.groupby(['id', group], as_index=False, sort=False)
   .agg({'process': lambda x: ''.join(x),   # concatenate consecutive rows within each group
         'type': 'first',
         'country': lambda x: x.iloc[0],    # first entry, including `NaN` (which 'first' would skip)
         'id': 'first',
         'lag': 'first',
         'created': 'first',                # first entry
         'next_created': 'last'}))          # last entry
Result:
               process         type   country   id                     lag     created  next_created
0               buying  in_progress       usa  022  0 days 00:00:10.042721  2021-07-01    2021-07-01
1              selling  in_progress       NaN  022  0 days 00:00:00.042721  2021-07-02    2021-07-02
2            searching          end       usa  022  0 days 00:00:05.042721  2021-07-03    2021-07-03
3             repaired  in_progress     ghana  011  0 days 00:10:00.042721  2021-07-04    2021-07-04
4       prepareselling          end       NaN  011  0 days 00:00:00.042721  2021-07-05    2021-07-06
5               buying  in_progress  portugal  011  0 days 00:00:50.042721  2021-07-06    2021-07-07
6  searchsellingsearch          end       NaN  011  0 days 00:00:00.042721  2021-07-08    2021-07-10
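The answer above keys the groups on the change in `lag` between neighbouring rows. If the intended rule is instead that a row joins the previous one only when both have a `lag` under one second, a variant along the following lines may be worth considering, since two long lags that differ by less than a second would otherwise be merged. This is only a sketch under that assumption: `short`, `same_id`, `continues` and `block` are illustrative names, the `id` list is assumed to have ten entries so that all columns have the same length, and 'searching' is assumed for the misspelled process value.

```python
import numpy as np
import pandas as pd

data = {
    'process': ['buying', 'selling', 'searching', 'repaired', 'prepare',
                'selling', 'buying', 'search', 'selling', 'search'],
    'type': ['in_progress', 'in_progress', 'end', 'in_progress', 'end',
             'in_progress', 'in_progress', 'end', 'in_progress', 'end'],
    'country': ['usa', np.nan, 'usa', 'ghana', np.nan,
                'end', 'portugal', np.nan, np.nan, 'England'],
    'id': ['022'] * 3 + ['011'] * 7,  # assumption: ten ids in total
    'lag': ['00:00:10.042721', '00:00:00.042721', '00:00:05.042721',
            '00:10:00.042721', '00:00:00.042721', '00:00:00.042721',
            '00:00:50.042721', '00:00:00.042721', '00:00:00.042721',
            '00:00:00.042721'],
    'created': ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04',
                '2021-07-05', '2021-07-06', '2021-07-06', '2021-07-08',
                '2021-07-09', '2021-07-10'],
    'next_created': ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04',
                     '2021-07-05', '2021-07-06', '2021-07-07', '2021-07-08',
                     '2021-07-09', '2021-07-10'],
}
df = pd.DataFrame(data)
df['lag'] = pd.to_timedelta(df['lag'])

# A row is "short" when its own lag is under one second.
short = df['lag'] < pd.Timedelta(seconds=1)
# True when the previous row belongs to the same id.
same_id = df['id'].eq(df['id'].shift())
# A row continues the previous block only if it and its predecessor
# are both short and share the same id.
continues = short & short.shift(fill_value=False) & same_id
# Every other row starts a new block; count the starts per id.
block = (~continues).groupby(df['id']).cumsum().rename('block')

result = (
    df.groupby(['id', block], sort=False)
      .agg(process=('process', lambda s: ''.join(s)),
           type=('type', 'first'),
           country=('country', lambda s: s.iloc[0]),  # keeps a leading NaN
           lag=('lag', 'first'),
           created=('created', 'first'),
           next_created=('next_created', 'last'))
      .reset_index()
      .drop(columns='block')
)
print(result)
```

On the sample data this yields the same seven groups as the `diff()`-based key; the two approaches only diverge when neighbouring rows have similar but long lags.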
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/326411.html


