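I've run into a problem. I have real-time production data with a machine-code column [M], a timestamp column [DateTime], and a cumulative-sum column [Cumulative]. From the raw data, I created 15-minute time bins from the DateTime column.
What I want is to take each machine's latest Cumulative value in every TimeBin and compute the difference between consecutive 15-minute bins. That way I get each machine's 15-minute production in a new DataFrame with a new column named [Diff].
Here is sample code that reproduces my problem.
Raw data: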
import pandas as pd
df=pd.DataFrame({'M':['18','18','18','19','19','19','18','18','18','19','19','19','19'],
'Cumulative':['8','10','11','5','8','9','13','16','17','14','19','20','22'],
'DateTime': ['2022-08-01 07:14:28','2022-08-01 07:25:58','2022-08-01 07:29:19',
'2022-08-01 07:13:17','2022-08-01 07:28:58','2022-08-01 07:29:01',
'2022-08-01 07:34:54','2022-08-01 07:36:02','2022-08-01 07:38:17',
'2022-08-01 07:33:46','2022-08-01 07:37:09','2022-08-01 07:38:17','2022-08-01 07:41:38']})
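I created the 15-minute TimeBins by flooring DateTime into 15-minute buckets, and converted each column to a proper type.
df['DateTime'] = pd.to_datetime(df['DateTime'])
# round each timestamp down to the start of its 15-minute bin ('15min'; the older '15T' alias is deprecated)
df['TimeBins'] = df['DateTime'].dt.floor(freq='15min')
df['Cumulative'] = df['Cumulative'].astype('int32')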
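To make the binning step concrete, here is a small sketch using three timestamps taken from the raw data above. `Series.dt.floor` rounds each value down to the start of its 15-minute bucket, so every reading is assigned to the bin it falls into:

```python
import pandas as pd

# A few timestamps copied from the sample data for machine 18
ts = pd.Series(pd.to_datetime([
    '2022-08-01 07:14:28',
    '2022-08-01 07:29:19',
    '2022-08-01 07:38:17',
]))

# floor() rounds each timestamp down to the start of its 15-minute bin
bins = ts.dt.floor('15min')
print(bins.dt.strftime('%H:%M:%S').tolist())
# ['07:00:00', '07:15:00', '07:30:00']
```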
The new DataFrame I want looks like this:
pd.DataFrame({'M':['18','18','18',
'19','19','19'],
'DateTime':['2022-08-01 07:14:28','2022-08-01 07:29:19','2022-08-01 07:38:17',
'2022-08-01 07:13:17','2022-08-01 07:29:01','2022-08-01 07:41:38'],
'TimeBins':['2022-08-01 07:00:00','2022-08-01 07:15:00','2022-08-01 07:30:00',
'2022-08-01 07:00:00','2022-08-01 07:15:00','2022-08-01 07:30:00'],
'Cumulative':['8','11','17',
'5','9','22'],
'Diff':['NaN','3','6',
'NaN','4','13']})
M DateTime TimeBins Cumulative Diff
0 18 2022-08-01 07:14:28 2022-08-01 07:00:00 8 NaN
1 18 2022-08-01 07:29:19 2022-08-01 07:15:00 11 3
2 18 2022-08-01 07:38:17 2022-08-01 07:30:00 17 6
3 19 2022-08-01 07:13:17 2022-08-01 07:00:00 5 NaN
4 19 2022-08-01 07:29:01 2022-08-01 07:15:00 9 4
5 19 2022-08-01 07:41:38 2022-08-01 07:30:00 22 13
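The Diff column is plain arithmetic: for each machine, subtract the previous bin's latest Cumulative value from the current bin's, leaving the first bin empty. A minimal sketch in pure Python, with the latest-per-bin values read off the table:

```python
# Latest Cumulative value per 15-minute bin (07:00, 07:15, 07:30), read off the table
latest = {
    '18': [8, 11, 17],
    '19': [5, 9, 22],
}

# Diff = current bin minus previous bin; the first bin has no predecessor (None)
diffs = {m: [None] + [cur - prev for prev, cur in zip(v, v[1:])]
         for m, v in latest.items()}
print(diffs)
# {'18': [None, 3, 6], '19': [None, 4, 13]}
```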
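Reply from a uj5u.com user:
Use GroupBy.last and DataFrameGroupBy.diff:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15min')
df['Cumulative'] = df['Cumulative'].astype('int32')
cols = ['M','DateTime','TimeBins','Cumulative']
# last row per (machine, bin); relies on rows being time-ordered within each machine, as in the sample
df = df.groupby(['M','TimeBins'], as_index=False).last()[cols]
# difference from the previous bin of the same machine (NaN for each machine's first bin)
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print(df)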
M DateTime TimeBins Cumulative Diff
0 18 2022-08-01 07:14:28 2022-08-01 07:00:00 8 NaN
1 18 2022-08-01 07:29:19 2022-08-01 07:15:00 11 3.0
2 18 2022-08-01 07:38:17 2022-08-01 07:30:00 17 6.0
3 19 2022-08-01 07:13:17 2022-08-01 07:00:00 5 NaN
4 19 2022-08-01 07:29:01 2022-08-01 07:15:00 9 4.0
5 19 2022-08-01 07:41:38 2022-08-01 07:30:00 22 13.0
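GroupBy.last takes the last row in the current row order, which here coincides with the latest timestamp because the sample is already time-ordered per machine. If a real feed can arrive out of order, one option (a variant of my own, not part of the answer above) is to pick the row with the maximum DateTime per (machine, bin) via `idxmax`. The shuffle in this sketch only simulates unordered input:

```python
import pandas as pd

df = pd.DataFrame({
    'M': ['18', '18', '18', '19', '19', '19', '18', '18', '18', '19', '19', '19', '19'],
    'Cumulative': ['8', '10', '11', '5', '8', '9', '13', '16', '17', '14', '19', '20', '22'],
    'DateTime': ['2022-08-01 07:14:28', '2022-08-01 07:25:58', '2022-08-01 07:29:19',
                 '2022-08-01 07:13:17', '2022-08-01 07:28:58', '2022-08-01 07:29:01',
                 '2022-08-01 07:34:54', '2022-08-01 07:36:02', '2022-08-01 07:38:17',
                 '2022-08-01 07:33:46', '2022-08-01 07:37:09', '2022-08-01 07:38:17',
                 '2022-08-01 07:41:38'],
})
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor('15min')
df['Cumulative'] = df['Cumulative'].astype('int32')

# Simulate out-of-order arrival; the result below does not depend on row order
df = df.sample(frac=1, random_state=0)

# Row label of the latest reading for each (machine, bin)
idx = df.groupby(['M', 'TimeBins'])['DateTime'].idxmax()
out = (df.loc[idx.to_numpy(), ['M', 'DateTime', 'TimeBins', 'Cumulative']]
         .sort_values(['M', 'TimeBins'])
         .reset_index(drop=True))
out['Diff'] = out.groupby('M')['Cumulative'].diff()
print(out)
```

This should print the same six rows as the answer's output, with Diff values NaN, 3, 6, NaN, 4, 13.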
Another solution with pd.Grouper:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['Cumulative'] = df['Cumulative'].astype('int32')
cols = ['M','DateTime','TimeBins','Cumulative']
# group by machine and 15-minute bins of DateTime, keeping the last reading in each bin
df = (df.groupby(['M',pd.Grouper(freq='15min', key='DateTime')])[['DateTime','Cumulative']]
.last().rename_axis(['M','TimeBins'])
.reset_index()[cols])
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print(df)
M DateTime TimeBins Cumulative Diff
0 18 2022-08-01 07:14:28 2022-08-01 07:00:00 8 NaN
1 18 2022-08-01 07:29:19 2022-08-01 07:15:00 11 3.0
2 18 2022-08-01 07:38:17 2022-08-01 07:30:00 17 6.0
3 19 2022-08-01 07:13:17 2022-08-01 07:00:00 5 NaN
4 19 2022-08-01 07:29:01 2022-08-01 07:15:00 9 4.0
5 19 2022-08-01 07:41:38 2022-08-01 07:30:00 22 13.0
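Both solutions only produce bins that contain at least one reading, so a 15-minute window in which a machine logged nothing simply has no row. If every bin should appear explicitly, one possible approach (not from the original answer) is a per-machine `resample`, forward-filling empty bins because an unchanged counter means zero production. The readings below are hypothetical and chosen so machine 18 has an empty 07:15 bin:

```python
import pandas as pd

# Hypothetical readings: machine '18' has nothing between 07:15 and 07:30
df = pd.DataFrame({
    'M': ['18', '18', '19', '19'],
    'DateTime': pd.to_datetime(['2022-08-01 07:14:28', '2022-08-01 07:38:17',
                                '2022-08-01 07:13:17', '2022-08-01 07:29:01']),
    'Cumulative': [8, 17, 5, 9],
})

per_bin = (df.set_index('DateTime')
             .groupby('M')['Cumulative']
             .resample('15min')          # every 15-minute bin from first to last reading
             .last()                     # empty bins become NaN
             .groupby(level='M').ffill() # carry the counter forward within each machine
             .reset_index()
             .rename(columns={'DateTime': 'TimeBins'}))
per_bin['Diff'] = per_bin.groupby('M')['Cumulative'].diff()
print(per_bin)
```

Machine 18 then gets an explicit 07:15 row with Diff 0, and the 07:30 row carries the full increase of 9, the same value the groupby-last approach assigns to that bin.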
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/515945.html
