我有這個 Pandas 資料框
datetime machineID errorID
0 2015-01-06 03:00:00 1 error3
1 2015-02-03 06:00:00 1 error4
2 2015-02-21 11:00:00 1 error1
3 2015-02-21 16:00:00 1 error2
4 2015-03-20 06:00:00 1 error1
5 2015-04-04 06:00:00 1 error5
6 2015-05-04 06:00:00 1 error4
7 2015-05-19 06:00:00 1 error2
8 2015-05-19 06:00:00 1 error3
9 2015-06-03 06:00:00 1 error5
現在我想解開 errorID 以便我可以根據error1, error2...獲取列error5。所以為此我在 Pandas 中使用了 groupby 和 unstack 方法
a = errors.groupby(['machineID', 'datetime', 'errorID']).size().unstack('errorID', fill_value=0)
這給了我這個資料框
errorID error1 error2 error3 error4 error5
machineID datetime
1 2015-01-06 03:00:00 0 0 1 0 0
2015-02-03 06:00:00 0 0 0 1 0
2015-02-21 11:00:00 1 0 0 0 0
2015-02-21 16:00:00 0 1 0 0 0
2015-03-20 06:00:00 1 0 0 0 0
現在我想根據24H頻率和datetime. 但是當我使用該resample功能時,它給了我錯誤KeyError: 'The grouper name datetime is not found'
a.resample('24H', on='datetime').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})
當我列出此資料框中的所有列時,它僅顯示 ['error1', 'error2', 'error3', 'error4', 'error5']
這是整個錯誤
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-127-607c418305a0> in <module>
----> 1 a.resample('24H', on='datetime').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/generic.py in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base, on, level)
8447 base=base,
8448 key=on,
-> 8449 level=level,
8450 )
8451 return _maybe_process_deprecations(
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/resample.py in resample(obj, kind, **kwds)
1304 """
1305 tg = TimeGrouper(**kwds)
-> 1306 return tg._get_resampler(obj, kind=kind)
1307
1308
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/resample.py in _get_resampler(self, obj, kind)
1428
1429 """
-> 1430 self._set_grouper(obj)
1431
1432 ax = self.ax
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/pandas/core/groupby/grouper.py in _set_grouper(self, obj, sort)
171 else:
172 if key not in obj._info_axis:
--> 173 raise KeyError("The grouper name {0} is not found".format(key))
174 ax = Index(obj[key], name=key)
175
KeyError: 'The grouper name datetime is not found'
我不知道如何在 groupby 后使用重新采樣
uj5u.com熱心網友回復:
首先將值轉換為日期時間:
errors['datetime'] = pd.to_datetime(errors['datetime'])
a = errors.groupby(['machineID', 'datetime', 'errorID']).size().unstack('errorID', fill_value=0)
然后,如果resample每次machineID使用需要:
a = a.reset_index(level=0).groupby('machineID').resample('24H').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})
或者如果只需要resample使用:
a = a.reset_index(level=0).resample('24H').agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})
或者如果需要Grouper使用groupby :
a = a.groupby(['machineID', pd.Grouper(freq='24H', level='datetime')]).agg({'error1':'mean','error2':'mean','error3':'mean','error4':'mean', 'error5':'mean'}).rename(columns={'error1':'error1_mean','error2' : 'error2_mean', 'error3': 'error3_mean', 'error4': 'error4_24mean','error5': 'error5_24mean'})
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/382534.html
上一篇:將浮點數轉換為日期時間格式
下一篇:全年特定時間的日期時間查詢
