I have the following dataframe (example):
import pandas as pd
data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 14:00:00', False, 3],
['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])
  group                 date  indicator  value
0     A  2022-09-01 10:00:00      False      2
1     A  2022-09-01 14:00:00      False      3
2     B  2022-09-01 13:00:00      False      1
3     B  2022-09-01 16:00:00       True      4
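I want to fill in the missing timestamps between the dates at hourly frequency. Every hour missing between a group's first and last date should be added, and its values should be copied from the previous row. This is the desired output: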
data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 11:00:00', False, 2],
['A', '2022-09-01 12:00:00', False, 2], ['A', '2022-09-01 13:00:00', False, 2],
['A', '2022-09-01 14:00:00', False, 3],
['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 14:00:00', False, 1],
['B', '2022-09-01 15:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])
  group                 date  indicator  value
0     A  2022-09-01 10:00:00      False      2
1     A  2022-09-01 11:00:00      False      2
2     A  2022-09-01 12:00:00      False      2
3     A  2022-09-01 13:00:00      False      2
4     A  2022-09-01 14:00:00      False      3
5     B  2022-09-01 13:00:00      False      1
6     B  2022-09-01 14:00:00      False      1
7     B  2022-09-01 15:00:00      False      1
8     B  2022-09-01 16:00:00       True      4
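So I would like to know whether it is possible in pandas to fill the missing dates hourly per group, using the previous value in each column?
Answer from a uj5u.com user:
You can use: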
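# Parse the strings into real timestamps first
df['date'] = pd.to_datetime(df['date'])
out = (df
   .groupby('group', as_index=False, group_keys=False)
   # Per group: reindex onto an hourly range from its first to its last date
   .apply(lambda g: g.set_index('date')
                     .reindex(pd.date_range(g['date'].min(),
                                            g['date'].max(),
                                            freq='H'))  # 'H' is deprecated since pandas 2.2; 'h' is the newer alias
                     # Forward-fill the new rows; downcast='infer' restores the bool/int dtypes
                     # (the downcast keyword is deprecated since pandas 2.2)
                     .ffill(downcast='infer').reset_index()
         )
   .reset_index(drop=True)
)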
Output (the new hourly index has no name, so reset_index() labels the date column "index"):
                index group  indicator  value
0 2022-09-01 10:00:00     A      False      2
1 2022-09-01 11:00:00     A      False      2
2 2022-09-01 12:00:00     A      False      2
3 2022-09-01 13:00:00     A      False      2
4 2022-09-01 14:00:00     A      False      3
5 2022-09-01 13:00:00     B      False      1
6 2022-09-01 14:00:00     B      False      1
7 2022-09-01 15:00:00     B      False      1
8 2022-09-01 16:00:00     B       True      4
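On newer pandas versions, where the 'H' alias and the downcast argument raise deprecation warnings, the same reindexing idea can be written without them. What follows is only a sketch, assuming the dates are unique within each group; fill_hourly is a hypothetical helper name, not part of the answer above. Passing method='ffill' to reindex fills the new rows directly, so no NaN is introduced and the bool and int dtypes should be kept. Naming the date_range 'date' also keeps the column from coming out as "index".

```python
import pandas as pd

data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 14:00:00', False, 3],
        ['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df = pd.DataFrame(data=data, columns=['group', 'date', 'indicator', 'value'])
df['date'] = pd.to_datetime(df['date'])


def fill_hourly(g):
    """Hypothetical helper: expand one group's rows to an hourly grid."""
    # reindex(method='ffill') needs a sorted index, so sort by date first
    g = g.sort_values('date').set_index('date')
    # Hourly range from the group's first to its last timestamp, named 'date'
    hours = pd.date_range(g.index.min(), g.index.max(),
                          freq=pd.Timedelta(hours=1), name='date')
    # Each new hour takes the values of the most recent existing row
    return g.reindex(hours, method='ffill').reset_index()


# Iterate over the groups explicitly instead of using groupby.apply
out = pd.concat([fill_hourly(g) for _, g in df.groupby('group')],
                ignore_index=True)
# Put the columns back in the original order
out = out[['group', 'date', 'indicator', 'value']]
print(out)
```

Looping over the groups and concatenating is a little more verbose than groupby.apply, but it does not depend on how apply treats the grouping column, which has changed between pandas versions.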
Answer from a uj5u.com user:
Here is another way:
df['date'] = pd.to_datetime(df['date'])
# Index by date, then upsample each group to hourly rows and forward-fill
df2 = (df.set_index('date')
         .groupby('group', group_keys=False)
         .apply(lambda x: x.resample('1H').ffill())
         .reset_index())
df2
                 date group  indicator  value
0 2022-09-01 10:00:00     A      False      2
1 2022-09-01 11:00:00     A      False      2
2 2022-09-01 12:00:00     A      False      2
3 2022-09-01 13:00:00     A      False      2
4 2022-09-01 14:00:00     A      False      3
5 2022-09-01 13:00:00     B      False      1
6 2022-09-01 14:00:00     B      False      1
7 2022-09-01 15:00:00     B      False      1
8 2022-09-01 16:00:00     B       True      4
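One thing to keep in mind with resample: as far as I can tell, it places rows on the hourly bin labels rather than on the original timestamps. With the example data this makes no difference because every date is exactly on the hour. The sketch below uses made-up data at half past the hour (an assumption for illustration only, not from the question) to show how the two approaches can differ in the timestamps they produce.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Hypothetical data whose timestamps sit at half past the hour
half = pd.DataFrame(
    {'value': [2, 3]},
    index=pd.to_datetime(['2022-09-01 10:30:00', '2022-09-01 13:30:00']),
)

# resample builds hourly bins, so the result is labelled on the hour
resampled = half.resample(one_hour).ffill()

# date_range starts at the first timestamp itself, so rows stay at half past
ranged = half.reindex(
    pd.date_range(half.index.min(), half.index.max(), freq=one_hour),
    method='ffill',
)

print(resampled.index.minute.unique().tolist())
print(ranged.index.minute.unique().tolist())
```

If the real data is not aligned to the hour, it is worth checking which of the two grids is actually wanted before picking an approach.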
Answer from a uj5u.com user:
One option is to use complete from pyjanitor to expose the missing rows:
# pip install pyjanitor
import pandas as pd
import janitor
df['date'] = pd.to_datetime(df['date'])
# build a dictionary to contain the new dates
# the key of the dictionary must exist in the dataframe
new_date = {'date': lambda date: pd.date_range(date.min(), date.max(), freq='H')}
df.complete(new_date, by = 'group').ffill(downcast='infer')
  group                date  indicator  value
0     A 2022-09-01 10:00:00      False      2
1     A 2022-09-01 11:00:00      False      2
2     A 2022-09-01 12:00:00      False      2
3     A 2022-09-01 13:00:00      False      2
4     A 2022-09-01 14:00:00      False      3
5     B 2022-09-01 13:00:00      False      1
6     B 2022-09-01 14:00:00      False      1
7     B 2022-09-01 15:00:00      False      1
8     B 2022-09-01 16:00:00       True      4
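If installing pyjanitor is not an option, the "expose the missing rows, then fill" idea can be approximated with plain pandas. This is a rough sketch of the concept, not how complete is implemented internally: build the full grid of (group, date) pairs, left-join the original rows onto it, and forward-fill within each group. The names grid, filled and one_hour are my own.

```python
import pandas as pd

data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 14:00:00', False, 3],
        ['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df = pd.DataFrame(data=data, columns=['group', 'date', 'indicator', 'value'])
df['date'] = pd.to_datetime(df['date'])

one_hour = pd.Timedelta(hours=1)

# Full grid: every hour between each group's first and last date
grid = pd.concat(
    [pd.DataFrame({'group': name,
                   'date': pd.date_range(g['date'].min(), g['date'].max(),
                                         freq=one_hour)})
     for name, g in df.groupby('group')],
    ignore_index=True,
)

# Left-join the original rows; hours that were missing get NaN
out = grid.merge(df, on=['group', 'date'], how='left')

# Forward-fill within each group, then restore the original dtypes
# (the first row of every group exists in df, so no NaN is left over)
filled = out.groupby('group')[['indicator', 'value']].ffill()
out['indicator'] = filled['indicator'].astype(bool)
out['value'] = filled['value'].astype('int64')
print(out)
```

The explicit astype calls play the role that downcast='infer' plays in the answers above: the join temporarily turns the bool and int columns into object and float.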
