我正在嘗試將 df2 的時間戳與 df1 時間視窗重疊。當沒有匹配項時,我會收到以下錯誤。如何在沒有以下錯誤的情況下獲得輸出?
錯誤
KeyError: "[Timestamp('2022-01-01 03:12:02')] not in index"
輸入
from datetime import datetime, date
import pandas as pd
df1 = pd.DataFrame({'id': ['aa0', 'aa1', 'aa2', 'aa3'],
'number': [1, 2, 2, 1],
'color': ['blue', 'red', 'yellow', "green"],
'date1': [datetime(2022,1,1,1,1,1),
datetime(2022,1,1,2,4,1),
datetime(2022,1,1,3,8,1),
datetime(2022,1,1,4,12,1)],
'date2': [datetime(2022,1,1,2,1,1),
datetime(2022,1,1,3,6,1),
datetime(2022,1,1,3,10,1),
datetime(2022,1,1,4,14,1)] })
輸入2
df2 = pd.DataFrame({'id': ['A', 'B', 'C', 'D'],
'value': [10,20,30,40],
'date': [datetime(2022,1,1,1,12,1),
datetime(2022,1,1,1,40,1),
datetime(2022,1,1,3,12,2),
datetime(2022,1,1,4,12,2)] })
預期產出
(2022-01-01 01:01:01, 2022-01-01 02:01:01] 15.0
(2022-01-01 04:12:01, 2022-01-01 04:14:01] 40.0
代碼
idx = pd.IntervalIndex.from_arrays(pd.to_datetime(df1['date1']),
pd.to_datetime(df1['date2']))
mapper = pd.Series(idx, index=idx)
df2.groupby(mapper[pd.to_datetime(df2['date'])].values)['value'].mean()
uj5u.com熱心網友回復:
一種選擇是使用pyjanitor 的conditional_join ,它可以解決不等式連接,例如:
# pip install pyjanitor
import pandas as pd
import janitor
df1['date1'] = pd.to_datetime(df1['date1'])
df1['date2'] = pd.to_datetime(df1['date2'])
df2['date'] = pd.to_datetime(df2['date'])
(
df1
.filter(like='date')
.conditional_join(
df2.filter(['value', 'date']),
('date1', 'date', '<='),
('date2', 'date', '>='))
.groupby(['date1', 'date2'])
.value
.mean()
)
date1 date2
2022-01-01 01:01:01 2022-01-01 02:01:01 15.0
2022-01-01 04:12:01 2022-01-01 04:14:01 40.0
Name: value, dtype: float64
uj5u.com熱心網友回復:
我想我想通了。這不是最好的,但很有效。
df1['date'] = pd.to_datetime(df1['date1']).dt.date
df2['date'] = pd.to_datetime(df2['dates']).dt.date
df3 = pd.merge(df1, df2, on=['date'], how='left')
mask = (df3['dates'] > df3['date1']) & (df3['dates'] < df3['date2'])
df4 = df3.loc[mask]
df4.groupby(['date1', 'date2'])['value'].mean()
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/516245.html
標籤:熊猫约会时间
