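I need to sum the values in maindata over the intervals defined in master_records. Many rows in maindata should not be included in any sum, even though they have an id, a timestamp and a value.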
import pandas as pd
#Proxy reference dataframe
master_records = [['site a', '2021-03-05 02:00:00', '2021-03-05 03:00:00'],
['site a', '2021-03-05 06:00:00', '2021-03-05 08:00:00'],
['site b', '2021-04-08 10:00:00', '2021-04-08 13:00:00']]
mst_df = pd.DataFrame(master_records, columns = ['id', 'start', 'end'])
# infer_datetime_format is deprecated since pandas 2.0; format inference is now the default
mst_df['start'] = pd.to_datetime(mst_df['start'])
mst_df['end'] = pd.to_datetime(mst_df['end'])
#Proxy main high frequency dataframe
main_data = [['id a','2021-03-05 00:00:00', 10], #not aggregated
['id a','2021-03-05 01:00:00', 19], #not aggregated
['id a','2021-03-05 02:00:00', 9],
['id a','2021-03-05 03:00:00', 16],
['id a','2021-03-05 04:00:00', 16], #not aggregated
['id a','2021-03-05 05:00:00', 11], #not aggregated
['id a','2021-03-05 06:00:00', 16],
['id a','2021-03-05 07:00:00', 12],
['id a','2021-03-05 08:00:00', 9],
['id b','2021-04-08 10:00:00', 11],
['id b','2021-04-08 11:00:00', 10],
['id b','2021-04-08 12:00:00', 19],
['id b','2021-04-08 13:00:00', 10],
['id b','2021-04-08 14:00:00', 16]] #not aggregated
# Create the pandas DataFrame
maindata = pd.DataFrame(main_data, columns = ['id', 'timestamp', 'value'])
maindata['timestamp'] = pd.to_datetime(maindata['timestamp'])
The desired DataFrame looks like this:
print(mst_df)
       id                start                  end  sum(value)
0  site a  2021-03-05 02:00:00  2021-03-05 03:00:00          25
1  site a  2021-03-05 06:00:00  2021-03-05 08:00:00          37
2  site b  2021-04-08 10:00:00  2021-04-08 13:00:00          50
A uj5u.com user replied:
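The "id" values do not match between the two DataFrames ("site a" versus "id a"), so first we create a column in both DataFrames that holds a matching ID (the trailing letter). Then we merge on that matching column, and filter the merged DataFrame down to the rows whose timestamp lies between "start" and "end" (both ends inclusive). Finally, groupby + sum gives the desired result: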
maindata['id_letter'] = maindata['id'].str.split().str[-1]
mst_df['id_letter'] = mst_df['id'].str.split().str[-1]
merged = mst_df.merge(maindata, on='id_letter', suffixes=('','_'))
out = (merged[merged['timestamp'].between(merged['start'], merged['end'])]
.groupby(['id','start','end'], as_index=False)['value'].sum())
Output:
       id                start                  end  value
0  site a  2021-03-05 02:00:00  2021-03-05 03:00:00     25
1  site a  2021-03-05 06:00:00  2021-03-05 08:00:00     37
2  site b  2021-04-08 10:00:00  2021-04-08 13:00:00     50
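For reference, the same steps (derive a shared key, merge, filter by interval, group and sum) can be collected into one function. This is only a sketch under assumptions: the function name sum_within_intervals, the helper column key and the extra 'site c' row are made up for illustration and are not part of the original question or answer. It also swaps the inner merge for a left merge, so an interval with no matching readings is kept with a sum of 0 rather than being dropped.

```python
import pandas as pd


def sum_within_intervals(intervals, readings):
    """Sum readings['value'] per interval, counting rows with start <= timestamp <= end."""
    # Hypothetical helper column: the last whitespace-separated token of 'id'
    # ('site a' -> 'a', 'id a' -> 'a'), assumed to identify the same entity in both frames.
    left = intervals.assign(key=intervals['id'].str.split().str[-1])
    right = readings.assign(key=readings['id'].str.split().str[-1])
    right = right[['key', 'timestamp', 'value']]

    # Pair every interval with every reading that shares its key.
    # how='left' keeps intervals that have no readings at all.
    merged = left.merge(right, on='key', how='left')

    # Zero out readings outside [start, end]; a missing timestamp counts as outside.
    inside = merged['timestamp'].between(merged['start'], merged['end'])
    merged['value'] = merged['value'].where(inside, 0)

    return merged.groupby(['id', 'start', 'end'], as_index=False)['value'].sum()


# Demo data: the intervals from the question plus one made-up 'site c' interval
# that has no readings (an assumption added to show the left-merge behaviour).
intervals = pd.DataFrame(
    [['site a', '2021-03-05 02:00:00', '2021-03-05 03:00:00'],
     ['site a', '2021-03-05 06:00:00', '2021-03-05 08:00:00'],
     ['site b', '2021-04-08 10:00:00', '2021-04-08 13:00:00'],
     ['site c', '2021-05-01 00:00:00', '2021-05-01 01:00:00']],
    columns=['id', 'start', 'end'])
intervals['start'] = pd.to_datetime(intervals['start'])
intervals['end'] = pd.to_datetime(intervals['end'])

readings = pd.DataFrame(
    [['id a', '2021-03-05 01:00:00', 19],
     ['id a', '2021-03-05 02:00:00', 9],
     ['id a', '2021-03-05 03:00:00', 16],
     ['id a', '2021-03-05 04:00:00', 16],
     ['id a', '2021-03-05 06:00:00', 16],
     ['id a', '2021-03-05 07:00:00', 12],
     ['id a', '2021-03-05 08:00:00', 9],
     ['id b', '2021-04-08 10:00:00', 11],
     ['id b', '2021-04-08 11:00:00', 10],
     ['id b', '2021-04-08 12:00:00', 19],
     ['id b', '2021-04-08 13:00:00', 10],
     ['id b', '2021-04-08 14:00:00', 16]],
    columns=['id', 'timestamp', 'value'])
readings['timestamp'] = pd.to_datetime(readings['timestamp'])

result = sum_within_intervals(intervals, readings)
print(result)
```

On this demo data the sums for the three original intervals should match the output above (25, 37 and 50), and the made-up 'site c' interval gets 0. Because the unmatched row introduces missing values during the merge, the value column may come back as floats. Note that the merge still pairs every interval with every reading that shares its key before filtering, so for very large inputs the intermediate frame can get big.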
Tags: Python pandas dataframe pandas-groupby pandas-merge
