I have two DataFrames with different dates, as shown below:
df1 = pd.DataFrame(index=['2022-01-01 00:37:57', '2022-01-01 03:49:12', '2022-01-01 09:30:11'], columns = ['price'])
df1['price'] = [10,13,12]
df1.index = df1.index.rename('date')
df1:
                     price
date
2022-01-01 00:37:57     10
2022-01-01 03:49:12     13
2022-01-01 09:30:11     12
df2 = pd.DataFrame(index=['2022-01-01 00:35:00', '2022-01-01 00:47:00', '2022-01-01 00:56:12', '2022-01-01 03:45:00', '2022-01-01 03:50:32',
                          '2022-01-01 09:29:20', '2022-01-01 09:31:21'], columns=['price'])
df2['price'] = [3000,3210, 2999, 3001, 3027, 3021, 3002]
df2.index = df2.index.rename('date')
df2:
                     price
date
2022-01-01 00:35:00   3000
2022-01-01 00:47:00   3210
2022-01-01 00:56:12   2999
2022-01-01 03:45:00   3001
2022-01-01 03:50:32   3027
2022-01-01 09:29:20   3021
2022-01-01 09:31:21   3002
I want to join df1 with df2, along the lines of df1.join(df2, how='left'), matching on the hour and the nearest minute, to get the following:
df:
                     price_x  price_y
date
2022-01-01 00:37:57       10     3000
2022-01-01 03:49:12       13     3210
2022-01-01 09:30:11       12     3021
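For example, the last row is joined on the date "2022-01-01 09:29:20" because that is the closest to "2022-01-01 09:30:11".
How can I do this?
A uj5u.com user replied:
Try pd.merge_asof() (this assumes the index is of datetime type and sorted):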
print(
    pd.merge_asof(
        df1,
        df2,
        left_index=True,
        right_index=True,
        direction="nearest",
    )
)
Prints:
                     price_x  price_y
date
2022-01-01 00:37:57       10     3000
2022-01-01 03:49:12       13     3027
2022-01-01 09:30:11       12     3021
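The frames in the question are built with string indexes, while merge_asof expects sorted, datetime-like keys. Here is a minimal, self-contained sketch of the conversion step followed by the same nearest-match merge. The `tolerance` argument is only my reading of the "within the hour" requirement (it limits matches to at most one hour away, which is not the same as "same clock hour"), so treat it as an assumption and drop it if it does not fit.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"price": [10, 13, 12]},
    index=pd.Index(
        ["2022-01-01 00:37:57", "2022-01-01 03:49:12", "2022-01-01 09:30:11"],
        name="date",
    ),
)
df2 = pd.DataFrame(
    {"price": [3000, 3210, 2999, 3001, 3027, 3021, 3002]},
    index=pd.Index(
        [
            "2022-01-01 00:35:00", "2022-01-01 00:47:00", "2022-01-01 00:56:12",
            "2022-01-01 03:45:00", "2022-01-01 03:50:32",
            "2022-01-01 09:29:20", "2022-01-01 09:31:21",
        ],
        name="date",
    ),
)

# Convert the string indexes to datetimes and make sure both are sorted.
df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)
df1 = df1.sort_index()
df2 = df2.sort_index()

merged = pd.merge_asof(
    df1,
    df2,
    left_index=True,
    right_index=True,
    direction="nearest",
    tolerance=pd.Timedelta("1h"),  # assumption: ignore matches more than 1 hour away
)
print(merged)
```

Rows of df1 with no df2 row inside the tolerance window would get NaN in price_y rather than a distant match.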
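A uj5u.com user replied:
Anrej Kesely gave a great answer, and I would guess the pandas approach is more efficient than my own. I don't have enough reputation to add a comment asking you to clarify your question. However, if you are looking for the nearest date in df2 that occurs before the date in df1, this code will work.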
import pandas as pd
from datetime import datetime

df1 = pd.DataFrame(index=['2022-01-01 00:37:57', '2022-01-01 03:49:12', '2022-01-01 09:30:11'], columns=['price'])
df1['price'] = [10, 13, 12]
df1.index = df1.index.rename('date')
df1 = df1.reset_index()  # move 'date' from the index into a regular column

df2 = pd.DataFrame(index=['2022-01-01 00:35:00', '2022-01-01 00:47:00', '2022-01-01 00:56:12', '2022-01-01 03:45:00', '2022-01-01 03:50:32',
                          '2022-01-01 09:29:20', '2022-01-01 09:31:21'], columns=['price'])
df2['price'] = [3000, 3210, 2999, 3001, 3027, 3021, 3002]
df2.index = df2.index.rename('date')
df2 = df2.reset_index()

print(df1)  # use display(df1) instead when running in a notebook

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

def min_diff(date, df):
    """Return the row label of the latest date in df strictly before `date`, or -1 if there is none."""
    target = datetime.strptime(date, DATE_FORMAT)
    best_diff = None  # no fixed lower bound, so arbitrarily old dates can still match
    min_index = -1
    for i in range(len(df)):
        # A negative difference means row i is earlier than the target date
        difference = (datetime.strptime(df['date'][i], DATE_FORMAT) - target).total_seconds()
        if difference < 0 and (best_diff is None or difference > best_diff):
            best_diff = difference
            min_index = i
    return min_index

print(df2.loc[min_diff(df1['date'][0], df2)])

df1['Price from 2'] = ''
for i in range(len(df1)):
    df1.loc[i, 'Price from 2'] = df2.loc[min_diff(df1['date'][i], df2), 'price']

print(df1)
This displays the following:
                  date  price Price from 2
0  2022-01-01 00:37:57     10         3000
1  2022-01-01 03:49:12     13         3001
2  2022-01-01 09:30:11     12         3021
If you are just looking for the nearest date and don't care about direction, @Anrej Kesely gave a great answer. Hope one of us helped!
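For comparison, the "nearest earlier date" lookup can also be expressed with pd.merge_asof by switching the direction to "backward". This is a sketch rather than a drop-in replacement for the loop: it assumes the date columns are first converted with pd.to_datetime, and the column name `price_from_2` is just an illustrative choice. Setting `allow_exact_matches=False` keeps the match strictly before the df1 date, as in the loop version.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-01 00:37:57", "2022-01-01 03:49:12", "2022-01-01 09:30:11"]
    ),
    "price": [10, 13, 12],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01 00:35:00", "2022-01-01 00:47:00", "2022-01-01 00:56:12",
        "2022-01-01 03:45:00", "2022-01-01 03:50:32",
        "2022-01-01 09:29:20", "2022-01-01 09:31:21",
    ]),
    "price": [3000, 3210, 2999, 3001, 3027, 3021, 3002],
})

# Both frames must be sorted by the key column before calling merge_asof.
result = pd.merge_asof(
    df1.sort_values("date"),
    df2.sort_values("date").rename(columns={"price": "price_from_2"}),
    on="date",
    direction="backward",       # take the last df2 row at or before each df1 date
    allow_exact_matches=False,  # exclude equal timestamps: strictly before only
)
print(result)
```

A df1 row that has no earlier row in df2 ends up with NaN in `price_from_2` instead of raising an error.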
