import pandas as pd

A = [1,3,7]
B = [6,4,8]
C = [2, 2, 8]
datetime = ['2022-01-01', '2022-01-02', '2022-01-03']
df1 = pd.DataFrame({'DATETIME':datetime,'A':A,'B':B, 'C':C })
df1.set_index('DATETIME', inplace = True)
df1
A = [1,3,7,6, 8]
B = [3,8,10,5, 8]
C = [5, 7, 9, 6, 5]
datetime = ['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04', '2022-03-05']
df2 = pd.DataFrame({'DATETIME':datetime,'A':A,'B':B, 'C':C })
df2.set_index('DATETIME', inplace = True)
df2
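I want to compare each row of df1 against the rows of df2 and, for each row of df1, output the df2 date with the smallest difference. Take the first row of df1 (2022-01-01), where A=1, B=6 and C=2. Comparing it with the 2022-03-01 row of df2, where A=1, B=3 and C=5, gives |1-1|=0, |6-3|=3 and |2-5|=3, for a total difference of 0+3+3=6. Comparing 2022-01-01 with the rest of df2, we find that 2022-03-01 has the smallest total difference, and I would like that date added to df1.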
Answer:
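I'm assuming you want the smallest total absolute difference.
The fastest approach is probably to convert the DataFrames to NumPy arrays and use NumPy broadcasting to perform the computation efficiently.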
# for each row of df1 get the (positional) index of the df2 row corresponding to the lowest total absolute difference
min_idx = abs(df1.to_numpy()[:,None] - df2.to_numpy()).sum(axis=-1).argmin(axis=1)
df1['min_diff_date'] = df2.index[min_idx]
Output:
>>> df1
            A  B  C min_diff_date
DATETIME
2022-01-01  1  6  2    2022-03-01
2022-01-02  3  4  2    2022-03-01
2022-01-03  7  8  8    2022-03-03
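As a cross-check on the result above, here is a minimal pandas-only sketch that computes the same thing one row at a time. It is slower than broadcasting and is only meant to make the arithmetic explicit. The name `closest` is an assumption introduced for illustration, and the frames are rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": [1, 3, 7], "B": [6, 4, 8], "C": [2, 2, 8]},
    index=pd.Index(["2022-01-01", "2022-01-02", "2022-01-03"], name="DATETIME"),
)
df2 = pd.DataFrame(
    {"A": [1, 3, 7, 6, 8], "B": [3, 8, 10, 5, 8], "C": [5, 7, 9, 6, 5]},
    index=pd.Index(
        ["2022-03-01", "2022-03-02", "2022-03-03", "2022-03-04", "2022-03-05"],
        name="DATETIME",
    ),
)

# For each df1 row: subtract it from every df2 row (aligned on column names),
# sum the absolute differences per df2 row, and keep the df2 label with the
# smallest total. `closest` is a hypothetical name, not part of the answer.
closest = {
    date: (df2 - row).abs().sum(axis=1).idxmin()
    for date, row in df1.iterrows()
}
print(closest)
```

On ties, `idxmin` returns the first matching label, which matches the behaviour of `argmin` in the broadcasting version.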
Steps:
# Each 'block' corresponds to the absolute difference between a row of df1 and all the rows of df2
>>> abs(df1.to_numpy()[:,None] - df2.to_numpy())
array([[[0, 3, 3],
        [2, 2, 5],
        [6, 4, 7],
        [5, 1, 4],
        [7, 2, 3]],

       [[2, 1, 3],
        [0, 4, 5],
        [4, 6, 7],
        [3, 1, 4],
        [5, 4, 3]],

       [[6, 5, 3],
        [4, 0, 1],
        [0, 2, 1],
        [1, 3, 2],
        [1, 0, 3]]])
# sum the absolute differences over the columns of each block
>>> abs(df1.to_numpy()[:,None] - df2.to_numpy()).sum(-1)
array([[ 6,  9, 17, 10, 12],
       [ 6,  9, 17,  8, 12],
       [14,  5,  3,  6,  4]])
# for each row of the previous array get the column index of the lowest value
>>> abs(df1.to_numpy()[:,None] - df2.to_numpy()).sum(-1).argmin(1)
array([0, 0, 2])
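The steps can also be wrapped in a small helper that selects the compared columns explicitly. This is a sketch under stated assumptions: `closest_dates`, its `cols` parameter and the extra `min_diff` column are hypothetical names, not part of the answer. Selecting columns matters because once `min_diff_date` has been added to df1, `df1.to_numpy()` includes that string column too and the subtraction is no longer purely numeric.

```python
import numpy as np
import pandas as pd


def closest_dates(left, right, cols):
    """Hypothetical helper: closest `right` label for each row of `left`."""
    a = left[cols].to_numpy()[:, None, :]   # shape (n_left, 1, n_cols)
    b = right[cols].to_numpy()[None, :, :]  # shape (1, n_right, n_cols)
    totals = np.abs(a - b).sum(axis=-1)     # shape (n_left, n_right)
    # Label of the smallest total per left row, plus the total itself.
    return right.index[totals.argmin(axis=1)], totals.min(axis=1)


df1 = pd.DataFrame(
    {"A": [1, 3, 7], "B": [6, 4, 8], "C": [2, 2, 8]},
    index=pd.Index(["2022-01-01", "2022-01-02", "2022-01-03"], name="DATETIME"),
)
df2 = pd.DataFrame(
    {"A": [1, 3, 7, 6, 8], "B": [3, 8, 10, 5, 8], "C": [5, 7, 9, 6, 5]},
    index=pd.Index(
        ["2022-03-01", "2022-03-02", "2022-03-03", "2022-03-04", "2022-03-05"],
        name="DATETIME",
    ),
)

cols = ["A", "B", "C"]
dates, totals = closest_dates(df1, df2, cols)
df1["min_diff_date"] = dates
df1["min_diff"] = totals

# Re-running is safe because only `cols` are used in the subtraction.
dates_again, _ = closest_dates(df1, df2, cols)
```

The intermediate array holds n_left × n_right × n_cols values, so for very large frames it may be worth processing df1 in chunks.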
Tags: python-3.x, pandas
