How to pull data from Python pandas df1 into df2 based on the values in each row of df2, a bit like a VLOOKUP nested inside an HLOOKUP in Excel

Suppose I have two dataframes, df1 and df2, described below. The code that creates them is further down.

df1

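Has 5,000 rows and 10,000 columns.
The first column contains a list of non-consecutive dates. The dates are listed from oldest to newest, but not every day is listed (i.e., only some days appear). Each date is unique.
Every other column is labeled with a different person's name. Each column name is unique.
All columns except the date column contain a numeric value.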
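The shift-based answers further down count "records" as rows, so they only give the right numbers if df1's dates really are unique and sorted from oldest to newest, as described above. A quick check, sketched under the assumption that the date column is named Date as in the question's code; check_df1 is a hypothetical helper name:

```python
import pandas as pd

def check_df1(df):
    """Hypothetical helper: True if the Date column is unique and ascending and column names are unique."""
    dates = df["Date"]
    return bool(dates.is_unique and dates.is_monotonic_increasing and df.columns.is_unique)

# First three rows of the question's df1 after sorting, with two of the person columns
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2016-05-03", "2016-07-05", "2017-05-15"]),
    "Person1": [1651, 1672, 1755],
    "Person2": [2653, 2678, 2750],
})
print(check_df1(df1))             # True: sorted and unique
print(check_df1(df1.iloc[::-1]))  # False: same rows, newest first
```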

df2

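Has 2,000,000 rows and 4 columns.
The first column contains a list of dates. These are not sorted from oldest to newest.
The next column contains a person's name (one of the column names in df1).
The other two columns are meant to hold data about that person, based on the date listed in that row.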

My goal

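I want to fill the two blank columns in df2 with data pulled from df1.
For example, the first row of df2 lists the date 2017-05-15 and a person named Person4. I want to fill df2['Value_Today'] with 4752 and df2['Value_2_records_later'] with 4866.
For the next row of df2 (date 2019-01-28, person Person1), I want to fill df2['Value_Today'] with 1918 and df2['Value_2_records_later'] with 1912.
I want to do this for all 2 million rows of df2, so I think a for loop is a bad idea.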
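The lookup described above can also be done without a Python loop by turning each (Date, Person) pair into a row position and a column position in df1. This is a sketch of a different technique from the two answers below (Index.get_indexer plus NumPy fancy indexing), assuming df1 is sorted by unique dates and its person columns are numeric. fill_by_position and OFFSET are hypothetical names, and the data is a two-person slice of the question's df1 (all 12 dates kept, so "2 records later" means the same thing).

```python
import numpy as np
import pandas as pd

OFFSET = 2  # "2 records later" = two rows further down in date-sorted df1

def fill_by_position(df1, df2):
    """Hypothetical helper: vectorized (Date, Person) lookup without a Python loop.

    Assumes df1 is sorted by a unique 'Date' column and every other column is a person.
    Pairs missing from df1, or with no row OFFSET records later, get NaN.
    """
    people = df1.columns.drop("Date")
    values = df1[people].to_numpy(dtype=float)            # rows = dates, columns = people
    row = pd.Index(df1["Date"]).get_indexer(df2["Date"])  # -1 where the date is not in df1
    col = people.get_indexer(df2["Person"])               # -1 where the person is not in df1
    found = (row >= 0) & (col >= 0)

    today = np.full(len(df2), np.nan)
    today[found] = values[row[found], col[found]]

    later_row = row + OFFSET
    has_later = found & (later_row < len(df1))
    later = np.full(len(df2), np.nan)
    later[has_later] = values[later_row[has_later], col[has_later]]

    out = df2.copy()
    out["Value_Today"] = today
    out["Value_2_records_later"] = later
    return out

dates = ["2016-05-03", "2016-07-05", "2017-05-15", "2017-05-29", "2018-06-10", "2018-08-22",
         "2018-11-15", "2018-11-30", "2019-01-03", "2019-01-28", "2019-06-28", "2019-12-31"]
df1 = pd.DataFrame({
    "Date": pd.to_datetime(dates),
    "Person1": [1651, 1672, 1755, 1751, 1860, 1889, 1811, 1812, 1915, 1918, 1966, 1912],
    "Person4": [4658, 4672, 4752, 4755, 4866, 4884, 4811, 4812, 4916, 4911, 4960, 4912],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2017-05-15", "2019-01-28", "2019-12-31"]),
    "Person": ["Person4", "Person1", "Person1"],
})
print(fill_by_position(df1, df2))
```

Design note: to_numpy copies df1 into one 2-D float array, which for 5,000 x 10,000 values is about 400 MB; in exchange, each of the 2 million lookups becomes a single array-indexing operation.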

Any help would be greatly appreciated. Thanks!

Code

# Import dependencies
import pandas as pd
import numpy as np

# Create df1 (np.array converts every entry, including the numbers, to a string)
df1 = pd.DataFrame(np.array([['2016-05-03', 1651,2653,3655,4658,5655], 
                             ['2017-05-29', 1751,2752,3754,4755, 5759], 
                             ['2018-08-22', 1889, 2882,3887, 4884, 5882], 
                             ['2019-06-28', 1966, 2965, 3966, 4960, 5963],
                             ['2018-11-15', 1811, 2811, 3811, 4811, 5811], 
                             ['2019-12-31', 1912, 2912, 3912, 4912, 5912],
                             ['2016-07-05', 1672, 2678, 3679, 4672, 5674], 
                             ['2017-05-15', 1755, 2750, 3759, 4752, 5755], 
                             ['2018-06-10', 1860, 2864, 3866, 4866, 5867], 
                             ['2019-01-28', 1918, 2910, 3914, 4911, 5918],
                             ['2018-11-30', 1812, 2812, 3812, 4812, 5812], 
                             ['2019-01-03', 1915, 2917, 3916, 4916, 5917],]),
                   columns=['Date', 'Person1', 'Person2', 'Person3', 'Person4', 
                            'Person5',])
# Format df1['Date'] col as datetime
df1['Date'] = pd.to_datetime(df1['Date'])
# Sort df1 by 'Date'
df1 = df1.sort_values(['Date'],ascending=[True]).reset_index(drop=True)

# Create 'df2', which contains measurement data on specific dates.
df2 = pd.DataFrame(np.array([['2017-05-15', 'Person4', '', ''], ['2019-01-28', 'Person1', '', ''], 
                              ['2018-11-15', 'Person1', '', ''], ['2018-08-22', 'Person3', '', ''],
                              ['2017-05-15', 'Person5', '', ''], ['2016-05-03', 'Person2', '', ''],]),
                   columns=['Date', 'Person', 'Value_Today', 'Value_2_records_later'])
df2['Date'] = pd.to_datetime(df2['Date'])

# Display dfs (display() exists only in Jupyter/IPython; print() works anywhere)
print(df1)
print(df2)

### I DON'T KNOW WHAT CODE I NEED TO SOLVE MY ISSUE ###

# To capture the row that is two rows below, I think I would use the '.shift(-2)' function?
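On the shift(-2) guess: shift moves values by rows, not by calendar days, so after df1 is sorted by Date, shift(-2) puts the value from two records later on each row, and the last two rows become NaN. A minimal illustration using Person4's first five sorted values from the question's df1:

```python
import pandas as pd

# Person4's values for the first five dates of the question's df1, sorted by date
s = pd.Series(
    [4658, 4672, 4752, 4755, 4866],
    index=pd.to_datetime(["2016-05-03", "2016-07-05", "2017-05-15", "2017-05-29", "2018-06-10"]),
    name="Person4",
)
later = s.shift(-2)  # each row now holds the value from two rows (records) further down
print(pd.DataFrame({"today": s, "2_records_later": later}))
```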

Answer:

Solution using `MultiIndex.map`:

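Set Date as the index of df1.
Stack the dataframe to create a mapping series s1 with a MultiIndex; its index is the combination of date and person name. Create a second series s2 the same way, but from df1 shifted up two rows with shift(-2).
Set the Date and Person columns as the index of df2.
Map the values in df2's index through s1 and s2, and assign the results to Value_Today and Value_2_records_later respectively.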

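# (Date, Person) -> value on that date
s1 = df1.set_index('Date').stack()
# (Date, Person) -> value two records (rows) later; relies on df1 already being sorted by Date
s2 = df1.set_index('Date').shift(-2).stack()
# The (Date, Person) pairs to look up, one per row of df2
ix = df2.set_index(['Date', 'Person']).index

df2['Value_Today'] = ix.map(s1)
df2['Value_2_records_later'] = ix.map(s2)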

Result

print(df2)

        Date   Person Value_Today Value_2_records_later
0 2017-05-15  Person4        4752                  4866
1 2019-01-28  Person1        1918                  1912
2 2018-11-15  Person1        1811                  1915
3 2018-08-22  Person3        3887                  3812
4 2017-05-15  Person5        5755                  5867
5 2016-05-03  Person2        2653                  2750
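The Value_2_records_later numbers above are right only because df1 was sorted by Date before shift(-2). Also, in the question's code the values in df1 are strings (np.array converts them), so the filled columns hold strings rather than numbers. A sketch wrapping the same MultiIndex.map technique in a function that sorts first and uses numeric data; map_lookup and its offset parameter are hypothetical names, not part of the answer:

```python
import pandas as pd

def map_lookup(df1, df2, offset=2):
    """Hypothetical wrapper around the MultiIndex.map answer, with the offset as a parameter.

    Returns two arrays aligned with df2's rows: the value on the row's date and the value
    `offset` records (rows of df1) later. Pairs not found in df1 give NaN.
    """
    wide = df1.set_index("Date").sort_index()  # shift() counts rows, so sort by date first
    today = wide.stack()                       # Series indexed by (Date, Person)
    later = wide.shift(-offset).stack()        # same, shifted up by `offset` rows
    keys = pd.MultiIndex.from_frame(df2[["Date", "Person"]])
    return keys.map(today).to_numpy(), keys.map(later).to_numpy()

# Numeric, deliberately unsorted df1 (Person1's values for four of the question's dates)
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2019-06-28", "2019-01-28", "2019-12-31", "2019-01-03"]),
    "Person1": [1966, 1918, 1912, 1915],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-28", "2020-01-01"]),  # the second date is not in df1
    "Person": ["Person1", "Person1"],
})
df2["Value_Today"], df2["Value_2_records_later"] = map_lookup(df1, df2)
print(df2)
```

With numeric data, Value_2_records_later comes back as float, because shift introduces NaN for the last rows and any unmatched pair maps to NaN.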

Answer:

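First, label the person columns as ('Value_Today', person) and copy them once, shifted up two rows, into ('Value_2_records_later', person):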

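step1 = df1.set_index('Date')
persons = step1.columns.tolist()

# Two-level column labels: ('Value_Today', person) now, ('Value_2_records_later', person) to add
c1 = [('Value_Today', p) for p in persons]
c2 = [('Value_2_records_later', p) for p in persons]

step1.columns = pd.MultiIndex.from_tuples(c1, names=('','Person'))
# Copy each person's column, shifted up two rows, into the matching new column
step1[c2] = step1[c1].shift(-2)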

Then stack moves the Person level of the columns into the rows:

step1.stack()
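The answer stops at the long-format table, which still has to be matched to df2's (Date, Person) pairs. One way, sketched here with DataFrame.join on the two key columns (a step the answer does not show), using a small slice of the question's data. df2's empty placeholder columns are dropped first, because join refuses overlapping column names unless suffixes are given.

```python
import pandas as pd

# Small slice of the question's data (df1 already sorted by Date)
dates = pd.to_datetime(["2016-05-03", "2016-07-05", "2017-05-15", "2017-05-29", "2018-06-10"])
df1 = pd.DataFrame({"Date": dates,
                    "Person1": [1651, 1672, 1755, 1751, 1860],
                    "Person4": [4658, 4672, 4752, 4755, 4866]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2017-05-15", "2016-05-03"]),
                    "Person": ["Person4", "Person1"],
                    "Value_Today": ["", ""],
                    "Value_2_records_later": ["", ""]})

# The answer's reshaping step
step1 = df1.set_index("Date")
persons = step1.columns.tolist()
c1 = [("Value_Today", p) for p in persons]
c2 = [("Value_2_records_later", p) for p in persons]
step1.columns = pd.MultiIndex.from_tuples(c1, names=("", "Person"))
step1[c2] = step1[c1].shift(-2)
long_df = step1.stack()  # index: (Date, Person); columns: Value_Today, Value_2_records_later

# Replace df2's placeholder columns with the looked-up values, matching on (Date, Person)
result = df2.drop(columns=["Value_Today", "Value_2_records_later"]).join(long_df, on=["Date", "Person"])
print(result)
```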

Source: https://www.uj5u.com/qiye/429598.html
