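Please see the code below.
Overview
I have a DataFrame, "df_master", with four columns: a date; the name of the specific df that data should be pulled from; the measurement on the given date; and the measurement one year after the given date.
What I want to do is write code so that, for each row of df_master, the measurement data is pulled into df_master from one of a large number of other dfs. Each row names a different df to pull its measurement from. I want to populate two columns in df_master: one with the measurement taken from the specified df on the date listed in that row, and one with the measurement for the date one year later.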
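Example
For example, consider the code below. The first row of df_master is "2016-01-01" and "df_B". This means the first row of df_master should be filled with the value 250 for "Measurement_Today" and 265 for "Measurement_Next_Year".
The next row of df_master needs to pull its measurement data from df_C, and so on. In other words, each row of df_master specifies the df its measurement data comes from, and that df varies from row to row.
Assume that df_master has to pull data from thousands of dfs, and that merging all of those thousands of dfs into a single df is not possible.
I tried to write a for loop to pull this data in, without success. I also tried using .iloc to pull the data from one year ahead, but that did not work either.
Any help you can provide would be greatly appreciated. Thank you.
Code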
# Import dependencies
import pandas as pd
import numpy as np
# Create 'df_A', 'df_B', 'df_C' and 'df_D', which contain measurement data on specific dates.
df_A = pd.DataFrame(np.array([['2016-01-01', 150], ['2017-01-01', 145],
                              ['2018-01-01', 163], ['2019-01-01', 170],
                              ['2020-01-01', 198], ['2021-01-01', 189],]),
                    columns=['Date', 'Measurement'])
df_A['Date'] = pd.to_datetime(df_A['Date'])
df_B = pd.DataFrame(np.array([['2016-01-01', 250], ['2017-01-01', 265],
                              ['2018-01-01', 221], ['2019-01-01', 285],
                              ['2020-01-01', 298], ['2021-01-01', 289],]),
                    columns=['Date', 'Measurement'])
df_B['Date'] = pd.to_datetime(df_B['Date'])
df_C = pd.DataFrame(np.array([['2016-01-01', 350], ['2017-01-01', 367],
                              ['2018-01-01', 392], ['2019-01-01', 370],
                              ['2020-01-01', 398], ['2021-01-01', 389],]),
                    columns=['Date', 'Measurement'])
df_C['Date'] = pd.to_datetime(df_C['Date'])
df_D = pd.DataFrame(np.array([['2016-01-01', 450], ['2017-01-01', 454],
                              ['2018-01-01', 413], ['2019-01-01', 480],
                              ['2020-01-01', 498], ['2021-01-01', 489],]),
                    columns=['Date', 'Measurement'])
df_D['Date'] = pd.to_datetime(df_D['Date'])
# Create df_master
df_master = pd.DataFrame(np.array([['2016-01-01', 'df_B', '', ''], ['2017-01-01', 'df_C', '', ''],
                                   ['2018-01-01', 'df_B', '', ''], ['2019-01-01', 'df_A', '', ''],
                                   ['2018-01-01', 'df_A', '', ''], ['2019-01-01', 'df_D', '', ''],]),
                         columns=['Date', 'df_to_pull_measurement_from', 'Measurement_Today',
                                  'Measurement_Next_Year'])
df_master['Date'] = pd.to_datetime(df_master['Date'])
# Create list of dfs from df_master['df_to_pull_measurement_from'].
list_of_dfs = df_master['df_to_pull_measurement_from']
### THIS DOES NOT WORK ###
# Add columns to df_master for measurement on given date and measurement one year into future.
for row in list_of_dfs:
    selected_df = row + '[\'Measurement\']'
    df_master['Measurement_Today'] = selected_df
    df_master['Measurement_Next_Year'] = selected_df.iloc[idx + 1]
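Answer from a uj5u.com user:
I don't think looking up variables by their string names through globals() is a good idea, although it is possible; it is better to create a dictionary of DataFrames: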
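# create DatetimeIndex
df_master = df_master.set_index('Date')

for row in list_of_dfs:
    # look up the DataFrame by its variable name (works, but fragile)
    selected_df = globals()[row].set_index('Date')['Measurement']
    # rows of df_master that point at this DataFrame
    m1 = df_master['df_to_pull_measurement_from'].eq(row)
    # for the next year's value, subtract one year from the DatetimeIndex
    nexty = selected_df.rename(lambda x: x - pd.offsets.DateOffset(years=1))
    # assignment aligns on the DatetimeIndex
    df_master.loc[m1, 'Measurement_Today'] = selected_df
    df_master.loc[m1, 'Measurement_Next_Year'] = nexty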
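The `nexty` line is the least obvious step, so here is a minimal sketch of it in isolation. Relabelling every date one year earlier means that the value measured in 2017 sits under the label 2016, and index alignment then places it next to the 2016 row. The small Series `s` and the frame `both` are made up for illustration (numbers taken from df_B), and the measurements are assumed to be numeric.

```python
import pandas as pd

# Three measurements taken from df_B in the question (numeric here, an assumption)
s = pd.Series(
    [250, 265, 221],
    index=pd.to_datetime(["2016-01-01", "2017-01-01", "2018-01-01"]),
    name="Measurement",
)

# Relabel each date one year earlier; the values stay where they are
nexty = s.rename(lambda ts: ts - pd.offsets.DateOffset(years=1))

# Putting both Series side by side aligns them on the date labels
both = pd.DataFrame({"today": s, "next_year": nexty})
print(both)
```

Dates with no measurement one year later (2018-01-01 in this sketch) come out as NaN in the `next_year` column.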
# Dictionary version: an alternative to the globals() loop.
# Run one version or the other, because 'Date' can only be moved into the index once.
# create DatetimeIndex
df_master = df_master.set_index('Date')

# dict of DataFrames
dfs = {'df_A': df_A, 'df_B': df_B, 'df_C': df_C, 'df_D': df_D}

# create DatetimeIndex in each DataFrame
dfs1 = {k: v.set_index('Date') for k, v in dfs.items()}

for row in list_of_dfs:
    selected_df = dfs1[row]['Measurement']
    m1 = df_master['df_to_pull_measurement_from'].eq(row)
    # for the next year's value, subtract one year from the DatetimeIndex
    nexty = selected_df.rename(lambda x: x - pd.offsets.DateOffset(years=1))
    df_master.loc[m1, 'Measurement_Today'] = selected_df
    df_master.loc[m1, 'Measurement_Next_Year'] = nexty

print(df_master)
            df_to_pull_measurement_from  Measurement_Today  Measurement_Next_Year
Date
2016-01-01                         df_B                250                    265
2017-01-01                         df_C                367                    392
2018-01-01                         df_B                221                    285
2019-01-01                         df_A                170                    198
2018-01-01                         df_A                163                    170
2019-01-01                         df_D                480                    498
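For anyone who wants to run the dictionary idea end to end, here is a self-contained sketch under a few assumptions. The helper names `make_source` and `fill_measurements` are hypothetical, the measurements are stored as numbers (the question's `np.array` construction turns them into strings), and each source is visited once via `groupby` instead of once per row of df_master. It uses `reindex` on the dates rather than the index-shifting trick, so it is a variation on the answer, not the same code.

```python
import pandas as pd

DATES = pd.to_datetime([f"{year}-01-01" for year in range(2016, 2022)])

def make_source(values):
    """Build one measurement Series indexed by date (numeric values, an assumption)."""
    return pd.Series(values, index=DATES, name="Measurement")

# Dictionary of sources keyed by the names used in df_master
sources = {
    "df_A": make_source([150, 145, 163, 170, 198, 189]),
    "df_B": make_source([250, 265, 221, 285, 298, 289]),
    "df_C": make_source([350, 367, 392, 370, 398, 389]),
    "df_D": make_source([450, 454, 413, 480, 498, 489]),
}

df_master = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2017-01-01", "2018-01-01",
                            "2019-01-01", "2018-01-01", "2019-01-01"]),
    "df_to_pull_measurement_from": ["df_B", "df_C", "df_B", "df_A", "df_A", "df_D"],
})

def fill_measurements(master, sources):
    """Hypothetical helper: look up today's and next year's value for every row."""
    out = master.copy()
    out["Measurement_Today"] = float("nan")
    out["Measurement_Next_Year"] = float("nan")
    # Visit each source once, however many rows refer to it
    for name, rows in master.groupby("df_to_pull_measurement_from"):
        series = sources[name]
        today = pd.DatetimeIndex(rows["Date"])
        next_year = today + pd.DateOffset(years=1)
        # reindex yields NaN for dates the source does not contain
        out.loc[rows.index, "Measurement_Today"] = series.reindex(today).to_numpy()
        out.loc[rows.index, "Measurement_Next_Year"] = series.reindex(next_year).to_numpy()
    return out

result = fill_measurements(df_master, sources)
print(result)
```

With thousands of sources, the dictionary could presumably be filled lazily (for example, loading each file only when its name first appears), but that depends on how the dfs are stored and is not covered by the question.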
