我試圖弄清楚如何對齊DataFrames存盤在兩個單獨的dictionaries.
為了說明這一點,我創建了四個DataFrames df1, df2, df3, 并將df4它們存盤在dictionaries:
df1:
Dates ID1 ID2 ID3 ID4 ID6
0 2021-01-01 0.0 0.0 0.1 0.0 0.0
1 2021-01-03 0.0 0.0 0.0 0.0 0.0
2 2021-01-04 0.1 0.1 0.0 0.1 0.1
3 2021-01-05 0.2 0.2 0.2 0.2 0.2
4 2021-01-06 0.1 0.1 0.4 0.1 0.1
df2:
Dates ID1 ID2 ID3 ID4 ID6
0 2021-01-01 0.00 0.0 0.1 0.0 0.0
1 2021-01-03 0.20 0.3 0.0 0.0 0.0
2 2021-01-04 0.15 -0.1 0.4 -0.1 -0.1
3 2021-01-05 0.20 -0.2 -0.2 0.3 0.2
4 2021-01-06 0.10 0.1 0.2 -0.1 0.1
df3:
Dates ID2 ID4 ID5 ID6
0 2021-01-01 3 1.0 4 2.0
1 2021-01-02 6 2.4 3 1.5
2 2021-01-03 3 -2.0 3 2.0
3 2021-01-04 -1 1.0 -3 -2.0
4 2021-01-05 3 4.0 4 3.0
df4:
Dates ID2 ID4 ID5 ID6
0 2021-01-02 6 2.4 3 1.5
1 2021-01-03 3 -2.0 3 2.0
2 2021-01-04 -1 1.0 -3 -2.0
3 2021-01-07 3 4.0 4 3.0
為了重現性:
import pandas as pd
df1 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'ID1':[0,0,0.1,0.2,0.1],
'ID2':[0,0,0.1,0.2,0.1],
'ID3':[0.1,0,0,0.2,0.4],
'ID4':[0,0,0.1,0.2,0.1],
'ID6':[0,0,0.1,0.2,0.1]})
df2 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'ID1':[0,0.2,0.15,0.2,0.1],
'ID2':[0,0.3,-0.1,-0.2,0.1],
'ID3':[0.1,0,0.4,-0.2,0.2],
'ID4':[0,0,-0.1,0.3,-0.1],
'ID6':[0,0,-0.1,0.2,0.1]})
df3 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'ID2':[3,6,3,-1,3],
'ID4':[1,2.4,-2,1,4],
'ID5':[4,3,3,-3,4],
'ID6':[2,1.5,2,-2,3]})
df4 = pd.DataFrame({
'Dates':['2021-01-02', '2021-01-03', '2021-01-04', '2021-01-07'],
'ID2':[6,3,-1,3],
'ID4':[2.4,-2,1,4],
'ID5':[3,3,-3,4],
'ID6':[1.5,2,-2,3]})
dic1 = dict()
dic1['df1'] = df1
dic1['df2'] = df2
dic2 = dict()
dic2['df3'] = df3
dic2['df4'] = df4
我試圖得到的是:
dic1['df1']:
Dates ID2 ID4 ID6
0 2021-01-03 0.0 0.0 0.0
1 2021-01-04 0.1 0.1 0.1
dic1['df2']:
Dates ID2 ID4 ID6
0 2021-01-03 0.3 0.0 0.0
1 2021-01-04 -0.1 -0.1 -0.1
dic2['df3']:
Dates ID2 ID4 ID6
0 2021-01-03 3 -2 2
1 2021-01-04 -1 1 -2
dic2['df4']:
Dates ID2 ID4 ID6
0 2021-01-03 3 -2 2
1 2021-01-04 -1 1 -2
我嘗試對齊兩者的行和列,dictionaries以便我只包含包含在所有DataFrames不同dictionaries. 是否有一個有效的命令來處理它dictionaries?
uj5u.com熱心網友回復:
在兩個s 中DataFrame.align與 createDatetmeIndex一起使用DataFrame:
dic1['df1'], dic2['df2'] = (dic1['df1'].set_index('Dates')
.align(dic2['df2'].set_index('Dates'),
fill_value=0,
join='inner'))
print (dic1['df1'])
ID2 ID4 ID6
Dates
2021-01-01 0.0 0.0 0.0
2021-01-03 0.0 0.0 0.0
2021-01-04 0.1 0.1 0.1
2021-01-05 0.2 0.2 0.2
print (dic2['df2'])
ID2 ID4 ID6
Dates
2021-01-01 3 1.0 2.0
2021-01-03 3 -2.0 2.0
2021-01-04 -1 1.0 -2.0
2021-01-05 3 4.0 3.0
a, b = (dic1['df1'].set_index('Dates')
.align(dic2['df2'].set_index('Dates'),
fill_value=0,
join='inner'))
dic1['df1'], dic2['df2'] = a.reset_index(), b.reset_index()
print (dic1['df1'])
Dates ID2 ID4 ID6
0 2021-01-01 0.0 0.0 0.0
1 2021-01-03 0.0 0.0 0.0
2 2021-01-04 0.1 0.1 0.1
3 2021-01-05 0.2 0.2 0.2
print (dic2['df2'])
Dates ID2 ID4 ID6
0 2021-01-01 3 1.0 2.0
1 2021-01-03 3 -2.0 2.0
2 2021-01-04 -1 1.0 -2.0
3 2021-01-05 3 4.0 3.0
編輯:首先將所有字典合并為一個大字典,然后將Dates 和列名稱提取到嵌套串列中,最后通過DataFrame.reindex以下方式過濾值:
idx, cols = [],[]
for k, v in {**dic1, **dic2}.items():
cols.append(v.drop('Dates', axis=1).columns)
idx.append(v['Dates'])
from functools import reduce
idx = reduce(lambda x, y: set(x) & set(y), idx)
print (idx)
{'2021-01-04', '2021-01-03'}
cols = reduce(lambda x, y: set(x) & set(y), cols)
print (cols)
{'ID4', 'ID2', 'ID6'}
for k, v in dic1.items():
dic1[k] = dic1[k].set_index('Dates').reindex(index=idx, columns=cols, fill_value=0).reset_index()
for k, v in dic2.items():
dic2[k] = dic2[k].set_index('Dates').reindex(index=idx, columns=cols, fill_value=0).reset_index()
uj5u.com熱心網友回復:
您可以計算 df1 和 df2 列和日期的交集,并只保留那些:
df1 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'ID1':[0,0,0.1,0.2,0.1],
'ID2':[0,0,0.1,0.2,0.1],
'ID3':[0.1,0,0,0.2,0.4],
'ID4':[0,0,0.1,0.2,0.1],
'ID6':[0,0,0.1,0.2,0.1]})
df2 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'ID2':[3,6,3,-1,3],
'ID4':[1,2.4,-2,1,4],
'ID5':[4,3,3,-3,4],
'ID6':[2,1.5,2,-2,3]})
cols = set(df1.columns).intersection(set(df2.columns))
dates = set(df1["Dates"]).intersection(set(df2["Dates"]))
dic1 = dict()
dic1['df1'] = df1[df1["Dates"].isin(dates)][cols].reset_index()
dic2 = dict()
dic2['df2'] = df2[df2["Dates"].isin(dates)][cols].reset_index()
print(dic1['df1'])
print(dic2['df2'])
輸出:
index Dates ID6 ID2 ID4
0 0 2021-01-01 0.0 0.0 0.0
1 1 2021-01-03 0.0 0.0 0.0
2 2 2021-01-04 0.1 0.1 0.1
3 3 2021-01-05 0.2 0.2 0.2
index Dates ID6 ID2 ID4
0 0 2021-01-01 2.0 3 1.0
1 2 2021-01-03 2.0 3 -2.0
2 3 2021-01-04 -2.0 -1 1.0
3 4 2021-01-05 3.0 3 4.0
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/386403.html
