假設下一個打包的 df,其中column_name表示未打包row_name資料幀中的列名和行名:
column_name row_name value
0 col1 a 1
1 col1 b 2
2 col1 c 3
3 col2 d 4
4 col2 e 5
5 col2 f 6
6 col2 g 7
7 col2 h 8
解壓后的資料框如下所示:
individual col1 col2
0 1 a e
1 2 b f
2 3 c g
3 4 a h
4 5 b e
5 6 c f
6 7 a g
7 8 b h
所需的輸出可能如下所示:
individual col1 col2 value_col1 value_col2
1 a e 1 5
2 b f 2 6
3 c g 3 7
4 a h 1 8
5 b e 2 5
6 c f 3 6
7 a g 1 7
8 b h 2 8
有沒有辦法進行連接以便將value列解壓縮為解壓縮資料框中的兩個不同列?
可重現的資料:
unpacked_df = pd.DataFrame({
'individual': [1,2,3,4,5,6,7,8],
'col1':['a','b','c','a','b','c','a','b'],
'col2':['e','f','g','h','e','f','g','h']
})
packed_df = pd.DataFrame({
'column_name': ['col1','col1','col1','col2','col2','col2','col2','col2'],
'row_name': ['a','b','c','d','e','f','g','h'],
'value':[1,2,3,4,5,6,7,8]
})
uj5u.com熱心網友回復:
你可以做一個 double merge,首先 for col1,然后 for col2:
unpacked_df.merge(packed_df[packed_df.column_name.eq("col1")], left_on="col1", right_on="row_name").merge(packed_df[packed_df.column_name.eq("col2")], left_on="col2", right_on="row_name", suffixes=("_col1","_col2")).sort_values("individual")[["individual", "col1", "col2", "value_col1", "value_col2"]]
結果:
individual col1 col2 value_col1 value_col2
0 1 a e 1 5
6 2 b f 2 6
5 3 c g 3 7
2 4 a h 1 8
1 5 b e 2 5
7 6 c f 3 6
4 7 a g 1 7
3 8 b h 2 8
uj5u.com熱心網友回復:
這有效:
cols = packed_df['column_name'].unique()
dct = {k: [] for k in cols}
for row in unpacked_df.iloc:
for col in cols:
dct[col].append(packed_df[(packed_df['column_name'] == col) & (packed_df['row_name'] == row[col])]['value'].iloc[0])
uj5u.com熱心網友回復:
一種選擇是融化已打包的 df,將其加入未打包的 df,進行樞軸/翻轉和一些列名稱調整:
temp = (unpacked_df
.melt('individual',
var_name ='column_name',
value_name='row_name')
.merge(packed_df,
on=['column_name', 'row_name'])
.pivot('individual', 'column_name')
.rename(columns={'row_name':''})
)
# collapse the columns
temp.columns = temp.columns.map('_'.join)
# removeprefix is a python3.9 addition
temp.rename(columns=lambda col: col.removeprefix('_'))
col1 col2 value_col1 value_col2
individual
1 a e 1 5
2 b f 2 6
3 c g 3 7
4 a h 1 8
5 b e 2 5
6 c f 3 6
7 a g 1 7
8 b h 2 8
其目的是得到的配對individual和的組合col1和col2,以便它反射鏡相同的形式在packed_df; 從那里開始,它只是合并和重塑(只要索引/列組合是唯一的,重塑就起作用)。
另一種選擇,這可能與方法鏈(長到寬的部分)幫助使用pivot_wider從功能pyjanitor:
pip install git https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd
(unpacked_df
.melt('individual',
var_name ='column_name',
value_name='row_name')
.merge(packed_df,
on=['column_name', 'row_name'])
# wrapper around pivot
.pivot_wider(index = 'individual',
names_from = 'column_name',
names_sep = '_',
names_glue = lambda col: f"{col[-4:]}"
if col[:-5]=="row_name"
else col)
)
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/374326.html
上一篇:mysqlgroupby并加入
