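I'm trying to work with an Excel file that stores some of its data in a rather annoying format (I didn't create it; it's an existing resource I'm using). The values of interest sit in a column called (something like) All_Values, separated by periods, while the measures those values correspond to are given in a separate column, All_Measures, also separated by periods, and they differ from row to row. For example, using a toy dataset: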
Object All_Measures All_Values (additional columns that are not like this)
1 Height.Weight 20.50 ...
2 Weight.Height 65.30 ...
3 Height.Width.Depth 22.30.10 ...
What I want to do is reformat the data like this, filling missing values with 0 (the final order of the columns doesn't matter):
Object Height Weight Width Depth (additional columns)
1 20 50 0 0 ...
2 30 65 0 0 ...
3 22 0 30 10 ...
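One way I could do this (very slowly, since it's a large dataset) is to create a new empty dataframe, then iterate over every row of the existing dataframe, building a new one-row dataframe whose columns come from splitting All_Measures on "." and whose values come from splitting All_Values on ".". I would then drop All_Measures and All_Values from the row, attach the new dataframe to the end of it, and append the result to the empty dataframe. But this is very clunky, and it would be nice if there were a faster, more elegant way to do it.
Since there's no error here I don't have an MWE, but here is some code you can copy to create a toy dataset like the one above, in case it comes in handy.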
import pandas as pd

df = pd.DataFrame(
    [[1,'Height.Weight','20.50','First'],
     [2,'Weight.Height','65.30','Second'],
     [3,'Height.Width.Depth','22.30.10','Third']],
    columns = ['Object','All_Measures','All_Values','Object_Name']
)
A uj5u.com user replied:
Use str.split, explode, and pivot_table:
# split the "All" columns into lists
df['All_Measures'] = df['All_Measures'].str.split('.')
df['All_Values'] = df['All_Values'].str.split('.')
# explode the lists into rows (multi-column explode needs pandas 1.3+)
df = df.explode(['All_Measures', 'All_Values'])
# the split values are strings; convert them so pivot_table can aggregate them
df['All_Values'] = df['All_Values'].astype(int)
# pivot the measures into columns
df.pivot_table(
    index=['Object', 'Object_Name'],
    columns='All_Measures',
    values='All_Values',
    fill_value=0)
Output:
All_Measures Depth Height Weight Width
Object Object_Name
1 First 0 20 50 0
2 Second 0 30 65 0
3 Third 10 22 0 30
Detailed breakdown

str.split converts the "All" columns into lists:
df['All_Measures'] = df['All_Measures'].str.split('.')
df['All_Values'] = df['All_Values'].str.split('.')
#    Object            All_Measures    All_Values Object_Name
# 0       1        [Height, Weight]      [20, 50]       First
# 1       2        [Weight, Height]      [65, 30]      Second
# 2       3  [Height, Width, Depth]  [22, 30, 10]       Third

explode expands the lists into rows:
df = df.explode(['All_Measures', 'All_Values'])
#    Object All_Measures All_Values Object_Name
# 0       1       Height         20       First
# 0       1       Weight         50       First
# 1       2       Weight         65      Second
# 1       2       Height         30      Second
# 2       3       Height         22       Third
# 2       3        Width         30       Third
# 2       3        Depth         10       Third

pivot_table pivots the measures into columns (the split values are strings, so they are converted to integers first so they can be aggregated):
df['All_Values'] = df['All_Values'].astype(int)
df.pivot_table(
    index=['Object', 'Object_Name'],
    columns='All_Measures',
    values='All_Values',
    fill_value=0)
# All_Measures        Depth  Height  Weight  Width
# Object Object_Name
# 1      First            0      20      50      0
# 2      Second           0      30      65      0
# 3      Third           10      22       0     30
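The pivoted result keeps Object and Object_Name in a MultiIndex and leaves "All_Measures" as the name of the column axis, whereas the question asks for a flat table. A minimal sketch of one way to flatten it, assuming the toy data from the question, integer-valued entries, and pandas 1.3 or later (the names long_df and wide_df are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Height.Weight', '20.50', 'First'],
     [2, 'Weight.Height', '65.30', 'Second'],
     [3, 'Height.Width.Depth', '22.30.10', 'Third']],
    columns=['Object', 'All_Measures', 'All_Values', 'Object_Name'],
)

# Split both columns into lists, then explode them together
# (multi-column explode is available from pandas 1.3).
long_df = df.assign(
    All_Measures=df['All_Measures'].str.split('.'),
    All_Values=df['All_Values'].str.split('.'),
).explode(['All_Measures', 'All_Values'])

# Assumption: every value is an integer written as text.
long_df['All_Values'] = long_df['All_Values'].astype(int)

wide_df = (
    long_df.pivot_table(
        index=['Object', 'Object_Name'],  # list every column to carry over
        columns='All_Measures',
        values='All_Values',
        fill_value=0,
    )
    .reset_index()               # turn the index levels back into columns
    .rename_axis(None, axis=1)   # drop the leftover "All_Measures" axis name
)
print(wide_df)
```

Any additional columns that should survive the reshape would need to be added to the index list, since pivot_table only keeps the columns it is told about.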
A uj5u.com user replied:
There may be some way to do this without a loop or apply(), but I couldn't think of one. Here is what I came up with:
import pandas as pd

df = pd.DataFrame(
    [[1,'Height.Weight','20.50','First'],
     [2,'Weight.Height','65.30','Second'],
     [3,'Height.Width.Depth','22.30.10','Third']],
    columns = ['Object','All_Measures','All_Values','Object_Name'],
)

def parse_combined_measure(row):
    # pair each measure name with the value at the same position
    keys = row["All_Measures"].split(".")
    values = row["All_Values"].split(".")
    # Series.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([row, pd.Series(dict(zip(keys, values)))])

df2 = df.apply(parse_combined_measure, axis=1)
# measures a row does not list come back as NaN; replace them with 0
df2 = df2.fillna(0)
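For a large dataset, the same zip-based pairing can be done without a row-wise apply by building one plain dict per row and handing the whole list to the DataFrame constructor. This is only a sketch under the assumption that every value is an integer; records, measures_df and result are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Height.Weight', '20.50', 'First'],
     [2, 'Weight.Height', '65.30', 'Second'],
     [3, 'Height.Width.Depth', '22.30.10', 'Third']],
    columns=['Object', 'All_Measures', 'All_Values', 'Object_Name'],
)

# One {measure: value} dict per row, pairing names and values by position.
# Assumption: values are integers stored as text.
records = [
    dict(zip(measures.split('.'), map(int, values.split('.'))))
    for measures, values in zip(df['All_Measures'], df['All_Values'])
]

# Measures missing from a row become NaN, so fill them with 0 and restore
# an integer dtype. Reusing df.index keeps the rows aligned for the join.
measures_df = pd.DataFrame(records, index=df.index).fillna(0).astype(int)

# Swap the two packed columns for the expanded ones.
result = df.drop(columns=['All_Measures', 'All_Values']).join(measures_df)
print(result)
```

The list comprehension still visits every row, but it avoids creating a new Series per row, which is usually the expensive part of apply(axis=1).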
A uj5u.com user replied:
# Create a new DataFrame with just the values extracted from the All_Values column
In [24]: new_df = df['All_Values'].str.split('.').apply(pd.Series); new_df
Out[24]:
0 1 2
0 20 50 NaN
1 65 30 NaN
2 22 30 10
# Figure out the names those columns should have
In [37]: df.loc[df['All_Measures'].str.count(r'\.').idxmax(), 'All_Measures']
Out[37]: 'Height.Width.Depth'
In [38]: new_df.columns = df.loc[df['All_Measures'].str.count(r'\.').idxmax(), 'All_Measures'].split('.')
In [39]: new_df
Out[39]:
Height Width Depth
0 20 50 NaN
1 65 30 NaN
2 22 30 10
# Join the new DF with the original, except the columns we've expanded.
In [41]: df[['Object', 'Object_Name']].join(new_df)
Out[41]:
Object Object_Name Height Width Depth
0 1 First 20 50 NaN
1 2 Second 65 30 NaN
2 3 Third 22 30 10
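The session above labels the split columns by position, using the measure names of the longest row, so in the printed result the 50 that Object 1 lists as its Weight ends up under Width. A hedged sketch of a variant that keeps the same column-splitting idea but pairs each value with the measure named in its own row follows; it assumes integer values, and measures, values, long_df, wide and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Height.Weight', '20.50', 'First'],
     [2, 'Weight.Height', '65.30', 'Second'],
     [3, 'Height.Width.Depth', '22.30.10', 'Third']],
    columns=['Object', 'All_Measures', 'All_Values', 'Object_Name'],
)

# Split both columns into position-indexed frames, then stack them into long
# Series indexed by (row, position). dropna() removes the padding that
# shorter rows receive, regardless of how the installed pandas stacks NaN.
measures = df['All_Measures'].str.split('.', expand=True).stack().dropna()
values = df['All_Values'].str.split('.', expand=True).stack().dropna().astype(int)

# Pair each value with the measure found at the same (row, position).
long_df = pd.DataFrame({'measure': measures, 'value': values})

# Re-key by (row, measure) and spread the measures out into columns,
# writing 0 wherever a row does not list a measure.
wide = (
    long_df.droplevel(1)
    .set_index('measure', append=True)['value']
    .unstack(fill_value=0)
)

# Join back onto the columns that were not expanded.
result = df[['Object', 'Object_Name']].join(wide)
print(result)
```

This relies on each row listing a measure at most once; a repeated measure within one row would make the unstack step fail on duplicate index entries.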
