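I am trying to create a multi-index DataFrame that contains every possible index combination, including combinations that currently have no values. I want those missing values to be set to 0. To do this, I used the following: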
index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)
Expected result:
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|Retail| |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
| |____________|____________|______________|_______________|_____________|_____________|
| | Season | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
| | |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
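This result is produced when each index level value occurs at least once in the data. However, if a value of an index level does not occur at all, the result below is produced instead.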
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Not Special| Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
|Retail|____________|____________|______________|_______________|_____________|_____________|
| | Season |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
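For some reason, the rows keep getting truncated automatically. How can I fix the index so that the desired result is always produced, and so that I can reliably use these index combinations in calculations even when there are no values for them?
Answer from a uj5u.com user: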
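What you can do is build the desired fixed index up front. For example, start from a dictionary whose keys are the column labels used as the grouping index and whose values are all the possible values for each level.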
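import pandas as pd

# every grouping column mapped to all the values it may take
index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic']
}
# cartesian product of all level values, named after the grouping columns
fixed_index = pd.MultiIndex.from_product(list(index_levels.values()), names=list(index_levels))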
Then you can do:
# pass the column labels as a plain list; a dict_keys view is not a list of labels for groupby
grouped_df = df.groupby(by=list(index_levels))[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)
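To see the fixed-index approach end to end, here is a minimal sketch. The sample rows in `df` are made up for illustration (they are not the asker's data), and `value_cols` and `observed_rows` are names introduced only for this example.

```python
import pandas as pd

# Hypothetical sample data: only 'Month' / 'Not Special' rows exist.
df = pd.DataFrame({
    'Channel': ['Retail', 'Retail', 'Retail'],
    'Duration': ['Month', 'Month', 'Month'],
    'Designation': ['Not Special', 'Not Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic', 'Generic'],
    'Total Purchases': [756, 7000, 896],
    'Sales': [15654.07, 90000.00, 8745.23],
    'Cost': [9498.23, 70000.00, 8953.56],
})

# All values each grouping column may take, whether or not they occur in df.
index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
}
fixed_index = pd.MultiIndex.from_product(
    list(index_levels.values()), names=list(index_levels)
)

value_cols = ['Total Purchases', 'Sales', 'Cost']
grouped_df = df.groupby(by=list(index_levels))[value_cols].sum()
observed_rows = len(grouped_df)  # only the combinations present in df

# Expand to every combination; missing ones are filled with 0.
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)
print(grouped_df)
```

Because the index is built from the dictionary rather than from the data, the shape of the result should not depend on which combinations happen to appear in `df`.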
Edit - a more general solution
# columns used for grouping
index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
# get all the possible (unique) values per index level
index_levels_values = [df[col].unique() for col in index_levels]
# construct the fixed index based on the cartesian product of all the index levels' values
fixed_index = pd.MultiIndex.from_product(index_levels_values, names=index_levels)
grouped_df = df.groupby(by=index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)
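The general version can be wrapped in a small function, sketched below. The helper name `group_with_full_index` and the `sample` rows are hypothetical, introduced only for this illustration. One caveat worth keeping in mind: `unique()` can only return values that occur somewhere in `df`, so a value that is absent from the whole frame will not appear in the index; in that case the explicit dictionary from the first solution is still needed.

```python
import pandas as pd

def group_with_full_index(df, keys, value_cols):
    """Hypothetical helper: sum value_cols per group and expand the result
    to the cartesian product of the values seen in each key column."""
    level_values = [df[col].unique() for col in keys]
    full_index = pd.MultiIndex.from_product(level_values, names=keys)
    grouped = df.groupby(by=keys)[value_cols].sum()
    return grouped.reindex(full_index, fill_value=0)

# Made-up rows: 'Season' and 'Special' each occur, but only in one combination.
sample = pd.DataFrame({
    'Channel': ['Retail', 'Retail', 'Retail'],
    'Duration': ['Month', 'Month', 'Season'],
    'Designation': ['Not Special', 'Not Special', 'Special'],
    'Manufacturing Class': ['Brand', 'Generic', 'Brand'],
    'Total Purchases': [756, 7896, 12],
    'Sales': [15654.07, 98745.23, 100.00],
    'Cost': [9498.23, 78953.56, 60.00],
})

result = group_with_full_index(
    sample,
    ['Channel', 'Duration', 'Designation', 'Manufacturing Class'],
    ['Total Purchases', 'Sales', 'Cost'],
)
print(result)
```

Here three combinations are observed, and the remaining combinations of the observed values are added with zeros.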
