Is there a way to stop a pandas DataFrame from being automatically truncated?

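I am trying to create a multi-indexed DataFrame that contains every possible index combination, even those that currently have no values. I want the values for those missing combinations to be set to 0. To do this, I used the following: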

index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']

grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')

grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)

Expected result:

 ___________________________________________________________________________________________ 
|Chan. | Duration   | Designation|    Manufact. |Total Purchases|  Sales      |   Cost      |
|______|____________|____________|______________|_______________|_____________|_____________|
|      | Month      | Special    |    Brand     |     0         |    0.00     |   0.00      |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |    0.00     |   0.00      |
|Retail|            |____________|______________|_______________|_____________|_____________|
|      |            |Not Special |    Brand     |     756       | 15654.07    |   9498.23   |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     7896      |  98745.23   |    78953.56 |
|      |____________|____________|______________|_______________|_____________|_____________|
|      | Season     | Special    |    Brand     |     0         |  0.00       |    0.00     |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |  0.00       |    0.00     |
|      |            |____________|______________|_______________|_____________|_____________|
|      |            |Not Special |    Brand     |     0         |  0.00       |    0.00     |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |  0.00       |    0.00     |
|______|____________|____________|______________|_______________|_____________|_____________|

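This result is produced when every value of each index level appears at least once in the data. However, if a value of some level (here, 'Special') does not appear in the data at all, the result below is produced instead.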

___________________________________________________________________________________________ 
|Chan. | Duration   | Designation|    Manufact. |Total Purchases|  Sales      |   Cost      |
|______|____________|____________|______________|_______________|_____________|_____________|
|      | Month      | Not Special|    Brand     |     756       |  15654.07   |   9498.23   |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |    7896       | 98745.23    |   78953.56  |
|Retail|____________|____________|______________|_______________|_____________|_____________|
|      | Season     |Not Special |    Brand     |       0       |    0.00     |     0.00    |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |       0       |    0.00     |     0.00    |
|______|____________|____________|______________|_______________|_____________|_____________|

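For some reason, the missing combinations keep being dropped automatically. How can I fix the index so that the desired result is always produced, and so that I can reliably use these indices in calculations even when some of them have no values?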
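The behaviour described above can be reproduced with a small sketch. The sample rows below are hypothetical (only the two non-zero rows from the tables are used, and `value_columns` and `observed_levels` are names introduced here for illustration). It suggests that nothing is truncated after the fact: `grouped_df.index.levels` only ever holds the values that were observed in the grouped data, so their product cannot contain combinations built from unseen values.

```python
import pandas as pd

# Hypothetical sample data: no row has Designation == 'Special'
# and no row has Duration == 'Season'.
df = pd.DataFrame({
    'Channel': ['Retail', 'Retail'],
    'Duration': ['Month', 'Month'],
    'Designation': ['Not Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
    'Total Purchases': [756, 7896],
    'Sales': [15654.07, 98745.23],
    'Cost': [9498.23, 78953.56],
})

index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
value_columns = ['Total Purchases', 'Sales', 'Cost']

grouped_df = df.groupby(by=index_levels)[value_columns].agg('sum')

# The levels contain only observed values, so 'Special' and 'Season' are absent.
observed_levels = [list(level) for level in grouped_df.index.levels]
print(observed_levels)

# The product of the observed levels cannot add the unseen combinations,
# so reindexing against it leaves the frame with the same two rows.
reindexed = grouped_df.reindex(
    pd.MultiIndex.from_product(grouped_df.index.levels), fill_value=0
)
print(len(reindexed))
```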

Reply from a uj5u.com user:

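What you can do is build the desired fixed index up front, for example from a dictionary whose keys are the column labels used as group indices and whose values are all the possible values for each column.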

index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'], 
    'Designation': ['Special', 'Not Special'], 
    'Manufacturing Class': ['Brand', 'Generic']
}

fixed_index = pd.MultiIndex.from_product(index_levels.values(), names=index_levels.keys())

Then you can do:

# pass the dictionary keys as a plain list of column labels
grouped_df = df.groupby(by=list(index_levels))[['Total Purchases', 'Sales', 'Cost']].agg('sum')

grouped_df = grouped_df.reindex(fixed_index, fill_value=0)
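Putting the pieces of this answer together, a self-contained sketch might look like the following. The sample rows are hypothetical, taken from the non-zero rows of the question's tables, and `value_columns` and `full_df` are names introduced here for illustration. The dictionary views are converted to lists before being handed to pandas, which is a defensive choice rather than something the answer requires.

```python
import pandas as pd

# Hypothetical sample data: only two combinations actually occur.
df = pd.DataFrame({
    'Channel': ['Retail', 'Retail'],
    'Duration': ['Month', 'Month'],
    'Designation': ['Not Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
    'Total Purchases': [756, 7896],
    'Sales': [15654.07, 98745.23],
    'Cost': [9498.23, 78953.56],
})

# Every possible value per grouping column, declared by hand.
index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
}
value_columns = ['Total Purchases', 'Sales', 'Cost']

# Cartesian product of all declared values: 1 * 2 * 2 * 2 = 8 rows.
fixed_index = pd.MultiIndex.from_product(
    list(index_levels.values()), names=list(index_levels.keys())
)

grouped_df = df.groupby(by=list(index_levels))[value_columns].agg('sum')

# Combinations missing from the grouped result are added and filled with 0.
full_df = grouped_df.reindex(fixed_index, fill_value=0)
print(full_df)
```

Because the index is declared independently of the data, the shape of `full_df` stays the same no matter which combinations happen to be present in `df`.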

Edit: a more general solution

# columns used for grouping  
index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']

# get all the possible (unique) values per index level
index_levels_values = [df[col].unique() for col in index_levels]

# construct the fixed index based on the cartesian product of all the index levels' values
fixed_index = pd.MultiIndex.from_product(index_levels_values, names=index_levels)

grouped_df = df.groupby(by=index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')

grouped_df = grouped_df.reindex(fixed_index, fill_value=0)
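One caveat worth keeping in mind: `df[col].unique()` can only return values that appear somewhere in `df`, so this approach helps when the unique values are collected from the full frame and the aggregation is then run on a subset of it. The sketch below illustrates that case. The third row, the `subset` filter, and the names `value_columns` and `result` are hypothetical additions made for this example.

```python
import pandas as pd

# Hypothetical full dataset; the third row is invented for illustration
# so that 'Season' and 'Special' occur somewhere in the frame.
df = pd.DataFrame({
    'Channel': ['Retail', 'Retail', 'Retail'],
    'Duration': ['Month', 'Month', 'Season'],
    'Designation': ['Not Special', 'Not Special', 'Special'],
    'Manufacturing Class': ['Brand', 'Generic', 'Brand'],
    'Total Purchases': [756, 7896, 10],
    'Sales': [15654.07, 98745.23, 120.00],
    'Cost': [9498.23, 78953.56, 80.00],
})

index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
value_columns = ['Total Purchases', 'Sales', 'Cost']

# Collect the unique values from the FULL frame, before any filtering.
fixed_index = pd.MultiIndex.from_product(
    [df[col].unique() for col in index_levels], names=index_levels
)

# Aggregate only a subset; 'Season' and 'Special' do not occur in it.
subset = df[df['Duration'] == 'Month']
grouped = subset.groupby(by=index_levels)[value_columns].agg('sum')

# The fixed index restores the combinations missing from the subset.
result = grouped.reindex(fixed_index, fill_value=0)
print(result)
```

If a value never occurs anywhere in `df`, it has to be listed explicitly, as in the dictionary-based version earlier in this answer.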

Tags: Python, pandas, MultiIndex