Sorting a MultiIndex DataFrame by a column without losing the index order

I have a df with this structure:

1_product_id        2_product_id        1_qty_sold              2_qty_sold              times_sold
5584                4384                159.00                  653.00                  153
7889                2970                104.00                  497.00                  102
5024                2970                89.00                   413.00                  87
2990                8310                71.00                   283.00                  71
2990                4384                71.00                   282.00                  68
2990                2970                62.00                   240.00                  58
5584                8310                56.00                   208.00                  54

I'm trying to make it look like this:

1_product_id        2_product_id        1_qty_sold              2_qty_sold              times_sold
5584                4384                159.00                  653.00                  153
                    8310                56.00                   208.00                  54
7889                2970                104.00                  497.00                  102
5024                2970                89.00                   413.00                  87
2990                8310                71.00                   283.00                  71
                    4384                71.00                   282.00                  68
                    2970                62.00                   240.00                  58

It is sorted by times_sold and grouped by 1_product_id and 2_product_id. I tried:

(df_out.groupby(['1_product_id', '2_product_id'])
       .sum()
       .sort_values('times_sold', ascending=False))

But this messes up the 1_product_id index:

                                            1_qty_despatched        2_qty_despatched    times_sold
1_product_id        2_product_id            
5584                4384                    159.00                  653.00              153
7889                2970                    104.00                  497.00              102
5024                2970                    89.00                   413.00              87
2990                8310                    71.00                   283.00              71
                    4384                    71.00                   282.00              68
                    2970                    62.00                   240.00              58
5584                8310                    56.00                   208.00              54

How can I sort by times_sold and group by 1_product_id and 2_product_id without losing the desired structure? I checked this answer, but it didn't help me.

Edit

I tried:

(df_out.groupby(['1_product_id', '2_product_id'])
       .sum()
       .sort_values('times_sold', ascending=False)
       .sort_index(axis=0, ascending=False))

But the index is not in the order I want.

                                        1_qty_sold              2_qty_sold              times_sold
1_product_id        2_product_id            
7889                2970                104.00                  497.00                  102 # this should be the 2nd index
5584                4384                159.00                  653.00                  153 # this should be the 1st index
                    8310                56.00                   208.00                  54 
5024                2970                89.00                   413.00                  87 # this should be the 3rd index
2990                4384                71.00                   282.00                  68 # this should be the 4th index
                    2970                62.00                   240.00                  58
                    8310                71.00                   283.00                  71

After using @jezrael's solution, I get:

                                            times_sold  1_qty_sold  2_qty_sold
1_product_id        2_product_id            
5584                4384                    153         159.00      653.00
7889                2970                    102         104.00      497.00
5024                2970                    87          89.00       413.00
2990                8310                    71          71.00       283.00
                    4384                    68          71.00       282.00
                    2970                    58          62.00       240.00
5584                8310                    54          56.00       208.00

Whereas I'm trying to make it look like this:

# Note that the last row is now the second row, and each `1_product_id` value appears only once in the index.

                                            times_sold  1_qty_sold  2_qty_sold
1_product_id        2_product_id            
5584                4384                    153         159.00      653.00
                    8310                    54          56.00       208.00 
7889                2970                    102         104.00      497.00
5024                2970                    87          89.00       413.00
2990                8310                    71          71.00       283.00
                    4384                    68          71.00       282.00
                    2970                    58          62.00       240.00

uj5u.com user reply:

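To stop groupby from sorting, add sort=False, then sort with a dictionary that maps each 1_product_id to its original order of appearance: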

print (df_out)
   1_product_id  2_product_id  1_qty_sold  2_qty_sold  times_sold
0          5584          4384       159.0       653.0          10 <- changed for test sort
1          7889          2970       104.0       497.0         102
2          5024          2970        89.0       413.0          87
3          2990          8310        71.0       283.0          71
4          2990          4384        71.0       282.0          68
5          2990          2970        62.0       240.0          58
6          5584          8310        56.0       208.0          54


# map each 1_product_id to the position of its first appearance
uniq = {v: k for k, v in enumerate(df_out['1_product_id'].unique())}
print(uniq)
{5584: 0, 7889: 1, 5024: 2, 2990: 3}

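df = (df_out.groupby(['1_product_id','2_product_id'], sort=False)
            .sum()
            # key is applied to each sort column: product IDs become their
            # original position, values not found in uniq pass through unchanged
            .sort_values(['1_product_id', 'times_sold'],
                          key=lambda x: x.map(uniq).fillna(x),
                          ascending=[True, False])
            )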

print (df)
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id                                    
5584         8310                56.0       208.0          54
             4384               159.0       653.0          10
7889         2970               104.0       497.0         102
5024         2970                89.0       413.0          87
2990         8310                71.0       283.0          71
             4384                71.0       282.0          68
             2970                62.0       240.0          58
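
One caveat with the key function above: it is applied to both sort columns, so a times_sold value that happened to equal a product ID would be remapped too. A minimal sketch that sidesteps this, assuming a hypothetical helper column named `_order` (not part of the original answer) built with pd.factorize, could look like this:

```python
import pandas as pd

# Sample data copied from the answer above (first times_sold changed to 10).
df_out = pd.DataFrame({
    '1_product_id': [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    '2_product_id': [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    '1_qty_sold': [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    '2_qty_sold': [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    'times_sold': [10, 102, 87, 71, 68, 58, 54],
})

keys = ['1_product_id', '2_product_id']
agg = df_out.groupby(keys, sort=False, as_index=False).sum()

# `_order` is a hypothetical helper column: factorize numbers each
# 1_product_id by first appearance (5584 -> 0, 7889 -> 1, ...).
agg['_order'] = pd.factorize(agg['1_product_id'])[0]

# Sort by appearance order, then by times_sold descending within each group.
result = (agg.sort_values(['_order', 'times_sold'], ascending=[True, False])
             .drop(columns='_order')
             .set_index(keys))
print(result)
```

Because the ordering lives in its own column, only 1_product_id influences the group order and times_sold is compared by its real values.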

Another alternative, using a second groupby:

df = (df_out.groupby(['1_product_id','2_product_id'], sort=False)
            .sum()
            .groupby('1_product_id', sort=False, group_keys=False)
            .apply(lambda x: x.sort_values('times_sold', ascending=False))
            )

print (df)
                           1_qty_sold  2_qty_sold  times_sold
1_product_id 2_product_id                                    
5584         8310                56.0       208.0          54
             4384               159.0       653.0          10
7889         2970               104.0       497.0         102
5024         2970                89.0       413.0          87
2990         8310                71.0       283.0          71
             4384                71.0       282.0          68
             2970                62.0       240.0          58
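
Both variants above keep the 1_product_id groups in their order of first appearance. If the groups should instead be ordered by their best times_sold value, as in the desired output in the question, one possible sketch is to sort on a per-group maximum. The helper column `_group_max` is a hypothetical name introduced for illustration, and the data is the question's original sample:

```python
import pandas as pd

# Original sample data from the question.
df_out = pd.DataFrame({
    '1_product_id': [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    '2_product_id': [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    '1_qty_sold': [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    '2_qty_sold': [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    'times_sold': [153, 102, 87, 71, 68, 58, 54],
})

keys = ['1_product_id', '2_product_id']
agg = df_out.groupby(keys, sort=False, as_index=False).sum()

# `_group_max` is a hypothetical helper column: the largest times_sold
# within each 1_product_id group, repeated on every row of that group.
agg['_group_max'] = agg.groupby('1_product_id')['times_sold'].transform('max')

# Order groups by their maximum (descending), keep each group together via
# 1_product_id as a tie-breaker, then sort rows inside each group.
result = (agg.sort_values(['_group_max', '1_product_id', 'times_sold'],
                          ascending=[False, True, False])
             .drop(columns='_group_max')
             .set_index(keys))
print(result)
```

Including 1_product_id as the second sort key is a design choice: it prevents two groups that share the same maximum from interleaving.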

uj5u.com user reply:

Once the processing is done, you can add

df_out.sort_index(axis=0)

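This will fix the ordering of your first index level. The rest will already be sorted by times_sold. Note that sort_index orders the first level by product ID value rather than by order of appearance, and by default it sorts the remaining levels as well.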
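
To see what sort_index does to the question's sample data, here is a small sketch (the variable names `agg` and `by_index` are illustrative). It suggests that the rows end up ordered by product ID on both levels, so times_sold is no longer descending within a group:

```python
import pandas as pd

# Original sample data from the question.
df_out = pd.DataFrame({
    '1_product_id': [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    '2_product_id': [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    '1_qty_sold': [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    '2_qty_sold': [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    'times_sold': [153, 102, 87, 71, 68, 58, 54],
})

agg = df_out.groupby(['1_product_id', '2_product_id'], sort=False).sum()

# With no arguments, sort_index sorts lexicographically on every index
# level, in ascending order.
by_index = agg.sort_index()
print(by_index)
```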

