I have a df with this structure:
1_product_id 2_product_id 1_qty_sold 2_qty_sold times_sold
5584 4384 159.00 653.00 153
7889 2970 104.00 497.00 102
5024 2970 89.00 413.00 87
2990 8310 71.00 283.00 71
2990 4384 71.00 282.00 68
2990 2970 62.00 240.00 58
5584 8310 56.00 208.00 54
I'm trying to make it look like this:
1_product_id 2_product_id 1_qty_sold 2_qty_sold times_sold
5584 4384 159.00 653.00 153
8310 56.00 208.00 54
7889 2970 104.00 497.00 102
5024 2970 89.00 413.00 87
2990 8310 71.00 283.00 71
4384 71.00 282.00 68
2970 62.00 240.00 58
It is sorted by times_sold and grouped by 1_product_id and 2_product_id. I tried:
df_out.groupby(['1_product_id','2_product_id']).sum() \
.sort_values('times_sold', ascending = False)
But this messes up the 1_product_id index:
1_qty_sold 2_qty_sold times_sold
1_product_id 2_product_id
5584 4384 159.00 653.00 153
7889 2970 104.00 497.00 102
5024 2970 89.00 413.00 87
2990 8310 71.00 283.00 71
4384 71.00 282.00 68
2970 62.00 240.00 58
5584 8310 56.00 208.00 54
How can I sort by times_sold and group by 1_product_id & 2_product_id without losing the desired structure? I checked this answer, but it didn't help me.
Edit
I tried:
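A minimal reproduction of the first attempt, assuming `df_out` contains exactly the seven sample rows shown above. It suggests the index itself is intact: with pandas' default sparsified MultiIndex display, a first-level label is printed only when it differs from the row above, and sorting by `times_sold` alone moves the `5584`/`8310` row to the end, away from `5584`/`4384`.

```python
import pandas as pd

# Sample data copied from the question (assumed to be all of df_out).
df_out = pd.DataFrame({
    "1_product_id": [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    "2_product_id": [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    "1_qty_sold": [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    "2_qty_sold": [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    "times_sold": [153, 102, 87, 71, 68, 58, 54],
})

# The first attempt: sum per (1_product_id, 2_product_id) pair, then sort by times_sold only.
attempt = (df_out.groupby(["1_product_id", "2_product_id"]).sum()
           .sort_values("times_sold", ascending=False))

# Every row still carries its full key; only the printed layout hides repeated labels.
first_level = attempt.index.get_level_values("1_product_id").tolist()
print(first_level)
# [5584, 7889, 5024, 2990, 2990, 2990, 5584]
```

The two `5584` rows are simply not adjacent, so the display cannot merge them into one block.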
df_out.groupby(['1_product_id','2_product_id']).sum() \
.sort_values('times_sold', ascending=False) \
.sort_index(axis=0, ascending=False)
But the index is not in the order I want:
1_qty_sold 2_qty_sold times_sold
1_product_id 2_product_id
7889 2970 104.00 497.00 102 # this should be the 2nd index
5584 4384 159.00 653.00 153 # this should be the 1st index
8310 56.00 208.00 54
5024 2970 89.00 413.00 87 # this should be the 3rd index
2990 4384 71.00 282.00 68 # this should be the 4th index
2970 62.00 240.00 58
8310 71.00 283.00 71
After using @jezrael's solution, I get:
times_sold 1_qty_sold 2_qty_sold
1_product_id 2_product_id
5584 4384 153 159.00 653.00
7889 2970 102 104.00 497.00
5024 2970 87 89.00 413.00
2990 8310 71 71.00 283.00
4384 68 71.00 282.00
2970 58 62.00 240.00
5584 8310 54 56.00 208.00
Whereas I'm trying to make it look like this:
# note that the last row is now the second row, and each `1_product_id` appears only once in the index.
times_sold 1_qty_sold 2_qty_sold
1_product_id 2_product_id
5584 4384 153 159.00 653.00
8310 54 56.00 208.00
7889 2970 102 104.00 497.00
5024 2970 87 89.00 413.00
2990 8310 71 71.00 283.00
4384 68 71.00 282.00
2970 58 62.00 240.00
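Reply from a uj5u.com user:
To prevent groupby from sorting the keys, add sort=False, then use a dictionary that maps each 1_product_id to its original (first-appearance) order: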
print (df_out)
1_product_id 2_product_id 1_qty_sold 2_qty_sold times_sold
0 5584 4384 159.0 653.0 10 <- changed to 10 to test the sort
1 7889 2970 104.0 497.0 102
2 5024 2970 89.0 413.0 87
3 2990 8310 71.0 283.0 71
4 2990 4384 71.0 282.0 68
5 2990 2970 62.0 240.0 58
6 5584 8310 56.0 208.0 54
uniq = {v: k for k, v in enumerate(df_out['1_product_id'].unique())}
print(uniq)
{5584: 0, 7889: 1, 5024: 2, 2990: 3}
df = (df_out.groupby(['1_product_id','2_product_id'], sort=False)
.sum()
.sort_values(['1_product_id', 'times_sold'],
key=lambda x: x.map(uniq).fillna(x),
ascending=[True, False])
)
print (df)
1_qty_sold 2_qty_sold times_sold
1_product_id 2_product_id
5584 8310 56.0 208.0 54
4384 159.0 653.0 10
7889 2970 104.0 497.0 102
5024 2970 89.0 413.0 87
2990 8310 71.0 283.0 71
4384 71.0 282.0 68
2970 62.0 240.0 58
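A sketch of the same first-appearance ordering without the `key` function, assuming `df_out` is the unmodified sample (first row `times_sold` = 153). `_rank` is a hypothetical helper column introduced here. Keeping the rank in its own column means only `1_product_id` is looked up in `uniq`; the `key=lambda x: x.map(uniq).fillna(x)` version also passes `times_sold` through `uniq`, so a `times_sold` value that happened to equal a product ID would be replaced by a rank.

```python
import pandas as pd

# Unmodified sample data from the question.
df_out = pd.DataFrame({
    "1_product_id": [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    "2_product_id": [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    "1_qty_sold": [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    "2_qty_sold": [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    "times_sold": [153, 102, 87, 71, 68, 58, 54],
})

# Rank each 1_product_id by where it first appears in the input.
uniq = {v: k for k, v in enumerate(df_out["1_product_id"].unique())}

# sort=False keeps the groups in first-appearance order instead of sorting the keys.
summed = df_out.groupby(["1_product_id", "2_product_id"], sort=False).sum()

# Hypothetical helper column: first-appearance rank of each row's 1_product_id.
summed["_rank"] = summed.index.get_level_values("1_product_id").map(uniq)

# Blocks in rank order; inside a block, highest times_sold first.
df = (summed.sort_values(["_rank", "times_sold"], ascending=[True, False])
            .drop(columns="_rank"))
print(df)
```

On the sample this yields the layout asked for in the question: 5584/4384, then 5584/8310, followed by 7889, 5024 and the three 2990 rows.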
An alternative with a second groupby:
df = (df_out.groupby(['1_product_id','2_product_id'], sort=False)
.sum()
.groupby('1_product_id', sort=False, group_keys=False)
.apply(lambda x: x.sort_values('times_sold', ascending=False))
)
print (df)
1_qty_sold 2_qty_sold times_sold
1_product_id 2_product_id
5584 8310 56.0 208.0 54
4384 159.0 653.0 10
7889 2970 104.0 497.0 102
5024 2970 89.0 413.0 87
2990 8310 71.0 283.0 71
4384 71.0 282.0 68
2970 62.0 240.0 58
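Both answers above order the `1_product_id` blocks by where they first appear in the input. If the intended rule is instead "order the blocks by their highest `times_sold`" (an assumption; on the sample data both rules give the same order), a sort that does not depend on the input row order can be sketched with `groupby(...).transform('max')`. `_block_max` is a hypothetical helper column, and ties between blocks with the same maximum are broken by `1_product_id` so each block stays contiguous.

```python
import pandas as pd

# Unmodified sample data from the question.
df_out = pd.DataFrame({
    "1_product_id": [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    "2_product_id": [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    "1_qty_sold": [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    "2_qty_sold": [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    "times_sold": [153, 102, 87, 71, 68, 58, 54],
})

summed = df_out.groupby(["1_product_id", "2_product_id"]).sum()

# Highest times_sold within each 1_product_id block, broadcast back to every row of that block.
summed["_block_max"] = summed.groupby(level="1_product_id")["times_sold"].transform("max")

# sort_values accepts index level names in `by`, so columns and levels can be mixed.
df = (summed.sort_values(["_block_max", "1_product_id", "times_sold"],
                         ascending=[False, True, False])
            .drop(columns="_block_max"))
print(df)
```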
Reply from a uj5u.com user:
After you finish processing, you can add:
df_out.sort_index(level=0, sort_remaining=False)
This will fix the ordering of the first index level. The rows within each first-level value will already be sorted by times_sold.
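A quick check of sorting only the first index level, assuming the "processing" is the groupby-sum plus `sort_values('times_sold', ascending=False)` from the question. With `level=0` and `sort_remaining=False`, only the first level is sorted and rows sharing a `1_product_id` keep their `times_sold` order; a plain `sort_index()` also reorders `2_product_id` by value. On the sample data the first level comes out in ascending ID order (2990 first), which differs from the order requested in the question.

```python
import pandas as pd

# Unmodified sample data from the question.
df_out = pd.DataFrame({
    "1_product_id": [5584, 7889, 5024, 2990, 2990, 2990, 5584],
    "2_product_id": [4384, 2970, 2970, 8310, 4384, 2970, 8310],
    "1_qty_sold": [159.0, 104.0, 89.0, 71.0, 71.0, 62.0, 56.0],
    "2_qty_sold": [653.0, 497.0, 413.0, 283.0, 282.0, 240.0, 208.0],
    "times_sold": [153, 102, 87, 71, 68, 58, 54],
})

processed = (df_out.groupby(["1_product_id", "2_product_id"]).sum()
             .sort_values("times_sold", ascending=False))

# Sort by the first index level only; rows sharing a 1_product_id keep their times_sold order.
first_only = processed.sort_index(level=0, sort_remaining=False)

# For comparison: sorting every level also reorders 2_product_id by value.
all_levels = processed.sort_index()

print(first_only)
print(all_levels)
```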
Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/366370.html
