Counting the number of purchases made by customers

Say I have a DataFrame like this:

           day  uid  orders     
0   2022-03-15  1      20
1   2022-03-15  2      10
2   2022-03-15  3      50 
3   2022-03-15  4       1 
4   2022-03-16  1      20  
5   2022-03-16  2      10
6   2022-03-16  3      50 
7   2022-03-16  4       1  
8   2022-03-17  1      20 
9   2022-03-17  2      10 
10  2022-03-17  3      50 
11  2022-03-17  4       1  
12  2022-03-18  1      20 
13  2022-03-18  2      10
14  2022-03-18  3      50 
15  2022-03-18  4       1
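To make the example reproducible, the frame above can be rebuilt roughly as follows. This is only a sketch: the names `days` and `per_user_orders` are mine, and I am assuming `day` is a datetime column rather than a string.

```python
import pandas as pd

# Rebuild the sample frame: 4 consecutive days x 4 users,
# where each user has the same order count on every day.
days = pd.date_range("2022-03-15", periods=4, freq="D")
per_user_orders = {1: 20, 2: 10, 3: 50, 4: 1}  # uid -> orders (taken from the table)

df = pd.DataFrame(
    [(day, uid, orders) for day in days for uid, orders in per_user_orders.items()],
    columns=["day", "uid", "orders"],
)
print(df)
```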

How can I get a DataFrame that shows, across the whole DataFrame, how many users placed each number of orders? Something like:

   orders  users_ordered % of total
0       1              4        25%
1      20              4        25%
2      50              4        25%

This means that, across all days, 4 users had 1 order, 4 users had 20 orders, and 4 users had 50 orders.

I think the percentage of the total could be computed with

 df['% of total'] = 100 * df['orders'] / df.groupby('customer__id')['orders'].transform('sum')

once I know how to get the target DataFrame.
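A note on that expression: `transform('sum')` broadcasts each group's total back onto the group's own rows, so the ratio is each row's share of that user's orders. That does not look like the same quantity as the `% of total` column in the target. The sketch below uses made-up numbers and assumes the `uid` column is what `customer__id` refers to, since `customer__id` does not appear in the sample frame.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "uid":    [1, 2, 1, 2],
    "orders": [20, 10, 20, 30],
})

# transform('sum') returns a Series aligned with df: every row gets
# the total of its own uid group (40 for uid 1, 40 for uid 2).
user_total = df.groupby("uid")["orders"].transform("sum")

# Each row's share of its user's total, not the share of users per order value.
df["% of user total"] = 100 * df["orders"] / user_total
print(df)
```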

Not accounted for: repeated order values

    orders  users_ordered % of total
0        1             10         3%
1        2             10         3%
2       27             10         3%
3       26             10         3%
4       25             10         3%
5       24             10         3%
6       23             10         3%
7       22             10         3%
8       21             10         3%
9       20             10         3%
10      19             10         3%
11      18             10         3%
12      17             10         3%
13      16             10         3%
14      15             10         3%
15      14             10         3%
16      13             10         3%
17      12             10         3%
18      11             10         3%
19      10             10         3%
20       9             10         3%
21       8             10         3%
22       7             10         3%
23       6             10         3%
24       5             10         3%
25       4             10         3%
26       3             10         3%
27      28             10         3%

A uj5u.com user replied:

Use Series.value_counts:

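import pandas as pd

# Count how many rows carry each distinct 'orders' value.
s = df['orders'].value_counts().rename('users_ordered')

# Share of each count in the total, truncated to a whole percent and formatted as text.
pct = s.div(s.sum()).mul(100).astype(int).astype(str).add('%').rename('% of total')

new_df = (
    pd.concat((s, pct), axis=1)
      .rename_axis(index='orders')
      .reset_index()
)
print(new_df)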


   orders  users_ordered % of total
0      20              4        25%
1      10              4        25%
2      50              4        25%
3       1              4        25%

Or:

# Same result, letting value_counts(normalize=True) compute the shares directly.
new_df = (
    pd.concat((df['orders'].value_counts().rename('users_ordered'),
               df['orders'].value_counts(normalize=True).mul(100)
                           .astype(int)
                           .astype(str).add('%').rename('% of total')), axis=1)
      .rename_axis(index='orders')
      .reset_index()
)
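One thing to be aware of in both variants: `.astype(int)` truncates rather than rounds. With 28 order values that each occur 10 times, every share is 100/28, about 3.57, which truncates to 3%. That appears consistent with the 3% column in the 28-row table in the question. A minimal sketch with hypothetical data, showing `round(1)` as one possible alternative:

```python
import pandas as pd

# Hypothetical data: 28 distinct order values, each appearing 10 times.
orders = pd.Series([v for v in range(1, 29) for _ in range(10)], name="orders")

# Every value has the same share: 100 / 28 = 3.571...
share = orders.value_counts(normalize=True).mul(100)

truncated = share.astype(int).astype(str).add("%")  # drops the fractional part
rounded = share.round(1).astype(str).add("%")       # keeps one decimal place

print(truncated.iloc[0], rounded.iloc[0])
```

Whether one decimal place is the right precision depends on how the percentages will be read; it is only an illustration.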

A uj5u.com user replied:

This works:

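# Count the rows for each distinct 'orders' value.
new_df = df.groupby('orders')['uid'].count().reset_index(name='users_ordered')
# Share of the total, truncated to a whole percent and formatted as text.
new_df['% of total'] = new_df['users_ordered'].div(new_df['users_ordered'].sum()).mul(100).astype(int).astype(str).add('%')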

Output:

>>> new_df
   orders  users_ordered % of total
0       1              4        25%
1      10              4        25%
2      20              4        25%
3      50              4        25%
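Note that `count()` counts rows, so a user who appears on several days with the same order value is counted once per day. If the intent is the number of distinct users per order value (an assumption about what the question wants, not something it states), `nunique()` may be closer. A small sketch with made-up data:

```python
import pandas as pd

# Made-up data: 3 users observed on 2 days; uid 1 and 2 both have 20 orders.
days = ["2022-03-15", "2022-03-16"]
df = pd.DataFrame({
    "day":    [d for d in days for _ in range(3)],
    "uid":    [1, 2, 3] * 2,
    "orders": [20, 20, 50] * 2,
})

# count() counts rows; nunique() counts distinct uid values per order value.
rows_per_value = df.groupby("orders")["uid"].count()
users_per_value = df.groupby("orders")["uid"].nunique()

print(rows_per_value.to_dict())
print(users_per_value.to_dict())
```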
