可以說我有一個這樣的資料框:
day uid orders
0 2022-03-15 1 20
1 2022-03-15 2 10
2 2022-03-15 3 50
3 2022-03-15 4 1
4 2022-03-16 1 20
5 2022-03-16 2 10
6 2022-03-16 3 50
7 2022-03-16 4 1
8 2022-03-17 1 20
9 2022-03-17 2 10
10 2022-03-17 3 50
11 2022-03-17 4 1
12 2022-03-18 1 20
13 2022-03-18 2 10
14 2022-03-18 3 50
15 2022-03-18 4 1
如何獲得一個資料框來查找整個資料框中每個用戶 ID 完成的購買。就像是
orders users_ordered % of total
0 1 4 25%
1 20 4 25%
2 50 4 25%
這意味著,一整天,4 個用戶有 1 個訂單,4 個用戶有 20 個訂單,4 個用戶有 50 個訂單。
我認為占總數的百分比可以通過
df['% of total'] = 100 * df['orders'] / df.groupby('customer__id')['orders'].transform('sum')
如果我能得到如何獲得我的目標資料框。
未考慮:重復訂單值
orders users_ordered % of total
0 1 10 3%
1 2 10 3%
2 27 10 3%
3 26 10 3%
4 25 10 3%
5 24 10 3%
6 23 10 3%
7 22 10 3%
8 21 10 3%
9 20 10 3%
10 19 10 3%
11 18 10 3%
12 17 10 3%
13 16 10 3%
14 15 10 3%
15 14 10 3%
16 13 10 3%
17 12 10 3%
18 11 10 3%
19 10 10 3%
20 9 10 3%
21 8 10 3%
22 7 10 3%
23 6 10 3%
24 5 10 3%
25 4 10 3%
26 3 10 3%
27 28 10 3%
uj5u.com熱心網友回復:
使用Series.value_counts:
s = df['orders'].value_counts().rename('users_ordered')
new_df = \
pd.concat((s,
s.div(s.sum()).mul(100).astype(int)
.astype(str).add('%').rename('% of total')), axis=1)\
.rename_axis(index='orders')\
.reset_index()
print(new_df)
orders users_ordered % of total
0 20 4 25%
1 10 4 25%
2 50 4 25%
3 1 4 25%
要么:
new_df = \
pd.concat((df['orders'].value_counts().rename('users_ordered'),
df['orders'].value_counts(normalize=True).mul(100)
.astype(int)
.astype(str).add('%').rename('% of total')), axis=1)\
.rename_axis(index='orders')\
.reset_index()
uj5u.com熱心網友回復:
這有效:
new_df = df.groupby('orders')['uid'].count().reset_index(name='users_ordered')
new_df['% of total'] = new_df['users_ordered'].div(new_df['users_ordered'].sum()).mul(100).astype(int).astype(str).add('%')
輸出:
>>> new_df
orders users_ordered % of total
0 1 4 25%
1 10 4 25%
2 20 4 25%
3 50 4 25%
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/444541.html
標籤:熊猫
下一篇:替換熊貓中的反引號
