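I am trying to compute the average of the other values while excluding the focal company. I know this is a bit complicated, so let me explain.
For example, suppose the following code builds my data: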
import pandas as pd

d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"],
     'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
     'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
     'group': [10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20, 20]}
df = pd.DataFrame(d)
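Taking df.group into account, I want the 2015 average of B and C and to store it in a new column on the A / 2016 row. In other words, for every row we need the previous year's average of col3 within the same df.group, excluding the focal item itself.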
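To make the rule concrete, here is a minimal sketch of the arithmetic for a single cell (the A / 2016 row), written in plain Python. The helper name `leave_one_out_mean` and the dictionary `values_2015` are illustrative only and are not part of the original data.

```python
# col3 values for 2015 in group 10, keyed by company (copied from the data above)
values_2015 = {"A": 10, "B": 10, "c": 8}


def leave_one_out_mean(values, focal):
    """Mean of all values except the focal company's own value."""
    others = [v for name, v in values.items() if name != focal]
    return sum(others) / len(others)


# Value wanted on the A / 2016 row: the mean of B and c in 2015
a_2016 = leave_one_out_mean(values_2015, "A")
print(a_2016)  # 9.0
```

The same helper gives 10.0 for the focal company c, which is the value expected on the c / 2016 row.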
The result should look like this:
d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"],
     'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
     'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
     'group': [10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20, 20],
     'operation': ['0', '(B2015 + C2015)/2', '(B2016 + C2016)/2', '0', '(A2015 + C2015)/2', '(A2016 + C2016)/2', '0', '(A2015 + B2015)/2', '(A2016 + B2016)/2', '0', 'E2015', 'E2016', '0', 'D2015', 'D2016'],
     # 0 marks the first year of each company, which has no previous year
     'mean': [0, 9, 10.5, 0, 9, 14.5, 0, 10, 16, 0, 40, 50, 0, 50, 60]}
output = pd.DataFrame(d)
>>> output
    col1  col2  col3  group          operation  mean
 0     A  2015    10     10                  0   0.0
 1     A  2016    20     10  (B2015 + C2015)/2   9.0
 2     A  2017    25     10  (B2016 + C2016)/2  10.5
 3     B  2015    10     10                  0   0.0
 4     B  2016    12     10  (A2015 + C2015)/2   9.0
 5     B  2017    14     10  (A2016 + C2016)/2  14.5
 6     c  2015     8     10                  0   0.0
 7     c  2016     9     10  (A2015 + B2015)/2  10.0
 8     c  2017    10     10  (A2016 + B2016)/2  16.0
 9     d  2015    50     20                  0   0.0
10     d  2016    60     20              E2015  40.0
11     d  2017    70     20              E2016  50.0
12     e  2015    40     20                  0   0.0
13     e  2016    50     20              D2015  50.0
14     e  2017    60     20              D2016  60.0
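Answer from a uj5u.com user:
- Compute the mean of all the other values within each group using a double groupby:
  - sum all the values within the group (per year)
  - subtract the current (focal) value
  - divide by the number of items (distinct col1 values) in the group minus one
- Assign the shift-ed means to a new column: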
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)
>>> df
    col1  col2  col3  group  mean
 0     A  2015    10     10   0.0
 1     A  2016    20     10   9.0
 2     A  2017    25     10  10.5
 3     B  2015    10     10   0.0
 4     B  2016    12     10   9.0
 5     B  2017    14     10  14.5
 6     c  2015     8     10   0.0
 7     c  2016     9     10  10.0
 8     c  2017    10     10  16.0
 9     d  2015    50     20   0.0
10     d  2016    60     20  40.0
11     d  2017    70     20  50.0
12     e  2015    40     20   0.0
13     e  2016    50     20  50.0
14     e  2017    60     20  60.0
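The steps above can also be sketched without `apply`, which may be easier to read. This is a variation on the answer, not the same code. It makes two assumptions: it divides by the number of rows in each (group, year) cell minus one rather than by the number of distinct companies in the whole group, and it shifts within each company rather than comparing neighbouring rows. On this data both approaches give the same numbers. The variable names `by_year` and `loo` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"],
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

# Assumption: each company has one row per year; sort so that shifting by one
# row within a company means "previous year".
df = df.sort_values(["col1", "col2"])

# Leave-one-out mean per (group, year):
# (sum of the cell - own value) / (number of rows in the cell - 1)
by_year = df.groupby(["group", "col2"])["col3"]
loo = (by_year.transform("sum") - df["col3"]) / (by_year.transform("count") - 1)

# Move each company's value to its following year; the first year of every
# company has no predecessor and is filled with 0, as in the expected output.
df["mean"] = loo.groupby(df["col1"]).shift().fillna(0)

print(df)
```

Note that `fillna(0)` also turns the result into 0 for a company that is alone in its group and year, since there are no other values to average in that case.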
