我有一個帶有名稱列的熊貓資料框,對于每個名稱,我想計算第一個值和其他每個值之間的差異。
資料框
Name First Run Second Run Third Run Time
0 Jakub 315 279 276 9:00
1 Jakub 322 260 270 9:00
2 Jakub 248 260 213 15:00
3 Jakub 246 288 294 15:00
4 Jakub 276 274 263 15:00
5 Bene 222 268 246 9:00
6 Bene 236 302 263 9:00
7 Bene 208 245 224 15:00
8 Bene 283 245 245 15:00
9 Bene 278 257 249 15:00
first_values = df.groupby(['Name']).first()[['First Run','Second Run','Third Run']]
First Run Second Run Third Run Time
Name
Jakub 315 279 276 9:00
Bene 222 268 246 9:00
輸出_df
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -9 3 15:00
我可以找出正確的函式來獲得所需的輸出。我嘗試了下面的代碼,但我得到了一個 TypeError。
df.Name.map(first_values.index).sub(first_values)
uj5u.com熱心網友回復:
您可以使用groupby apply并減去每組的第一行:
df.filter(like=' Run').groupby(df['Name']).apply(lambda g: g-g.iloc[0])
或從列的手動串列中:
cols = ['First Run','Second Run','Third Run']
df[cols].groupby(df['Name']).apply(lambda g: g-g.iloc[0])
擁有新資料框的完整代碼:
df_output = df.copy()
df_output.update(df.filter(like=' Run')
.groupby(df['Name'])
.apply(lambda g: g-g.iloc[0])
)
輸出:
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00
uj5u.com熱心網友回復:
你可以diff檢查cumsum
df.update(df.groupby('Name')[['First Run','Second Run','Third Run']].apply(lambda x : x.diff().cumsum()).fillna(0))
Out[30]:
FirstRun SecondRun ThirdRun
0 0.0 0.0 0.0
1 7.0 -19.0 -6.0
2 -67.0 -19.0 -63.0
3 -69.0 9.0 18.0
4 -39.0 -5.0 -13.0
5 0.0 0.0 0.0
6 14.0 34.0 17.0
7 -14.0 -23.0 -22.0
8 61.0 -23.0 -1.0
9 56.0 -11.0 3.0
uj5u.com熱心網友回復:
用于GroupBy.transform重復第一個值,因此減去效果很好:
cols = ['First Run','Second Run','Third Run']
df[cols] = df[cols].sub(df.groupby('Name')[cols].transform('first'))
print (df)
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/443626.html
標籤:Python 熊猫 matplotlib
上一篇:基于每月列的大值的子集資料框
