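The "Age" column in the following dataframe is corrupted: for a given User_ID, the age is the same on every "Date". For each row, I want to subtract from the original age the difference in years between that user's most recent date and the row's date.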
import pandas as pd
df = pd.DataFrame({
    "User_ID": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2", "N1", "N1", "N1", "N2"],
    "Date": ["31/10/2021", "31/10/2020", "31/10/2019", "24/10/2019", "22/10/2018", "15/10/2017", "14/10/2017", "13/10/2016", "12/10/2016", "11/10/2015", "2/10/2015", "1/10/2015"],
    "Age": [6, 5, 8, 6, 6, 5, 8, 5, 6, 6, 6, 5]
})
So for the dataframe
   User_ID       Date  Age
0       N1 2021-10-31    6
1       N2 2020-10-31    5
2       N3 2019-10-31    8
3       N1 2019-10-24    6
4       N1 2018-10-22    6
5       N2 2017-10-15    5
6       N3 2017-10-14    8
7       N2 2016-10-13    5
8       N1 2016-10-12    6
9       N1 2015-10-11    6
10      N1 2015-10-02    6
11      N2 2015-10-01    5
the result should look like this
   User_ID       Date  Age
0       N1 2021-10-31    6
1       N2 2020-10-31    5
2       N3 2019-10-31    8
3       N1 2019-10-24    4
4       N1 2018-10-22    3
5       N2 2017-10-15    2
6       N3 2017-10-14    6
7       N2 2016-10-13    1
8       N1 2016-10-12    1
9       N1 2015-10-11    0
10      N1 2015-10-02    0
11      N2 2015-10-01    0
Is there a fast way to do this?
Answer from a uj5u.com user:
You can create a Series of years, get each row's difference from the first year in its group with GroupBy.transform('first'), and subtract that difference from the Age column:
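# Date holds day/month/year strings, so parse it before using the .dt accessor
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# year of each row
y = df['Date'].dt.year
# gap between each user's first (most recent) year and the row's year, subtracted from Age
df['Age'] = df['Age'].sub(y.groupby(df['User_ID']).transform('first').sub(y))
print(df)
   User_ID       Date  Age
0       N1 2021-10-31    6
1       N2 2020-10-31    5
2       N3 2019-10-31    8
3       N1 2019-10-24    4
4       N1 2018-10-22    3
5       N2 2017-10-15    2
6       N3 2017-10-14    6
7       N2 2016-10-13    1
8       N1 2016-10-12    1
9       N1 2015-10-11    0
10      N1 2015-10-02    0
11      N2 2015-10-01    0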
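Note that transform('first') relies on each user's first row being the most recent date, which holds for the sample data. If the rows may arrive in a different order, a minimal sketch along the same lines could take the maximum year per user instead. The helper name adjust_age is hypothetical, and the explicit format "%d/%m/%Y" is an assumption based on the day/month/year strings in the question.

```python
import pandas as pd


def adjust_age(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: subtract, per row, the year gap to the user's latest date."""
    out = df.copy()
    # Assumption: dates are day/month/year strings, as in the question.
    out["Date"] = pd.to_datetime(out["Date"], format="%d/%m/%Y")
    year = out["Date"].dt.year
    # Latest year per user, broadcast back to every row of that user.
    latest_year = year.groupby(out["User_ID"]).transform("max")
    out["Age"] = out["Age"] - (latest_year - year)
    return out


df = pd.DataFrame({
    "User_ID": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2", "N1", "N1", "N1", "N2"],
    "Date": ["31/10/2021", "31/10/2020", "31/10/2019", "24/10/2019", "22/10/2018",
             "15/10/2017", "14/10/2017", "13/10/2016", "12/10/2016", "11/10/2015",
             "2/10/2015", "1/10/2015"],
    "Age": [6, 5, 8, 6, 6, 5, 8, 5, 6, 6, 6, 5],
})

# Reverse the rows to show the result does not depend on row order,
# then restore the original order for display.
result = adjust_age(df.iloc[::-1]).sort_index()
print(result["Age"].tolist())
```

Because the sample data is already sorted from newest to oldest within each user, this gives the same ages as the transform('first') version; the two differ only when that ordering is not guaranteed.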
