This is a rather odd dataset, and I don't know how to preprocess it. An example:
Year, ID, feature1, feature2, target1
2008, 1, 10, 20, 5
2008, 1, 12, 25, 6
2008, 1, NaN, NaN, 4
2008, 1, NaN, NaN, 7
2008, 1, NaN, NaN, 3
2008, 1, NaN, NaN, 5
2008, 2, 22, 16, 7
2008, 2, 24, 14, 3
2008, 2, NaN, NaN, 5
2008, 2, NaN, NaN, 6
2008, 2, NaN, NaN, 9
2008, 3, 12, 15, 6
2008, 3, NaN, NaN, 1
....
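The problem: within each (Year, ID) group, I want to replace the first two entries of feature1 and feature2 with their mean, and also fill the NaN values in those columns with that same mean. If a group has only one entry with values, as with ID == 3, I will fill with that single value.
Example output: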
Year, ID, feature1, feature2, target1
2008, 1, 11, 22.5, 5
2008, 1, 11, 22.5, 6
2008, 1, 11, 22.5, 4
2008, 1, 11, 22.5, 7
2008, 1, 11, 22.5, 3
2008, 1, 11, 22.5, 5
2008, 2, 23, 15, 7
2008, 2, 23, 15, 3
2008, 2, 23, 15, 5
2008, 2, 23, 15, 6
2008, 2, 23, 15, 9
2008, 3, 12, 15, 6
2008, 3, 12, 15, 1
....
Is there a way to do this?
A uj5u.com user replied:
Try groupby with transform('mean'):
g = df.groupby(['Year','ID'])
df['feature1'] = g['feature1'].transform('mean')
df['feature2'] = g['feature2'].transform('mean')
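This works because the grouped mean skips NaN by default, so each group's mean is computed from the observed rows only and is then broadcast back to every row of the group. Below is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt by hand from the sample rows in the question, and the name filled is only illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample rows copied from the question (the "...." rows are not reproduced).
df = pd.DataFrame({
    "Year": [2008] * 13,
    "ID": [1] * 6 + [2] * 5 + [3] * 2,
    "feature1": [10, 12, nan, nan, nan, nan, 22, 24, nan, nan, nan, 12, nan],
    "feature2": [20, 25, nan, nan, nan, nan, 16, 14, nan, nan, nan, 15, nan],
    "target1": [5, 6, 4, 7, 3, 5, 7, 3, 5, 6, 9, 6, 1],
})

g = df.groupby(["Year", "ID"])

# transform('mean') returns one value per original row, aligned on df's index.
# NaN rows are ignored when the group mean is computed.
filled = df.assign(
    feature1=g["feature1"].transform("mean"),
    feature2=g["feature2"].transform("mean"),
)

print(filled)
```

Using assign leaves df untouched and returns a new frame, which makes it easy to compare the result with the original.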
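A uj5u.com user replied:
Use groupby with transform to update the values of the feature1 and feature2 columns. Select the columns with a list (double brackets), since recent pandas versions reject the tuple-style ['feature1', 'feature2'] selection on a groupby:
df.update(df.groupby(['Year', 'ID'])[['feature1', 'feature2']].transform('mean'))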
print(df)
# Output:
Year ID feature1 feature2 target1
0 2008 1 11.0 22.5 5
1 2008 1 11.0 22.5 6
2 2008 1 11.0 22.5 4
3 2008 1 11.0 22.5 7
4 2008 1 11.0 22.5 3
5 2008 1 11.0 22.5 5
6 2008 2 23.0 15.0 7
7 2008 2 23.0 15.0 3
8 2008 2 23.0 15.0 5
9 2008 2 23.0 15.0 6
10 2008 2 23.0 15.0 9
11 2008 3 12.0 15.0 6
12 2008 3 12.0 15.0 1
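Both answers average every observed value in a group, which matches the question's sample because each group has at most two non-missing rows. If a group could hold more than two observed values and only the first two should count, one possible variant is sketched below. This is an assumption about the intent, not something either answer proposes; the data and the helper mean_of_first_two are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 1 has three observed values, but only the first two should count.
df = pd.DataFrame({
    "Year": [2008, 2008, 2008, 2008, 2008, 2008],
    "ID": [1, 1, 1, 1, 3, 3],
    "feature1": [10, 12, 50, np.nan, 12, np.nan],
    "target1": [5, 6, 4, 7, 6, 1],
})


def mean_of_first_two(s):
    """Mean of the first two non-missing values in a group (or of the only one)."""
    return s.dropna().head(2).mean()


# The scalar returned for each group is broadcast to all rows of that group.
result = df.assign(
    feature1=df.groupby(["Year", "ID"])["feature1"].transform(mean_of_first_two)
)

print(result)
```

A custom function passed to transform is slower than the built-in 'mean', so this form is only worth using when the first-two restriction actually matters.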
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/353674.html
