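I have a problem with the fillna() method. Below is my sample df, which represents the quantity of products in shops. I want to fill every NaN. If a value is NaN, I want to fill it with the previous day's value, or, if that is also NaN, with the next day's value (same product, same shop). If all days for a given product and shop are NaN, I want to fill them with 0. I am looking for the most idiomatic pandas way to do this; I have some ideas involving loops, but they don't look good.
My df: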
    day    shop  product  quantity
0     1  shop_A   apples       3.0
1     2  shop_A   apples       NaN
2     3  shop_A   apples       1.0
3     1  shop_A  bananas       NaN
4     2  shop_A  bananas       NaN
5     3  shop_A  bananas       NaN
6     1  shop_B   apples       NaN
7     2  shop_B   apples       NaN
8     3  shop_B   apples       2.0
9     1  shop_B  bananas       NaN
10    2  shop_B  bananas       4.0
11    3  shop_B  bananas       2.0
Expected df:
    day    shop  product  quantity
0     1  shop_A   apples       3.0
1     2  shop_A   apples       3.0
2     3  shop_A   apples       1.0
3     1  shop_A  bananas       0.0
4     2  shop_A  bananas       0.0
5     3  shop_A  bananas       0.0
6     1  shop_B   apples       2.0
7     2  shop_B   apples       2.0
8     3  shop_B   apples       2.0
9     1  shop_B  bananas       4.0
10    2  shop_B  bananas       4.0
11    3  shop_B  bananas       2.0
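I also tried fillna(limit=3), but that is not what I am looking for.
Answer from a uj5u.com user:
You can sort by day with sort_values, then do a grouped bfill, and fill whatever is still NaN with 0 by chaining fillna(0):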
# Back-fill within each (shop, product) group in day order; all-NaN groups become 0.
df['quantity'] = (df.sort_values(by='day')
                    .groupby(['shop', 'product'])['quantity']
                    .bfill(limit=3).fillna(0))
This prints:
    day    shop  product  quantity
0     1  shop_A   apples       3.0
1     2  shop_A   apples       1.0
2     3  shop_A   apples       1.0
3     1  shop_A  bananas       0.0
4     2  shop_A  bananas       0.0
5     3  shop_A  bananas       0.0
6     1  shop_B   apples       2.0
7     2  shop_B   apples       2.0
8     3  shop_B   apples       2.0
9     1  shop_B  bananas       4.0
10    2  shop_B  bananas       4.0
11    3  shop_B  bananas       2.0
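This fills each NaN with the next day's value for the same shop and product, rather than the previous day's. Because only a back-fill is applied, one row differs from the expected df: shop_A apples on day 2 becomes 1.0 instead of 3.0. You can similarly use ffill (or both), or linear interpolation, and the result will change accordingly. Still, this should be enough to get you started.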
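As a minimal sketch of the "both" option, the following fills forward first and then backward within each (shop, product) group, which should reproduce the expected df from the question. The DataFrame construction is rebuilt from the sample data, and using transform with a lambda is an illustrative choice here, not the answerer's own code.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the original data).
df = pd.DataFrame({
    "day": [1, 2, 3] * 4,
    "shop": ["shop_A"] * 6 + ["shop_B"] * 6,
    "product": (["apples"] * 3 + ["bananas"] * 3) * 2,
    "quantity": [3, np.nan, 1,
                 np.nan, np.nan, np.nan,
                 np.nan, np.nan, 2,
                 np.nan, 4, 2],
})

# Within each (shop, product) group, in day order:
#   1. ffill: take the previous day's value where one exists
#   2. bfill: for leading gaps, fall back to the next day's value
# Groups that are entirely NaN stay NaN and are set to 0 at the end.
filled = (
    df.sort_values("day")
      .groupby(["shop", "product"])["quantity"]
      .transform(lambda s: s.ffill().bfill())
      .fillna(0)
)

# Assignment aligns on the index, so the original row order is preserved.
df["quantity"] = filled
print(df)
```

The order of the two fills matters: ffill followed by bfill prefers the previous day, whereas bfill followed by ffill would prefer the next day.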
