I have a Pandas DataFrame with 3 columns:
import pandas as pd

df = pd.DataFrame({'product__sku': [1, 1, 1, 1, 2, 2],
                   'date': ['2021-10-01 20:48:12+00:00', '2021-10-31 20:48:26+00:00',
                            '2021-09-01 20:48:12+00:00', '2021-09-30 20:48:26+00:00',
                            '2021-10-01 12:23:17+00:00', '2021-10-31 12:23:17+00:00'],
                   'qty': [100, 84, 5, 10, 15, 48]})
It looks like this:
| product__sku | date                      | qty |
| 1            | 2021-10-01 20:48:12+00:00 | 100 |
| 1            | 2021-10-31 20:48:26+00:00 | 84  |
| 1            | 2021-09-01 20:48:12+00:00 | 5   |
| 1            | 2021-09-30 20:48:26+00:00 | 10  |
| 2            | 2021-10-01 12:23:17+00:00 | 15  |
| 2            | 2021-10-31 12:23:17+00:00 | 48  |
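I need to group by two columns: date (by month) and product__sku. Within each group, the 'qty' column should be reduced to a single difference using the formula: qty at the max date minus qty at the min date.
As a result I would like to see: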
| product__sku | date                      | diff |
| 1            | 2021-09-30 20:48:12+00:00 | 5    |
| 1            | 2021-10-31 20:48:12+00:00 | -16  |
| 2            | 2021-10-31 20:48:26+00:00 | 33   |
I tried using pd.Grouper:
dg = df.groupby([ pd.Grouper('product__sku'), pd.Grouper(key='date', freq='1M')])['qty'].diff().fillna(0)
But I got a different result:
0     0.0
1   -16.0
2     0.0
Name: qty, dtype: float64
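Reply from a uj5u.com user:
First, group by product__sku and month. Then define a custom function that finds the difference in qty between the max and min dates within a group, and apply it to each group: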
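def func(x):
    # Sort the group's dates so the first/last index labels point at the min/max date
    dates = x['date'].sort_values()
    diff = x.loc[dates.index[-1], 'qty'] - x.loc[dates.index[0], 'qty']
    # Keep only the row with the latest date; .copy() avoids SettingWithCopyWarning
    x = x[x['date'] == dates.iloc[-1]].copy()
    x['diff'] = diff
    return x[['product__sku', 'date', 'diff']]

df['date'] = pd.to_datetime(df['date'])
# Note: dt.month ignores the year, which is fine for this single-year sample
df = (df.assign(month=df['date'].dt.month)
        .groupby(['product__sku', 'month'])
        .apply(func)
        .reset_index(drop=True))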
Output:
   product__sku                      date  diff
0             1 2021-09-30 20:48:26+00:00     5
1             1 2021-10-31 20:48:26+00:00   -16
2             2 2021-10-31 12:23:17+00:00    33
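The same idea can also be sketched without a custom function, using GroupBy.idxmin and GroupBy.idxmax to locate the rows holding the earliest and latest date in each group. This is only one possible variant, not the method of the answer above. The month_key label (a "YYYY-MM" string, chosen here so that the same month in different years is not merged) and the names first_idx, last_idx and result are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "product__sku": [1, 1, 1, 1, 2, 2],
    "date": pd.to_datetime([
        "2021-10-01 20:48:12+00:00", "2021-10-31 20:48:26+00:00",
        "2021-09-01 20:48:12+00:00", "2021-09-30 20:48:26+00:00",
        "2021-10-01 12:23:17+00:00", "2021-10-31 12:23:17+00:00",
    ]),
    "qty": [100, 84, 5, 10, 15, 48],
})

# Assumption: a "YYYY-MM" label is used as the month key (hypothetical helper)
month_key = df["date"].dt.strftime("%Y-%m").rename("month")
grouped_dates = df.groupby(["product__sku", month_key])["date"]

# Row labels of the earliest and latest date in each (sku, month) group
first_idx = grouped_dates.idxmin().to_numpy()
last_idx = grouped_dates.idxmax().to_numpy()

# Keep the latest row of each group and attach qty(latest) - qty(earliest)
result = df.loc[last_idx, ["product__sku", "date"]].reset_index(drop=True)
result["diff"] = df.loc[last_idx, "qty"].to_numpy() - df.loc[first_idx, "qty"].to_numpy()

print(result)
```

For the sample data the diff column comes out as 5, -16 and 33. By contrast, GroupBy.diff as tried in the question returns one value per row (the change from the previous row in the same group), which is why it does not collapse each group to a single difference.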
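Reply from a uj5u.com user:
On the DataFrame sorted by date, use GroupBy.agg with first and last to get the qty values at the minimum and maximum dates, then use DataFrame.pop to remove the first and last columns while subtracting them.
If a date is also needed for each group, named aggregation can be applied to the date column as well (the code below keeps the first date of each group; pass 'last' instead to keep the last one):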
df['date'] = pd.to_datetime(df['date'])
dg = (df.sort_values(['product__sku', 'date'])
        .groupby(['product__sku', pd.Grouper(key='date', freq='1M')])
        .agg(first=('qty', 'first'), last=('qty', 'last'), date=('date', 'first'))
        .reset_index(level=-1, drop=True)
        .reset_index()
     )
dg['diff'] = dg.pop('last').sub(dg.pop('first'))
print(dg)
   product__sku                      date  diff
0             1 2021-09-01 20:48:12+00:00     5
1             1 2021-10-01 20:48:12+00:00   -16
2             2 2021-10-01 12:23:17+00:00    33
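To get the last date of each group, as in the expected result of the question, the date aggregation can be switched to 'last'. The sketch below is a hedged variant rather than the answer's exact code: it groups on a hypothetical helper column named month (a "YYYY-MM" label) instead of pd.Grouper, partly because pandas 2.2 deprecated the 'M' frequency alias in favour of 'ME'. The helper column is dropped again at the end:

```python
import pandas as pd

df = pd.DataFrame({
    "product__sku": [1, 1, 1, 1, 2, 2],
    "date": pd.to_datetime([
        "2021-10-01 20:48:12+00:00", "2021-10-31 20:48:26+00:00",
        "2021-09-01 20:48:12+00:00", "2021-09-30 20:48:26+00:00",
        "2021-10-01 12:23:17+00:00", "2021-10-31 12:23:17+00:00",
    ]),
    "qty": [100, 84, 5, 10, 15, 48],
})

# Assumption: "month" is a hypothetical helper column holding a "YYYY-MM" label
dg = (
    df.assign(month=df["date"].dt.strftime("%Y-%m"))
      .sort_values(["product__sku", "date"])          # first/last now follow date order
      .groupby(["product__sku", "month"], as_index=False)
      .agg(first=("qty", "first"), last=("qty", "last"), date=("date", "last"))
)

# qty at the latest date minus qty at the earliest date; pop removes both helper columns
dg["diff"] = dg.pop("last") - dg.pop("first")
dg = dg.drop(columns="month")

print(dg)
```

With the sample data this keeps the dates 2021-09-30, 2021-10-31 and 2021-10-31, with diff values 5, -16 and 33.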
