I have this demo df:
import pandas as pd

info = {'customer': ['Jason', 'Jason', 'Jason', 'Jason',
                     'Molly', 'Molly', 'Molly', 'Molly'],
        'Good': ['Cookie', 'Cookie', 'Cookie', 'Cookie',
                 'Ice Cream', 'Ice Cream', 'Ice Cream', 'Ice Cream'],
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(data=info)
df
which gives:
customer Good Date Flavor
0 Jason Cookie 2021-12-14 Chocolate
1 Jason Cookie 2022-01-04 Vanilla
2 Jason Cookie 2022-01-11 Vanilla
3 Jason Cookie 2022-01-18 Strawberry
4 Molly Ice Cream 2022-01-12 Chocolate
5 Molly Ice Cream 2022-01-15 Vanilla
6 Molly Ice Cream 2022-01-19 Caramel
7 Molly Ice Cream 2022-01-30 Caramel
I'm trying to track, in new columns, each flavor change (From, To) per customer and per good. I've done the grouping part:
df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()
and I get:
customer Good Date
Jason Cookie 2021-12-14 Chocolate
2022-01-04 Vanilla
2022-01-11 Vanilla
2022-01-18 Strawberry
Molly Ice Cream 2022-01-12 Chocolate
2022-01-15 Vanilla
2022-01-19 Caramel
2022-01-30 Caramel
Name: Flavor, dtype: object
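The first row of each group is the entry point. I then want to compare each following row with the previous one in the same group: if the flavor differs, record the change in the new columns (From and To); if it is the same, do nothing.
I've tried several approaches, but unfortunately I don't know the best way to do this.
Expected output, after reset_index():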
customer Good Date Flavor From To
0 Jason Cookie 2021-12-14 Chocolate
1 Jason Cookie 2022-01-04 Vanilla Chocolate Vanilla
2 Jason Cookie 2022-01-11 Vanilla
3 Jason Cookie 2022-01-18 Strawberry Vanilla Strawberry
4 Molly Ice Cream 2022-01-12 Chocolate
5 Molly Ice Cream 2022-01-15 Vanilla Chocolate Vanilla
6 Molly Ice Cream 2022-01-19 Caramel Vanilla Caramel
7 Molly Ice Cream 2022-01-30 Caramel
Reply from a uj5u.com user:
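# Previous flavor within each customer/Good group, in date order.
# GroupBy.shift keeps the original row index, so assign() aligns on it.
s = df.assign(
    From=df.sort_values(by='Date').groupby(['customer', 'Good'])['Flavor'].shift(1),
    To=df['Flavor']
).dropna()  # drops the first row of each group, which has no previous flavor
# Keep only rows where the flavor changed, then join From/To back onto df
out = df.join(s.loc[s['From'] != s['To'], ['From', 'To']]).fillna('')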
customer Good Date Flavor From To
0 Jason Cookie 2021-12-14 Chocolate
1 Jason Cookie 2022-01-04 Vanilla Chocolate Vanilla
2 Jason Cookie 2022-01-11 Vanilla
3 Jason Cookie 2022-01-18 Strawberry Vanilla Strawberry
4 Molly Ice Cream 2022-01-12 Chocolate
5 Molly Ice Cream 2022-01-15 Vanilla Chocolate Vanilla
6 Molly Ice Cream 2022-01-19 Caramel Vanilla Caramel
7 Molly Ice Cream 2022-01-30 Caramel
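The same idea can also be written without the intermediate join. The following is only a sketch, not the answerer's code: the names ordered, prev and changed are introduced here for illustration, and it assumes the Date strings are in ISO format (YYYY-MM-DD) so that sorting them as text matches chronological order.

```python
import pandas as pd

info = {'customer': ['Jason'] * 4 + ['Molly'] * 4,
        'Good': ['Cookie'] * 4 + ['Ice Cream'] * 4,
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(info)

# Assumption: ISO date strings, so a plain sort is chronological
ordered = df.sort_values(['customer', 'Good', 'Date'])

# Previous flavor inside each (customer, Good) group; NaN on each group's first row
prev = ordered.groupby(['customer', 'Good'])['Flavor'].shift()

# A change exists where there is a previous flavor and it differs from the current one
changed = prev.notna() & prev.ne(ordered['Flavor'])

# Keep From/To only on changed rows, blank elsewhere
out = ordered.assign(
    From=prev.where(changed, ''),
    To=ordered['Flavor'].where(changed, ''),
)
print(out)
```

If the dates were stored in another format, converting the column with pd.to_datetime before sorting would be the safer choice.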
Reply from a uj5u.com user:
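Building on the sum result you created (call it g), we can groupby the first 2 levels of the index, shift within each group, and join the result back to g. After rename-ing the columns, mask the "To" and "From" columns wherever nothing changed or the previous value is NaN. Finally, reset_index turns the result back into a flat DataFrame: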
g = df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()
joined = g.to_frame().assign(To=g).join(g.groupby(level=[0,1]).shift().to_frame(), lsuffix='', rsuffix='_').rename(columns={'Flavor_':'From'})
joined.update(joined[['To','From']].mask(joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor','From','To']].reset_index()
Output:
customer Good Date Flavor From To
0 Jason Cookie 2021-12-14 Chocolate
1 Jason Cookie 2022-01-04 Vanilla Chocolate Vanilla
2 Jason Cookie 2022-01-11 Vanilla
3 Jason Cookie 2022-01-18 Strawberry Vanilla Strawberry
4 Molly Ice Cream 2022-01-12 Chocolate
5 Molly Ice Cream 2022-01-15 Vanilla Chocolate Vanilla
6 Molly Ice Cream 2022-01-19 Caramel Vanilla Caramel
7 Molly Ice Cream 2022-01-30 Caramel
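To see what the groupby(level=[0,1]).shift() step contributes on its own, here is a small sketch on a hypothetical, cut-down version of g (two rows per group, built by hand for illustration rather than taken from the output above):

```python
import pandas as pd

# Hypothetical miniature of g: a Series indexed by (customer, Good, Date)
idx = pd.MultiIndex.from_tuples(
    [('Jason', 'Cookie', '2021-12-14'),
     ('Jason', 'Cookie', '2022-01-04'),
     ('Molly', 'Ice Cream', '2022-01-12'),
     ('Molly', 'Ice Cream', '2022-01-15')],
    names=['customer', 'Good', 'Date'])
g = pd.Series(['Chocolate', 'Vanilla', 'Chocolate', 'Vanilla'],
              index=idx, name='Flavor')

# Shift within each (customer, Good) pair: values never leak across groups,
# so the first row of every group has no previous flavor (NaN)
prev = g.groupby(level=[0, 1]).shift()
print(prev)
```

The NaN on the first row of each group is what the joined['From'].isna() condition in the mask step blanks out.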
Tags: python pandas dataframe numpy pandas-groupby
