pandas groupby: compare a string value with the previous row's value and record the change in new columns

I have this demo df:

import pandas as pd

info = {'customer': ['Jason', 'Jason', 'Jason', 'Jason',
                     'Molly', 'Molly', 'Molly', 'Molly'],
        'Good': ['Cookie', 'Cookie', 'Cookie', 'Cookie',
                 'Ice Cream', 'Ice Cream', 'Ice Cream', 'Ice Cream'],
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(data=info)
df

which gives:

   customer   Good      Date        Flavor
0   Jason   Cookie      2021-12-14  Chocolate
1   Jason   Cookie      2022-01-04  Vanilla
2   Jason   Cookie      2022-01-11  Vanilla
3   Jason   Cookie      2022-01-18  Strawberry
4   Molly   Ice Cream   2022-01-12  Chocolate
5   Molly   Ice Cream   2022-01-15  Vanilla
6   Molly   Ice Cream   2022-01-19  Caramel
7   Molly   Ice Cream   2022-01-30  Caramel

I am trying to track, in new columns (From and To), how the flavor changes for each customer and each good. I have done the grouping part:

   df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()

I get:

 customer  Good       Date      
    Jason     Cookie     2021-12-14     Chocolate
                         2022-01-04       Vanilla
                         2022-01-11       Vanilla
                         2022-01-18    Strawberry
    Molly     Ice Cream  2022-01-12     Chocolate
                         2022-01-15       Vanilla
                         2022-01-19       Caramel
                         2022-01-30       Caramel
    Name: Flavor, dtype: object

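The first row of each group is the entry point. From there I want to compare each row with the previous one in the same group: if the flavor differs, record the change in the new columns (From and To); if it is the same, leave them empty.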

I have tried several approaches, but unfortunately I could not work out the best way to do this.

Expected output, after reset_index():

  customer   Good      Date        Flavor           From         To
0   Jason   Cookie      2021-12-14  Chocolate    
1   Jason   Cookie      2022-01-04  Vanilla         Chocolate    Vanilla
2   Jason   Cookie      2022-01-11  Vanilla
3   Jason   Cookie      2022-01-18  Strawberry      Vanilla      Strawberry
4   Molly   Ice Cream   2022-01-12  Chocolate
5   Molly   Ice Cream   2022-01-15  Vanilla         Chocolate    Vanilla
6   Molly   Ice Cream   2022-01-19  Caramel         Vanilla      Caramel
7   Molly   Ice Cream   2022-01-30  Caramel

A uj5u.com user replied:

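# Previous flavor within each customer/good group, in date order
prev = df.sort_values(by='Date').groupby(['customer', 'Good'])['Flavor'].shift()
s = df.assign(From=prev, To=df['Flavor']).dropna()

# Keep From/To only on rows where the flavor changed; blank everywhere else
out = df.join(s.loc[s['From'] != s['To'], ['From', 'To']]).fillna('')

Output:
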
   customer       Good        Date      Flavor       From          To
0    Jason     Cookie  2021-12-14   Chocolate                       
1    Jason     Cookie  2022-01-04     Vanilla  Chocolate     Vanilla
2    Jason     Cookie  2022-01-11     Vanilla                       
3    Jason     Cookie  2022-01-18  Strawberry    Vanilla  Strawberry
4    Molly  Ice Cream  2022-01-12   Chocolate                       
5    Molly  Ice Cream  2022-01-15     Vanilla  Chocolate     Vanilla
6    Molly  Ice Cream  2022-01-19     Caramel    Vanilla     Caramel
7    Molly  Ice Cream  2022-01-30     Caramel
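
The same idea can also be written without the intermediate join. The following is a minimal sketch, not the answerer's code: it assumes the demo df from the question, and the name `changed` is introduced here purely for illustration. It shifts the flavor within each group, builds a boolean mask for rows where a previous flavor exists and differs, and uses `where` to blank the rest.

```python
import pandas as pd

info = {'customer': ['Jason', 'Jason', 'Jason', 'Jason',
                     'Molly', 'Molly', 'Molly', 'Molly'],
        'Good': ['Cookie', 'Cookie', 'Cookie', 'Cookie',
                 'Ice Cream', 'Ice Cream', 'Ice Cream', 'Ice Cream'],
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(data=info)

out = df.copy()
# Previous flavor within each customer/good group, in date order.
# Column assignment aligns on the index, so the sort does not reorder `out`.
out['From'] = df.sort_values('Date').groupby(['customer', 'Good'])['Flavor'].shift()

# `changed` (illustrative name): True only where a previous row exists
# and its flavor differs from the current one.
changed = out['From'].notna() & out['From'].ne(out['Flavor'])

# Keep values on changed rows, blank strings everywhere else.
out['To'] = out['Flavor'].where(changed, '')
out['From'] = out['From'].where(changed, '')

print(out)
```

Because the mask is a separate Series, it can also be reused to select only the rows where a change happened, for example `out[changed]`.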

A uj5u.com user replied:

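Building on the sum you already created (call it g), we can groupby the first two index levels and shift it, then join the result back to g. After renaming the columns, mask the "To" and "From" columns wherever nothing changed or the previous value is NaN. Finally, reset the index to get back a flat DataFrame: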

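# One flavor per (customer, Good, Date); with a single string per group, sum() returns it unchanged
g = df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()
# Previous flavor within each (customer, Good) group
prev = g.groupby(level=[0,1]).shift().rename('From')
joined = g.to_frame().assign(To=g).join(prev)
# Blank out From/To where there is no previous row or the flavor did not change
joined.update(joined[['To','From']].mask(joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor','From','To']].reset_index()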

Output:

  customer       Good        Date      Flavor       From          To
0    Jason     Cookie  2021-12-14   Chocolate                       
1    Jason     Cookie  2022-01-04     Vanilla  Chocolate     Vanilla
2    Jason     Cookie  2022-01-11     Vanilla                       
3    Jason     Cookie  2022-01-18  Strawberry    Vanilla  Strawberry
4    Molly  Ice Cream  2022-01-12   Chocolate                       
5    Molly  Ice Cream  2022-01-15     Vanilla  Chocolate     Vanilla
6    Molly  Ice Cream  2022-01-19     Caramel    Vanilla     Caramel
7    Molly  Ice Cream  2022-01-30     Caramel
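
One caveat when building on sum(): it only returns the flavor unchanged because the demo data has exactly one row per customer, good and date. The sketch below uses made-up data (the `dup` frame is hypothetical, not from the question) to show what happens when two rows share the same key, and one possible alternative, `last()`, if a single value per date is wanted.

```python
import pandas as pd

# Hypothetical data: two rows share the same customer, good and date.
dup = pd.DataFrame({
    'customer': ['Jason', 'Jason'],
    'Good': ['Cookie', 'Cookie'],
    'Date': ['2022-01-04', '2022-01-04'],
    'Flavor': ['Vanilla', 'Caramel'],
})

keys = ['customer', 'Good', 'Date']

# sum() on strings concatenates the values within each group.
g_sum = dup.groupby(keys)['Flavor'].sum()
print(g_sum.iloc[0])

# last() keeps a single value per group (the last row seen in that group).
g_last = dup.groupby(keys)['Flavor'].last()
print(g_last.iloc[0])
```

Whether the last value, the first value, or something else is the right choice for duplicate dates depends on the data; the question does not say, so this is only an illustration of the behavior.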
