Aggregate functions on a 3-level pandas groupby object

I want to create a new df containing simple metrics such as mean, sum, min, and max, computed on the Value column of the df shown below and grouped by ID, Date, and Key.

index	ID	Key	Date	Value	x	y	z
0	655	321	2021-01-01	50	546	235	252345
1	675	321	2021-01-01	50	345	345	34545
2	654	356	2021-02-02	70	345	346	543

This is how I do it:

final = df.groupby(['ID','Date','Key'])['Value'].first().mean(level=[0,1]).reset_index().rename(columns={'Value':'Value_Mean'})

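I use .first() because a key can occur multiple times in the df, but all of its occurrences have the same value. I want to aggregate over ID and Date, so I use level=[0,1].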
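A note for readers on recent pandas: the `level` argument of reductions such as `Series.mean` was deprecated in pandas 1.3 and removed in 2.0, so the line above only runs on older versions. A minimal sketch of an equivalent follows, grouping the intermediate Series by its first two index levels. The sample rows are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data: Key 321 repeats for ID 655 with the same Value.
df = pd.DataFrame({
    'ID': [655, 655, 655, 675, 654],
    'Key': [321, 321, 333, 321, 356],
    'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-01', '2021-02-02'],
    'Value': [50, 50, 10, 50, 70],
})

# One value per (ID, Date, Key): a Series with a 3-level MultiIndex.
per_key = df.groupby(['ID', 'Date', 'Key'])['Value'].first()

# Group by index levels 0 and 1 (ID, Date) instead of passing level= to mean().
value_mean = (per_key.groupby(level=[0, 1]).mean()
                     .reset_index()
                     .rename(columns={'Value': 'Value_Mean'}))
print(value_mean)
```

The result has one row per (ID, Date) pair, with the mean taken over the de-duplicated keys.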

Then I add the next metric with a pandas merge:

final = final.merge(df.groupby(['ID','Date','Key'])['Value'].first().max(level=[0,1]).reset_index().rename(columns={'Value':'Value_Max'}), on=['ID','Date'])

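I do the same for the other metrics. I wonder whether there is a more sophisticated way than repeating this over several lines. I know you can use .agg() and pass a dict of functions, but there seems to be no way to specify the levels, which matter here.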
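One possible way to keep the level-based idea and still compute several metrics in one pass is to group the intermediate Series by index level and hand named aggregations to `.agg()`. This is a sketch under assumptions, not the approach given in the reply: the sample rows and the `Value_*` output names are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    'ID': [655, 655, 655, 675, 654],
    'Key': [321, 321, 333, 321, 356],
    'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-01', '2021-02-02'],
    'Value': [50, 50, 10, 50, 70],
})

# De-duplicate: one value per (ID, Date, Key).
per_key = df.groupby(['ID', 'Date', 'Key'])['Value'].first()

# The index levels keep the names of the grouping columns, so they can be
# addressed by name. Keyword arguments to agg() become output column names.
final = (per_key.groupby(level=['ID', 'Date'])
                .agg(Value_Mean='mean', Value_Sum='sum',
                     Value_Min='min', Value_Max='max')
                .reset_index())
print(final)
```

This avoids the chain of merges, since every metric comes out of the same grouped object.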

A uj5u.com user replied:

Use DataFrame.drop_duplicates together with named aggregation:

import pandas as pd

# Sample data: Key 321 appears twice for ID 655 on the same date
df = pd.DataFrame({'ID':[655,655,655,675,654], 'Key':[321,321,333,321,356],
                   'Date':['2021-01-01','2021-01-01','2021-01-01','2021-01-01','2021-02-02'],
                   'Value':[50,30,10,50,70]})
print (df)
    ID  Key        Date  Value
0  655  321  2021-01-01     50
1  655  321  2021-01-01     30
2  655  333  2021-01-01     10
3  675  321  2021-01-01     50
4  654  356  2021-02-02     70

final = (df.drop_duplicates(['ID','Date','Key'])
           .groupby(['ID','Date'], as_index=False).agg(Value_Mean=('Value','mean'),
                                                       Value_Max=('Value','max')))
print (final)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02        70.0         70
1  655  2021-01-01        30.0         50
2  675  2021-01-01        50.0         50

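Or take the first row of each (ID, Date, Key) group with groupby and first instead of drop_duplicates:

final = (df.groupby(['ID','Date','Key'], as_index=False)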
           .first()
           .groupby(['ID','Date'], as_index=False).agg(Value_Mean=('Value','mean'),
                                                       Value_Max=('Value','max')))

print (final)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02        70.0         70
1  655  2021-01-01        30.0         50
2  675  2021-01-01        50.0         50

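Or pass a list of function names and prefix the resulting columns. Here the second groupby keeps ID and Date in the index, so add_prefix only renames the metric columns, and the column names follow the function names (lowercase):

final = (df.groupby(['ID','Date','Key'], as_index=False)
           .first()
           .groupby(['ID','Date'])['Value']
           .agg(['mean', 'max'])
           .add_prefix('Value_')
           .reset_index())
print (final)
    ID        Date  Value_mean  Value_max
0  654  2021-02-02        70.0         70
1  655  2021-01-01        30.0         50
2  675  2021-01-01        50.0         50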
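A side note on choosing between drop_duplicates and groupby followed by first: the two agree whenever the repeated keys carry identical values, which is the asker's stated situation. They can differ when missing values are present, because drop_duplicates keeps the first row as it is, while first returns the first non-null entry of each column. The sketch below uses made-up rows with a NaN to show the difference; it is an illustration, not part of the original reply.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row for (655, 2021-01-01, 321) has a missing Value.
df = pd.DataFrame({
    'ID': [655, 655, 675],
    'Key': [321, 321, 321],
    'Date': ['2021-01-01', '2021-01-01', '2021-01-01'],
    'Value': [np.nan, 30.0, 50.0],
})
keys = ['ID', 'Date', 'Key']

# Keeps the first row per key combination, including its NaN.
kept_rows = df.drop_duplicates(keys)

# Takes the first non-null Value per key combination.
first_vals = df.groupby(keys, as_index=False).first()

print(kept_rows)
print(first_vals)
```

If the data may contain gaps, it is worth deciding which of the two behaviours is wanted before aggregating.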

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/484959.html
