Grouping by multiple columns with pandas to find the max value in each group

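I have a dataframe like this:

Feature	value	frequency	label
age_45_and_above	No	2700	negative
age_45_and_above	No	1707	positive
age_45_and_above	No	83	other
age_45_and_above	Yes	222	negative
age_45_and_above	Yes	15	positive
age_45_and_above	Yes	8	other
age_45_and_above	[Null]	323	negative
age_45_and_above	[Null]	8	other
age_45_and_above	[Null]	5	positive
talk	No	20	negative
talk	No	170	positive
talk	No	500	other
talk	Yes	210	negative
talk	Yes	1500	positive
talk	Yes	809	other
talk	[Null]	234	negative
talk	[Null]	43	other
talk	[Null]	85	positive

and so on.

For each Feature group, I want to find the maximum frequency together with the rest of that row's data. For example, if the Feature is age_45_and_above, then the No group has 3 rows with different frequencies and labels, and I want to report the row with the largest frequency along with its related data.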

I tried groupby in different ways:

result.groupby(['Feature', 'Value'])['Frequency', 'Predict'].max()

Or this one, which gives me a multi-index dataframe that is not the desired result:

result.groupby(['Feature', 'Value', 'Predict'])['Frequency'].max()

And many other failed attempts with idxmax, transform, and so on.

The intended output I'm looking for looks like this:

Feature	value	frequency	label
age_45_and_above	No	2700	negative
age_45_and_above	Yes	222	negative
age_45_and_above	[Null]	323	negative
talk	No	500	other
talk	Yes	1500	positive
talk	[Null]	234	negative

Also, I wonder how to sum the frequencies for each <<Feature-value>> group except the max row, as I don't know how to locate the max row. For example, for the first feature and value, <<age_45_and_above-No>>, the max is 2700, so the sum would be 1707 + 83.

Thanks for your time.

Reply from a uj5u.com user:

Use idxmax after the groupby, then pass the resulting index labels to loc.

print(df.loc[df.groupby(['Feature','value'])['frequency'].idxmax()])
             Feature   value  frequency     label
0   age_45_and_above      No       2700  negative
3   age_45_and_above     Yes        222  negative
6   age_45_and_above  [Null]        323  negative
11              talk      No        500     other
13              talk     Yes       1500  positive
15              talk  [Null]        234  negative
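
For a self-contained check, the snippet below rebuilds the question's sample rows by hand and applies the same idxmax and loc step. The variable names `max_idx` and `max_rows` are illustrative, and the frame is assumed to store `[Null]` as a literal string, as in the printed output above.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand.
df = pd.DataFrame({
    "Feature": ["age_45_and_above"] * 9 + ["talk"] * 9,
    "value": (["No"] * 3 + ["Yes"] * 3 + ["[Null]"] * 3) * 2,
    "frequency": [2700, 1707, 83, 222, 15, 8, 323, 8, 5,
                  20, 170, 500, 210, 1500, 809, 234, 43, 85],
    "label": ["negative", "positive", "other",
              "negative", "positive", "other",
              "negative", "other", "positive",
              "negative", "positive", "other",
              "negative", "positive", "other",
              "negative", "other", "positive"],
})

# idxmax returns, per group, the index label of the row with the largest frequency.
max_idx = df.groupby(["Feature", "value"])["frequency"].idxmax()

# loc then pulls those full rows, so the label column comes along.
max_rows = df.loc[max_idx]
print(max_rows)
```

If a group contains ties for the maximum, idxmax reports only the first matching row.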

For the sum without the max, compute the total of each group and subtract each row's own frequency from it, then select the max rows.

gr = df.groupby(['Feature','value'])['frequency']

res = (
    df.assign(total=gr.transform('sum')-df['frequency'])
      .loc[gr.idxmax()]
)
print(res)
             Feature   value  frequency     label  total
0   age_45_and_above      No       2700  negative   1790
3   age_45_and_above     Yes        222  negative     23
6   age_45_and_above  [Null]        323  negative     13
11              talk      No        500     other    190
13              talk     Yes       1500  positive   1019
15              talk  [Null]        234  negative    128
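
Why the subtraction works can be checked on a smaller frame. The sketch below uses only the age_45_and_above rows from the question (the names `others` and `rest_of_group` are illustrative) and compares the result with the group sum minus the group max.

```python
import pandas as pd

df = pd.DataFrame({
    "Feature": ["age_45_and_above"] * 9,
    "value": ["No"] * 3 + ["Yes"] * 3 + ["[Null]"] * 3,
    "frequency": [2700, 1707, 83, 222, 15, 8, 323, 8, 5],
})

gr = df.groupby(["Feature", "value"])["frequency"]

# transform("sum") repeats each group's total on every row of that group,
# so subtracting the row's own frequency leaves the sum of the other rows.
others = gr.transform("sum") - df["frequency"]

# Keep only the value computed on each group's max row.
rest_of_group = others.loc[gr.idxmax()]
print(rest_of_group.tolist())

# Cross-check: the same numbers as group total minus group max.
print((gr.sum() - gr.max()).tolist())
```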

Reply from a uj5u.com user:

I would do this by merging the original frame with the grouped data.

Based on this data:

import pandas as pd

df = pd.DataFrame({'Feature':['age']*9 + ['talk']*9,
                   'value':(['No']*3 + ['Yes']*3 + ['[Null]']*3)*2,
                   'frequency':[2700,1707,83,222,15,8,323,8,5,20,170,500,210,1500,809,234,43,85],
                   'label':['N','P','O']*6})

Using:

df_1 = df.groupby(['Feature','value'],as_index=False)['frequency'].max().merge(df,on=['Feature','value','frequency'])
print(df_1)

Output:

  Feature   value  frequency label
0     age      No       2700     N
1     age     Yes        222     N
2     age  [Null]        323     N
3    talk      No        500     O
4    talk     Yes       1500     P
5    talk  [Null]        234     N

Adding the extra column can be done with a simple assignment:

df_1['sum_no_max'] = df.groupby(['Feature','value'])['frequency'].sum().values - df_1['frequency'].values

Final output:

  Feature   value  frequency label  sum_no_max
0     age      No       2700     N        1790
1     age     Yes        222     N          23
2     age  [Null]        323     N          13
3    talk      No        500     O         190
4    talk     Yes       1500     P        1019
5    talk  [Null]        234     N         128
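
The assignment above subtracts two arrays by position, which works because both come out in the same sorted group order. As a hedged alternative, the sketch below carries the group total through the merge by key instead. The column name `group_total` and the `keys` list are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Feature": ["age"] * 9 + ["talk"] * 9,
    "value": (["No"] * 3 + ["Yes"] * 3 + ["[Null]"] * 3) * 2,
    "frequency": [2700, 1707, 83, 222, 15, 8, 323, 8, 5,
                  20, 170, 500, 210, 1500, 809, 234, 43, 85],
    "label": ["N", "P", "O"] * 6,
})
keys = ["Feature", "value"]

# Per-group maximum and per-group total, both as ordinary columns.
stats = df.groupby(keys, as_index=False).agg(
    frequency=("frequency", "max"),
    group_total=("frequency", "sum"),
)

# Merging on the keys plus the max frequency recovers the label of the max row.
df_1 = stats.merge(df, on=keys + ["frequency"])
df_1["sum_no_max"] = df_1["group_total"] - df_1["frequency"]
df_1 = df_1.drop(columns="group_total")
print(df_1)
```

Note that with either merge variant, a group whose maximum frequency appears on more than one row yields one output row per tie.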

Reply from a uj5u.com user:

Try this:

df.groupby(['Feature', 'value'], dropna=False).frequency.max().reset_index()

>>>
      Feature           value        frequency
0     age_45_and_above  No           2700
1     age_45_and_above  Yes          222
2     age_45_and_above  NaN          323
3     talk              No           500
4     talk              Yes          1500
5     talk              NaN          234
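
This answer appears to assume the missing values are real NaN rather than the string '[Null]', and its output has no label column. A minimal sketch under that assumption, on a small made-up subset of the talk rows, shows what dropna=False changes and how it might be combined with idxmax to keep the label:

```python
import numpy as np
import pandas as pd

# Assumption: missing values are stored as NaN, not as the text '[Null]'.
df = pd.DataFrame({
    "Feature": ["talk"] * 6,
    "value": ["No", "No", "Yes", "Yes", np.nan, np.nan],
    "frequency": [20, 500, 210, 1500, 234, 43],
    "label": ["negative", "other", "negative", "positive", "negative", "other"],
})

# By default, groupby drops rows whose key is NaN.
default = df.groupby(["Feature", "value"])["frequency"].max()

# dropna=False keeps NaN as its own group key.
with_nan = df.groupby(["Feature", "value"], dropna=False)["frequency"].max()
print(len(default), len(with_nan))

# Combining dropna=False with idxmax and loc keeps the label of each max row.
rows = df.loc[df.groupby(["Feature", "value"], dropna=False)["frequency"].idxmax()]
print(rows)
```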

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/360295.html

Tags: python pandas dataframe max pandas-groupby
