我有一個像下面這樣的資料框。
import pandas as pd
data = {'Date': ['2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05'],
'Runner': ['Runner A', 'Runner A', 'Runner A', 'Runner A', 'Runner A','Runner B', 'Runner B', 'Runner B', 'Runner B', 'Runner B','Runner C', 'Runner C', 'Runner C', 'Runner C', 'Runner C'],
'Training Time': ['less than 1 hour', 'less than 1 hour', 'less than 1 hour', 'less than 1 hour', '1 hour to 2 hour','less than 1 hour', '1 hour to 2 hour', 'less than 1 hour', '1 hour to 2 hour', '2 hour to 3 hour', '1 hour to 2 hour ', '2 hour to 3 hour' ,'1 hour to 2 hour ', '2 hour to 3 hour', '2 hour to 3 hour']
}
df = pd.DataFrame(data)
我已經使用下面的代碼計算了每個跑步者的發生次數
s = df.groupby(['Runner','Training Time']).size()
s
結果如下所示。

當我使用下面的代碼來獲得最大出現次數時。
df = s.loc[s.groupby(level=0).idxmax()].reset_index().drop(0,axis=1)
df
結果

問題出在 Runner B 上。它應該顯示“1 小時到 2 小時”和“不到 1 小時”。但現在它只顯示“1小時到2小時”
我該如何解決這個問題?謝謝。預期結果

uj5u.com熱心網友回復:
import pandas as pd
data = {'Date': ['2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05','2022/09/01', '2022/09/02', '2022/09/03', '2022/09/04', '2022/09/05'],
'Runner': ['Runner A', 'Runner A', 'Runner A', 'Runner A', 'Runner A','Runner B', 'Runner B', 'Runner B', 'Runner B', 'Runner B','Runner C', 'Runner C', 'Runner C', 'Runner C', 'Runner C'],
'Training Time': ['less than 1 hour', 'less than 1 hour', 'less than 1 hour', 'less than 1 hour', '1 hour to 2 hour','less than 1 hour', '1 hour to 2 hour', 'less than 1 hour', '1 hour to 2 hour', '2 hour to 3 hour', '1 hour to 2 hour ', '2 hour to 3 hour' ,'1 hour to 2 hour ', '2 hour to 3 hour', '2 hour to 3 hour']
}
df = pd.DataFrame(data)
s = df.groupby(['Runner', 'Training Time'], as_index=False).size()
s.columns = ['Runner', 'Training Time', 'Size']
r = s.groupby(['Runner'], as_index=False)['Size'].max()
df_list = []
for index, row in r.iterrows():
temp_df = s[(s['Runner'] == row['Runner']) & (s['Size'] == row['Size'])]
df_list.append(temp_df)
df_report = pd.concat(df_list)
print(df_report)
df_report.to_csv('report.csv', index = False)
uj5u.com熱心網友回復:
這是我能想到的最好的……有點惡心
def agg_most_common(vals):
print("vals")
matches = []
for i in collections.Counter(vals).most_common():
if not matches or matches[0][1] == i[1]:
matches.append(i)
else:
break
return [x[0] for x in matches]
print(df.groupby('Runner')['Training Time'].agg(agg_most_common))
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/516780.html
上一篇:查找每個特定視窗中的最大行數
