I have a DataFrame that looks like this (this is just a small sample):
student school team answers question
a scl first True x
a scl first False y
a scl first True y
b scl first False x
c scl sec False y
c scl sec True z
d scl sec True x
d scl sec True z
e scl third True z
e scl third False z
I want to build a ranking that looks like this:
df_overall=
question first sec third
0 x 0.5 1.0 NaN
1 y 0.5 0.0 NaN
2 z NaN 1.0 0.5
So I wrote:
df_overall = df.groupby(['team', 'question'])['answers'].apply(lambda x: x.sum()/len(x)).reset_index()
df_overall = df_overall.sort_values(by=['question']).rename(columns={'answers': 'TeamRanking'})
df_overall = df_overall.pivot_table(index='question', columns='team', values='TeamRanking').reset_index().rename_axis(None, axis=1)
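But the last line raises KeyError: 'team'. If I run only the first two lines, it works. I tried wrapping the name in brackets (['team']), and I checked the columns with print(df.columns.tolist()): they are all there, with no stray spaces and no odd spelling. All the dtypes are object, except answers, which is bool. I really don't understand why it can't find the column.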
Reply from a uj5u.com user:
Pivot, then drop the extra level of the MultiIndex columns. The code is as follows:
newdf = pd.pivot_table(df, index='question', columns='team', values=['answers'], aggfunc='mean').droplevel(0, axis=1).reset_index()
team question first sec third
0 x 0.5 1.0 NaN
1 y 0.5 0.0 NaN
2 z NaN 1.0 0.5
Reply from a uj5u.com user:
Use pivot_table:
>>> df.pivot_table(index='question', columns='team', values='answers')
team first sec third
question
x 0.5 1.0 NaN
y 0.5 0.0 NaN
z NaN 1.0 0.5
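For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the variable name `ranking` is only illustrative. It relies on pivot_table's default aggregation (the mean), which for a boolean column gives the share of True answers. The trailing reset_index and rename_axis calls are taken from the question's own last line and flatten the result into the layout the question asks for.

```python
import pandas as pd

# Sample data copied from the question (10 rows).
df = pd.DataFrame({
    "student": list("aaabccddee"),
    "school": ["scl"] * 10,
    "team": ["first"] * 4 + ["sec"] * 4 + ["third"] * 2,
    "answers": [True, False, True, False, False, True, True, True, True, False],
    "question": list("xyyxyzxzzz"),
})

# The default aggfunc is the mean; the mean of booleans is the fraction of True values.
ranking = (
    df.pivot_table(index="question", columns="team", values="answers")
      .reset_index()                # turn 'question' back into a regular column
      .rename_axis(None, axis=1)    # drop the leftover 'team' label on the columns
)
print(ranking)
```

Team/question pairs with no rows in the sample (for example third and x) come out as NaN, which matches the desired table.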
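I also considered the case where only each student's first answer counts, ignoring the fact that they answered the same question several times: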
>>> df.drop_duplicates(['student', 'team', 'question']) \
.pivot_table(index='question', columns='team', values='answers')
team first sec third
question
x 0.5 1.0 NaN
y 0.0 0.0 NaN
z NaN 1.0 1.0
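To see why the y and z values change, the sketch below shows which rows drop_duplicates removes. It rebuilds the sample data by hand; the names `first_only`, `dropped`, and `first_ranking` are only illustrative. It assumes that "first answer" means the first row in the DataFrame's current order, because drop_duplicates keeps the first occurrence by default (keep='first').

```python
import pandas as pd

# Sample data copied from the question (10 rows).
df = pd.DataFrame({
    "student": list("aaabccddee"),
    "school": ["scl"] * 10,
    "team": ["first"] * 4 + ["sec"] * 4 + ["third"] * 2,
    "answers": [True, False, True, False, False, True, True, True, True, False],
    "question": list("xyyxyzxzzz"),
})

# Keep only the first row for each (student, team, question) combination.
first_only = df.drop_duplicates(["student", "team", "question"])

# Row labels that were discarded as repeat answers.
dropped = df.index.difference(first_only.index)
print(df.loc[dropped])

first_ranking = first_only.pivot_table(index="question", columns="team", values="answers")
print(first_ranking)
```

In this sample, two rows are discarded: student a's second answer to y (True) and student e's second answer to z (False). That is why first/y falls from 0.5 to 0.0 and third/z rises from 0.5 to 1.0. If the rows are not already in chronological order, they would need to be sorted before calling drop_duplicates.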
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/325893.html
