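I'm working with survey data and need to create several columns based on the answers given to certain questions in the survey. Suppose there are two survey submissions with two questions each. I want to build one column per question from the responses, with each answer mapped back to its submission ID.
So if we have: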
   submission_id question_id  answer
0              1           a    Male
1              1           b     Cat
2              2           a  Female
3              2           b     Dog
I want to create columns that track the sex and favorite animal for each submission, so that I end up with a table like this:
   submission_id question_id  answer     sex favorite_animal
0              1           a    Male    Male             Cat
1              1           b     Cat    Male             Cat
2              2           a  Female  Female             Dog
3              2           b     Dog  Female             Dog
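I have used np.where() when there were only a few submissions that I could enter by hand, but the real data has more than 20 submissions. For example: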
# Get all submission ids
submission_ids = df.submission_id.unique().tolist()
# Get all the animals
pet_list = df[(df['submission_id'] == submission_ids[0]) & (df['question_id'] == 'b') | (df['submission_id'] == submission_ids[1]) & (df['question_id'] == 'b')].loc[:,'answer'].values
# Create new column of animals per submission id
df['favorite_animal'] = np.where(df.submission_id == submission_ids[0], pet_list[0], np.where(df.submission_id == submission_ids[1], pet_list[1], 'fail'))
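To scale this to the number of submissions in the data, I tried the lambda function below. It puts a generator object in every row, which I can't extract, so I can't tell whether it would produce the desired result: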
df['favorite_animal'] = df[['question_id', 'submission_id', 'answer']].apply(lambda x: (x['answer'] if x['question_id'] == 'b' and x['submission_id'] == submission_id else 'fail' for submission_id in submission_ids), axis=1)
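The lambda returns a generator object because the parenthesised `... for submission_id in submission_ids` is a generator expression, and `apply` stores it unevaluated in each row. One possible alternative is sketched below: build a lookup from submission ID to answer and map it onto every row. It assumes each submission has exactly one answer to question 'b'. The sample frame and the name `animal_by_submission` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data matching the table in the question
df = pd.DataFrame({
    "submission_id": [1, 1, 2, 2],
    "question_id": ["a", "b", "a", "b"],
    "answer": ["Male", "Cat", "Female", "Dog"],
})

# Keep only the rows for question 'b' and index them by submission_id,
# giving a Series that acts as a submission_id -> answer lookup.
# Assumption: one 'b' row per submission (duplicates would make map() fail).
animal_by_submission = (
    df.loc[df["question_id"] == "b"]
      .set_index("submission_id")["answer"]
)

# Look up each row's submission_id in that Series
df["favorite_animal"] = df["submission_id"].map(animal_by_submission)
print(df)
```

Submissions that have no 'b' row get NaN rather than a 'fail' string, so missing answers are easy to spot with `isna()`.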
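Answer from a uj5u.com user:
Use pivot, then merge. The arguments to pivot are passed by keyword here, because recent pandas versions no longer accept them positionally (as in df.pivot(*df)):
out = df.merge(df.pivot(index='submission_id', columns='question_id', values='answer').reset_index().rename(columns={'a':'gender','b':'favorite_animal'}))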
Out[125]:
   submission_id question_id  answer  gender favorite_animal
0              1           a    Male    Male             Cat
1              1           b     Cat    Male             Cat
2              2           a  Female  Female             Dog
3              2           b     Dog  Female             Dog
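For readers who want to run the pivot-then-merge idea end to end, here is a self-contained sketch using the sample data from the question. The intermediate name `wide` and the explicit `on=`/`how=` arguments are my additions for clarity, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "submission_id": [1, 1, 2, 2],
    "question_id": ["a", "b", "a", "b"],
    "answer": ["Male", "Cat", "Female", "Dog"],
})

# Long -> wide: one row per submission, one column per question
wide = (
    df.pivot(index="submission_id", columns="question_id", values="answer")
      .rename(columns={"a": "gender", "b": "favorite_animal"})
      .rename_axis(None, axis=1)   # drop the leftover 'question_id' columns name
      .reset_index()
)

# Attach the per-submission columns back onto every original row
out = df.merge(wide, on="submission_id", how="left")
print(out)
```

Spelling out `on="submission_id"` avoids surprises if the two frames ever share another column name, since a bare `merge` joins on all common columns.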
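Answer from a uj5u.com user:
I was able to learn from what BENY provided in their answer and adapt it to my use case. Here I have 5 question_ids that need to become new columns, tracked per submission_id.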
# Subset out questions to become columns
submission_ids = df.submission_id.unique().tolist()
question_ids = [1, 2, 3, 4, 5]
questions_df = df[df['question_id'].isin(question_ids)]
questions_df = questions_df[['submission_id', 'question_id', 'answer']]
questions_df.reset_index(drop=True, inplace=True)
# Create pivot table of questions
questions_df_pivot_table = questions_df.pivot_table(index='submission_id', columns='question_id', values='answer', aggfunc=lambda x: ''.join(x)).reset_index().rename(columns={1: 'col_1', 2: 'col_2', 3: 'col_3', 4: 'col_4', 5: 'col_5'})
questions_df_pivot_table.columns.name = None
# Merge back into main df
df = df.merge(questions_df_pivot_table)
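The `aggfunc` in the snippet above matters when a submission has more than one row for the same question: `pivot` refuses to reshape duplicate entries, while `pivot_table` combines them with the function you supply. A small sketch with made-up data follows. The duplicate row, the name `dup`, and the ', ' separator are assumptions for illustration (the snippet above joins with an empty string).

```python
import pandas as pd

# Submission 1 answered question 2 twice (hypothetical data)
dup = pd.DataFrame({
    "submission_id": [1, 1, 1, 2],
    "question_id": [1, 2, 2, 1],
    "answer": ["Male", "Cat", "Dog", "Female"],
})

# pivot() cannot place two values in one cell and raises ValueError
try:
    dup.pivot(index="submission_id", columns="question_id", values="answer")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead
wide = dup.pivot_table(
    index="submission_id",
    columns="question_id",
    values="answer",
    aggfunc=lambda x: ", ".join(x),
)
print(pivot_failed)
print(wide)
```

If every submission answers each question at most once, plain `pivot` is enough and will flag unexpected duplicates instead of silently concatenating them.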
