I have this example DataFrame:
import pandas as pd

data = pd.DataFrame({'id': [1, 2, 3],
                     'question': ['first country visited?', 'first city visited?', 'two cities we love?'],
                     'answer1': ['UK', 'Paris', 'CA'],
                     'answer2': ['US', 'New York', 'Paris'],
                     'answer3': ['CA', 'London', 'London'],
                     'answer4': ['JP', 'Toronto', 'Los Angeles'],
                     'correct': [['UK'], ['London'], ['London', 'Paris']]
                     })
which gives:
   id                question  answer1   answer2  answer3      answer4          correct
0   1  first country visited?       UK        US       CA           JP             [UK]
1   2     first city visited?    Paris  New York   London      Toronto         [London]
2   3     two cities we love?       CA     Paris   London  Los Angeles  [London, Paris]
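I want to add a new column, data['correct_column'], that holds the name(s) of the answer column(s) whose value appears in data['correct'].
This is what I have done so far: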
data['correct_column'] = data.loc[:,'answer1':'answer4'].isin(data['correct']).idxmax(1)
Every row of data['correct_column'] ends up with the same value, answer1, and I don't know why.
Desired output:
   id                question  answer1   answer2  answer3      answer4          correct   correct_column
0   1  first country visited?       UK        US       CA           JP             [UK]          answer1
1   2     first city visited?    Paris  New York   London      Toronto         [London]          answer3
2   3     two cities we love?       CA     Paris   London  Los Angeles  [London, Paris]  answer3,answer2
Reply from a uj5u.com user:
I see a few ways to do this:
Using apply:
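# the := (walrus) operator below needs Python 3.8+
cols = data.filter(like='answer').columns
# for each row: flag the answers found in that row's 'correct' list, then join the flagged column names
data['correct_column'] = data[cols].apply(lambda s: ','.join((m:=s.isin(data.loc[s.name, 'correct']))[m].index), axis=1)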
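The one-liner above is dense, so here is how I would unpack it into a plain function. This is only a sketch of the same idea, not a different method: matching_columns is a hypothetical helper name I made up, and the frame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question, repeated so the snippet is self-contained.
data = pd.DataFrame({
    'id': [1, 2, 3],
    'question': ['first country visited?', 'first city visited?', 'two cities we love?'],
    'answer1': ['UK', 'Paris', 'CA'],
    'answer2': ['US', 'New York', 'Paris'],
    'answer3': ['CA', 'London', 'London'],
    'answer4': ['JP', 'Toronto', 'Los Angeles'],
    'correct': [['UK'], ['London'], ['London', 'Paris']],
})

# Labels of the answer columns (answer1 .. answer4).
cols = data.filter(like='answer').columns


def matching_columns(row):
    """Hypothetical helper: names of the answer columns whose value is in row['correct']."""
    correct = set(row['correct'])
    # Walk the answer columns in their original order and keep the ones that match.
    return ','.join(col for col in cols if row[col] in correct)


data['correct_column'] = data.apply(matching_columns, axis=1)
print(data['correct_column'].tolist())
```

Note that the names come out in column order, so the last row gives answer2,answer3 rather than the answer3,answer2 shown in the question's desired output.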
Using a more involved approach: explode the lists, test for equality, then combine the results again per group:
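cols = data.filter(like='answer').columns
# one row per element of each 'correct' list; the original index label is repeated
df2 = data.explode('correct')
# compare every answer with that row's single correct value, then OR the rows back together per original index
mask = (df2[cols].eq(df2['correct'].values, axis=0)
        .groupby(level=0).any()
        )
# turn each True into its column name, blank out the rest, and join the names per row
data.join(mask.mul(cols).where(mask).apply(lambda x: x.str.cat(sep=','), axis=1).rename('correct_column'))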
Output:
   id                question  answer1   answer2  answer3      answer4          correct   correct_column
0   1  first country visited?       UK        US       CA           JP             [UK]          answer1
1   2     first city visited?    Paris  New York   London      Toronto         [London]          answer3
2   3     two cities we love?       CA     Paris   London  Los Angeles  [London, Paris]  answer2,answer3
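As for why the attempt in the question returns answer1 on every row: as far as I understand it, DataFrame.isin given a Series aligns on the index and compares each cell with the Series value at the same row label. Here that value is a whole list, and a string never equals a list, so the mask is False everywhere and idxmax falls back to the first column. The sketch below does not call isin itself; it rebuilds that cell-versus-list comparison by hand (the names answers, all_false and membership are mine) and contrasts it with a membership test.

```python
import pandas as pd

# Only the answer columns from the question, plus the 'correct' lists.
answers = pd.DataFrame({
    'answer1': ['UK', 'Paris', 'CA'],
    'answer2': ['US', 'New York', 'Paris'],
    'answer3': ['CA', 'London', 'London'],
    'answer4': ['JP', 'Toronto', 'Los Angeles'],
})
correct = [['UK'], ['London'], ['London', 'Paris']]
rows = answers.values.tolist()

# A string is never equal to a list, even a one-element list holding that string.
cell_equals_list = ('UK' == ['UK'])

# Equality of each cell against the whole list: every entry is False.
all_false = pd.DataFrame(
    [[value == lst for value in row] for row, lst in zip(rows, correct)],
    columns=answers.columns,
)
# With no True in a row, idxmax returns the first column label.
first_label = all_false.idxmax(axis=1)

# Membership test instead of equality: True where the cell is in the list.
membership = pd.DataFrame(
    [[value in lst for value in row] for row, lst in zip(rows, correct)],
    columns=answers.columns,
)
# idxmax still yields a single label per row (the first True), never several.
first_true = membership.idxmax(axis=1)

print(cell_equals_list)
print(first_label.tolist())
print(first_true.tolist())
```

So even with a correct mask, idxmax can only ever name one column per row, which is why both approaches above join the matching column names instead.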
