import pandas as pd

data = pd.DataFrame({'id': [1, 2, 3],
                     'question': ['first country visited?', 'first city visited?', 'two cities we love?'],
                     'answer1': ['UK', 'Paris', 'CA'],
                     'answer2': ['US', 0.4, 'Paris'],
                     'answer3': ['CA', 'London', 'London'],
                     'correct': [['UK'], [0.4], ['London, Paris, 0.4']]
                     })
Data:
   id                question  answer1  answer2  answer3               correct
0   1  first country visited?       UK       US       CA                  [UK]
1   2     first city visited?    Paris      0.4   London                 [0.4]
2   3     two cities we love?       CA    Paris   London  [London, Paris, 0.4]
I'm creating a new column that checks whether the value(s) in the correct column appear in the answer1, answer2 or answer3 columns.
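cols = data.filter(like='answer').columns  # answer1, answer2, answer3
# For each row, flag the answer columns whose value is in that row's correct list, then join their names
data['correct_column'] = data[cols].apply(lambda s: ','.join((m:=s.isin(data.loc[s.name, 'correct']))[m].index), axis=1)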
Output:
   id                question  answer1  answer2  answer3               correct  correct_column
0   1  first country visited?       UK       US       CA                  [UK]         answer1
1   2     first city visited?    Paris      0.4   London                 [0.4]         answer2
2   3     two cities we love?       CA    Paris   London  [London, Paris, 0.4]
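The third row comes back empty. I have been trying this on my original data for hours without success! Is there a better way to do this, given the mixed data types in my original df (floats, ints and strings)?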
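The empty third value most likely comes from the data rather than the method: as written, the third correct cell is ['London, Paris, 0.4'], a one-element list holding a single string, not three separate values. Both isin and the in operator test for exact equality with whole list elements, so 'Paris' never matches. A small check, assuming the third row's values exactly as shown above:

```python
import pandas as pd

# Third row's answers, and its correct cell in two possible shapes
answers = pd.Series(['CA', 'Paris', 'London'], index=['answer1', 'answer2', 'answer3'])
correct_as_written = ['London, Paris, 0.4']   # one string, as in the question
correct_as_items = ['London', 'Paris', 0.4]   # three separate items

# isin compares each answer against whole list elements, not substrings
print(answers.isin(correct_as_written).tolist())  # [False, False, False]
print(answers.isin(correct_as_items).tolist())    # [False, True, True]
```

If the original data really stores the correct answers as one comma-separated string, it has to be split before comparing.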
Answer:
Here is a longer version:
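cols = data.filter(like='answer').columns

def app(s):
    # s is one row of the answer columns; s.name is that row's index label
    correct = data.loc[s.name, 'correct']
    # One boolean per answer column: True if its value is in this row's correct list
    m = [s[col] in correct for col in cols]
    # Keep only the matching column names and join them
    return ', '.join(cols[m])

data['correct_column'] = data[cols].apply(app, axis=1)
data['correct_column']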
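In app, m is a plain Python list of booleans, one per answer column. Indexing the column Index with it keeps only the columns whose value was found, and ', '.join turns those names into a single string. In isolation, with a hypothetical mask:

```python
import pandas as pd

cols = pd.Index(['answer1', 'answer2', 'answer3'])
m = [False, True, True]        # hypothetical: answer2 and answer3 matched

selected = cols[m]             # boolean-mask indexing on a pandas Index
print(', '.join(selected))     # answer2, answer3
```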
And a shorter version that does the same thing:
data['correct_column'] = data[cols].apply(lambda s: ', '.join(cols[[s[col] in data.loc[s.name, 'correct'] for col in cols]]), axis=1)
data['correct_column']
This produces:
0 answer1
1 answer2
2 answer2, answer3
Name: correct_column, dtype: object
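The output above assumes the third correct cell holds three separate items. With the cell exactly as written (['London, Paris, 0.4'], one string), the in test finds nothing and the third value stays empty. On the mixed floats, ints and strings: in compares with ==, so the float 0.4 does not match the string '0.4'. A minimal sketch, assuming it is acceptable to compare everything as stripped text and that no single correct answer itself contains a comma; normalize_correct and matching_columns are hypothetical helpers, not pandas functions:

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 3],
    'question': ['first country visited?', 'first city visited?', 'two cities we love?'],
    'answer1': ['UK', 'Paris', 'CA'],
    'answer2': ['US', 0.4, 'Paris'],
    'answer3': ['CA', 'London', 'London'],
    'correct': [['UK'], [0.4], ['London, Paris, 0.4']],
})
cols = data.filter(like='answer').columns

def normalize_correct(values):
    """Hypothetical helper: turn a correct cell into a set of stripped strings,
    splitting any comma-separated entry such as 'London, Paris, 0.4'."""
    out = set()
    for v in values:
        for part in str(v).split(','):
            out.add(part.strip())
    return out

def matching_columns(row):
    correct = normalize_correct(row['correct'])
    # Compare as text so that 0.4 and '0.4' count as equal
    return ', '.join(col for col in cols if str(row[col]).strip() in correct)

data['correct_column'] = data.apply(matching_columns, axis=1)
print(data['correct_column'].tolist())  # ['answer1', 'answer2', 'answer2, answer3']
```

The trade-off of comparing as text is that 1 and 1.0 become '1' and '1.0' and no longer match; if ints and floats must compare numerically, convert them to a common numeric type before turning them into strings.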
