Column names do not match the expected row values

import pandas as pd

data = pd.DataFrame({'id': [1, 2, 3],
                     'question': ['first country visited?', 'first city visited?', 'two cities we love?'],
                     'answer1': ['UK', 'Paris', 'CA'],
                     'answer2': ['US', 0.4, 'Paris'],
                     'answer3': ['CA', 'London', 'London'],
                     'correct': [['UK'], [0.4], ['London, Paris, 0.4']]
                     })

Data:

   id  question                answer1  answer2  answer3  correct
0  1   first country visited?  UK       US       CA       [UK]
1  2   first city visited?     Paris    0.4      London   [0.4]
2  3   two cities we love?     CA       Paris    London   [London, Paris, 0.4]

I am creating a new column that checks whether the values in the correct column are found in the answer1, answer2, or answer3 columns.

cols = data.filter(like='answer').columns
data['correct_column'] = data[cols].apply(lambda s: ','.join((m:=s.isin(data.loc[s.name, 'correct']))[m].index), axis=1)

Output:

   id  question                answer1  answer2  answer3  correct               correct_column
0  1   first country visited?  UK       US       CA       [UK]                  answer1
1  2   first city visited?     Paris    0.4      London   [0.4]                 answer2
2  3   two cities we love?     CA       Paris    London   [London, Paris, 0.4]

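I get an empty value in the third row. I have been trying this on my original data for hours without success! Is there a better way to achieve this, given that my original df holds mixed data types such as float, int, and str?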

A uj5u.com user replied:

Here is a longer version:

cols = data.filter(like='answer').columns

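def app(s):
    # s is one row of the answer columns; s.name is its index label
    correct = data.loc[s.name, 'correct']
    # Boolean mask: True where the cell is one of this row's correct values
    m = [s[col] in correct for col in cols]
    return ', '.join(cols[m])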

data['correct_column'] = data[cols].apply(app, axis=1)
data['correct_column']

And a shorter version that does the same thing:

data['correct_column'] = data[cols].apply(lambda s: ', '.join(cols[[s[col] in data.loc[s.name, 'correct'] for col in cols]]), axis=1)
data['correct_column']

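With the third correct entry stored as separate items, ['London', 'Paris', 0.4], this produces (as the single string 'London, Paris, 0.4' it equals no answer cell, and the third row stays empty):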

0             answer1
1             answer2
2    answer2, answer3
Name: correct_column, dtype: object
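
The empty third row in the question can be traced to the constructor, where the third correct entry is the single string 'London, Paris, 0.4' rather than three separate items, so no answer cell ever equals it. Below is a minimal sketch of one way to cope with that. It assumes comma-separated strings should be split into items and that comparing values by their text form is acceptable (so 0.4 matches '0.4', but 1 would not match 1.0). The helper names normalize and matching_columns are made up for illustration and are not part of pandas.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 3],
    'question': ['first country visited?', 'first city visited?', 'two cities we love?'],
    'answer1': ['UK', 'Paris', 'CA'],
    'answer2': ['US', 0.4, 'Paris'],
    'answer3': ['CA', 'London', 'London'],
    # Third entry is ONE string, exactly as in the question's constructor
    'correct': [['UK'], [0.4], ['London, Paris, 0.4']],
})

cols = data.filter(like='answer').columns


def normalize(items):
    """Hypothetical helper: split comma-separated strings and keep each
    item as stripped text, so str, int and float values compare alike."""
    wanted = set()
    for item in items:
        for part in str(item).split(','):
            wanted.add(part.strip())
    return wanted


def matching_columns(row):
    """Hypothetical helper: names of the answer columns whose text form
    appears among the normalized correct values of the same row."""
    wanted = normalize(data.loc[row.name, 'correct'])
    return ', '.join(col for col in cols if str(row[col]) in wanted)


data['correct_column'] = data[cols].apply(matching_columns, axis=1)
print(data['correct_column'].tolist())
```

Splitting on commas is only safe if no real answer contains a comma; if the correct values can be stored as proper lists in the first place, the normalization step is unnecessary and the plain membership test from the answer above is enough.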

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/400510.html

Tags: python python-3.x pandas dataframe numpy
