Replace DataFrame column values with a string if the value contains a case-insensitive substring

I have a DataFrame like this:

id      familyHistoryDiabetes
0       YES - Father Diabetic
1       NO FAMILY HISTORY OF DIABETES
2       Yes-Mother & Father have Type 2 Diabetes
3       NO FAMILY HISTORY OF DIABETES

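I want to replace each column value with a plain "Yes" if the string contains "yes", and with "No" if it contains "no".

To do this, I ran the following code: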

df['familyHistoryDiabetes'] = df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if 'Yes' in x else 'No')

After running it, I realized that it misses the rows where "YES" is written in all caps:

id      familyHistoryDiabetes
0       No
1       No
2       Yes
3       No

So I want to run similar code, but ignore the case of "Yes" when searching for it.

To do this, I tried a solution using casefold(), similar to the one mentioned here:

df['familyHistoryDiabetes'] = df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if 'YES'.casefold() in map(str.casefold, x) else 'No')

But this does not work, because it turns my DataFrame into:

id      familyHistoryDiabetes
0       No
1       No
2       No
3       No

I imagine this is a simple fix, but I'm out of ideas!

Thanks.

A uj5u.com user replied:

Try np.where with str.contains and case=False:

import numpy as np

df['new'] = np.where(df['familyHistoryDiabetes'].str.contains('Yes', case=False),
                     'Yes',
                     'No')
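
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea on the sample data from the question. The `na=False` argument is an addition of mine, not part of the answer above: it makes missing values count as "no match" so that `np.where` receives a clean boolean mask.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "familyHistoryDiabetes": [
        "YES - Father Diabetic",
        "NO FAMILY HISTORY OF DIABETES",
        "Yes-Mother & Father have Type 2 Diabetes",
        "NO FAMILY HISTORY OF DIABETES",
    ],
})

# case=False makes the substring search case-insensitive.
# na=False (an assumption, not in the original answer) treats missing values as "no match".
mask = df["familyHistoryDiabetes"].str.contains("yes", case=False, na=False)

# Pick 'Yes' where the mask is True and 'No' everywhere else.
df["new"] = np.where(mask, "Yes", "No")
```

Writing to a new column keeps the original text around, which makes it easy to spot-check the result before overwriting anything.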

A uj5u.com user replied:

With str.extract:

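# Lowercase first, then pull out the first "yes" or "no"; expand=False returns a Series
df["familyHistoryDiabetes"] = df["familyHistoryDiabetes"].str.lower().str.extract("(yes|no)", expand=False)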

>>> df
   id familyHistoryDiabetes
0   0                   yes
1   1                    no
2   2                   yes
3   3                    no
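
If you want "Yes"/"No" in the output rather than lowercase, one possible variation is sketched below. The word boundaries (`\b`) and the `str.capitalize()` step are my own assumptions, not part of the answer above: the boundaries are there so that "no" inside a longer word such as "Unknown" would not be picked up.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "familyHistoryDiabetes": [
        "YES - Father Diabetic",
        "NO FAMILY HISTORY OF DIABETES",
        "Yes-Mother & Father have Type 2 Diabetes",
        "NO FAMILY HISTORY OF DIABETES",
    ],
})

# Extract the first standalone "yes" or "no", ignoring case.
# expand=False returns a Series because the pattern has a single group.
extracted = df["familyHistoryDiabetes"].str.extract(
    r"\b(yes|no)\b", flags=re.IGNORECASE, expand=False
)

# Normalize whatever casing was matched ("YES", "Yes", "NO", ...) to "Yes"/"No".
df["normalized"] = extracted.str.capitalize()
```

Rows that contain neither word come out as NaN, so they can be handled explicitly instead of silently becoming "No".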

A uj5u.com user replied:

You can use str.extract together with the IGNORECASE flag:

import re

# re.IGNORECASE has the integer value 2
df['new'] = df.familyHistoryDiabetes.str.extract('(Yes)', flags=re.IGNORECASE).fillna('No')

Output (note that the extracted text keeps its original casing, so row 0 shows YES):

   id                     familyHistoryDiabetes  new
0   0                     YES - Father Diabetic  YES
1   1             NO FAMILY HISTORY OF DIABETES   No
2   2  Yes-Mother & Father have Type 2 Diabetes  Yes
3   3             NO FAMILY HISTORY OF DIABETES   No
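
As a side note on why the casefold() attempt in the question returned "No" for every row: iterating over a string yields single characters, so `map(str.casefold, x)` produces 'y', 'e', 's', ... and the three-character string 'yes' can never equal any one of them. A minimal sketch of a fix that keeps the `apply` approach is to casefold the whole string and then test for the substring. The helper name `yes_or_no` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "familyHistoryDiabetes": [
        "YES - Father Diabetic",
        "NO FAMILY HISTORY OF DIABETES",
        "Yes-Mother & Father have Type 2 Diabetes",
        "NO FAMILY HISTORY OF DIABETES",
    ],
})

# Why the original attempt fails: mapping over a string gives single characters.
chars = list(map(str.casefold, "YES - Father Diabetic"))
found_in_chars = "yes" in chars  # False: no single character equals "yes"


def yes_or_no(text: str) -> str:
    """Hypothetical helper: casefold the whole string, then do a substring test."""
    return "Yes" if "yes" in text.casefold() else "No"


df["familyHistoryDiabetes"] = df["familyHistoryDiabetes"].apply(yes_or_no)
```

This assumes every value is a string; a missing value (NaN) would raise an error inside `casefold()`, in which case the vectorized `str.contains` approach is probably the safer choice.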
