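Removing URLs from a DataFrame column that contains target="_blank" anchor tags

I want to remove URLs from a column in a DataFrame. The column I'm interested in is called comment, and a sample entry looks like this: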

|comment|
|:-----:|
| """Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing major escalation in Yemen's devastating war<br/>The war has already killed at least 10,000, displaced 3 million, and. left millions more at risk of famine."" <br/>" |

The entry above shows the problem I'm trying to solve. I want to completely remove:

<a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a>

I tried:

df['comment'] = df['comment'].replace(r'https\S+', ' ', regex=True).replace(r'www\S+', ' ', regex=True).replace(r'http\S+', ' ', regex=True)

However, this leaves me with fragments like:

href title targetblank com

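Reply from a uj5u.com user:

Try:

# regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default
df['comment'] = df['comment'].str.replace(r'<a\s[^>]*.*?</a>', '', regex=True)

Output: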

>>> df.loc[0, 'comment']

'Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <br/>""Trump is weighing  major escalation in Yemen\'s devastating war<br/>The war has already killed at   least 10,000, displaced 3 million, and.  left millions more at risk of famine."" <br/>'
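
The pattern above can be easier to follow when it is compiled once and annotated. The following is a minimal sketch rather than the answerer's own code: the name `ANCHOR_RE`, the `example.com` URLs, and the sample rows are all made up for illustration, and the extra flags (`DOTALL`, `IGNORECASE`) are assumptions about what real comments might need.

```python
import re
import pandas as pd

# Hypothetical pattern name; matches an opening <a ...> tag, its text, and the closing </a>.
ANCHOR_RE = re.compile(
    r"""
    <a\s[^>]*>   # opening tag together with its attributes
    .*?          # link text, matched lazily so each link is removed on its own
    </a>         # closing tag
    """,
    flags=re.VERBOSE | re.DOTALL | re.IGNORECASE,
)

# Made-up sample data standing in for the real comments.
df = pd.DataFrame({
    "comment": [
        'Read this <a href="https://example.com/a" target="_blank">https://example.com/a</a><br/>Thanks',
        'Two links: <a href="https://example.com/b">b</a> and <a href="https://example.com/c">c</a>.',
    ]
})

# A compiled pattern carries its own flags, so only regex=True is passed here.
cleaned = df["comment"].str.replace(ANCHOR_RE, "", regex=True)
print(cleaned.tolist())
```

Because the link text is matched lazily, two links in the same comment are removed separately and the text between them is kept.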

Reply from a uj5u.com user:

You can perform the replacement with a regular expression, using re.sub.

For example:

import re

s = """Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing  major escalation in Yemen's devastating war<br/>The war has already killed at   least 10,000, displaced 3 million, and.  left millions more at risk of famine."" <br/>"""

# A raw string keeps \s from being treated as a string escape sequence
print(re.sub(r'<a\s[^>]*.*?</a>', '', s))

In your case, you can use .apply to achieve your goal:

import re
import pandas as pd


# Triple quotes on both ends keep the doubled quotes inside the sample text intact
df = pd.DataFrame({'comment': ["""Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing  major escalation in Yemen's devastating war<br/>The war has already killed at   least 10,000, displaced 3 million, and.  left millions more at risk of famine."" <br/>"""]})

# Apply the substitution to every entry in the column
df['comment'] = df['comment'].apply(lambda x: re.sub(r'<a\s[^>]*.*?</a>', '', x))

print(df)
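
One caveat with the .apply approach: re.sub expects a string, so a missing value (None or NaN) in the column raises a TypeError. The sketch below shows one possible guard. The helper name `strip_anchors`, the `example.com` URL, and the sample rows are hypothetical and only serve to illustrate the idea.

```python
import re
import pandas as pd

# Compile once so the pattern is not re-parsed for every row.
ANCHOR_RE = re.compile(r'<a\s[^>]*>.*?</a>', flags=re.DOTALL)

def strip_anchors(value):
    """Remove <a ...>...</a> elements from strings; return anything else unchanged."""
    if not isinstance(value, str):
        return value  # leaves None / NaN as they are instead of raising TypeError
    return ANCHOR_RE.sub('', value)

# Made-up sample data: one comment with a link and one missing value.
df = pd.DataFrame({'comment': [
    'See <a href="https://example.com" target="_blank">https://example.com</a><br/>done',
    None,
]})

df['comment'] = df['comment'].apply(strip_anchors)
print(df['comment'].tolist())
```

The vectorized df['comment'].str.replace(..., regex=True) form passes missing values through on its own, so the guard is mainly relevant when .apply is preferred.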
