I want to remove URLs from a column in my DataFrame. The column I'm interested in is called `comment`, and an example entry in it is:
|comment|
|:---:|
|`"""Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing major escalation in Yemen's devastating war<br/>The war has already killed at least 10,000, displaced 3 million, and. left millions more at risk of famine."" <br/>"`|
The entry above shows the problem I'm trying to solve. I want to remove this part completely:
`<a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a>`
I tried:
```python
df['comment'] = df['comment'].replace(r'https\S+', ' ', regex=True).replace(r'www\S+', ' ', regex=True).replace(r'http\S+', ' ', regex=True)
```
However, this leaves me with leftovers like:
`href title targetblank com`
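Why this happens can be seen on a shortened, hypothetical sample with the same structure (the `example.com` URLs stand in for the real ones). Each pattern deletes only a run of non-whitespace characters starting at `http`, so the `<a href="" title="" target="_blank">` scaffolding survives, and `\S+` also swallows whatever is glued to the link text, here `</a><br/>""Trump`:

```python
import re

# Shortened, hypothetical sample with the same structure as the real comment
# (including the CSV-style doubled quotes).
sample = ('Trump"" by Joe Wolverton, II, J.D. '
          '<a href=""https://example.com/item/1"" '
          'title=""https://example.com/item/1"" '
          'target=""_blank"">https://example.com/it...</a><br/>""Trump is weighing')

# The patterns from the attempt: each removes a URL up to the next whitespace,
# but leaves the surrounding tag and attribute names untouched.
naive = sample
for pattern in (r'https\S+', r'www\S+', r'http\S+'):
    naive = re.sub(pattern, ' ', naive)

print(naive)
```

The result still contains `<a href=""  title=""  target=""_blank"">`, which is where the leftover attribute names come from, and the word "Trump" after the link is lost.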
Answer:
Try:
```python
# Remove each <a ...>...</a> element, including its link text
df['comment'] = df['comment'].str.replace(r'<a\s[^>]*.*?<\/a>', '', regex=True)
```
Output:
```python
>>> df.loc[0, 'comment']
'Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <br/>""Trump is weighing major escalation in Yemen\'s devastating war<br/>The war has already killed at least 10,000, displaced 3 million, and. left millions more at risk of famine."" <br/>'
```
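Two details matter when using this pattern with pandas. Since pandas 2.0, `Series.str.replace` defaults to `regex=False`, so without `regex=True` the pattern is searched for as a literal string and nothing is removed. Also, `.` does not match newlines by default, so an `<a>` element whose link text spans several lines would survive. The sketch below is a slight variant of the answer's pattern (it requires the opening tag to end with `>`), compiled once with `re.DOTALL` and `re.IGNORECASE`; the sample string is hypothetical:

```python
import re

# A slight variant of the answer's pattern: the opening tag must end with ">".
# re.DOTALL lets ".*?" cross line breaks inside the link text, and
# re.IGNORECASE also matches upper-case tags such as <A HREF=...>.
ANCHOR_RE = re.compile(r'<a\s[^>]*>.*?</a>', flags=re.IGNORECASE | re.DOTALL)

# Hypothetical comment whose link text contains a newline.
comment = 'Before <A HREF="https://example.com">https://exa\nmple.com</A> after'
print(ANCHOR_RE.sub('', comment))
```

The compiled pattern can be passed to pandas directly as `df['comment'].str.replace(ANCHOR_RE, '', regex=True)`. pandas rejects a separate `flags=` argument when `pat` is already compiled, so the flags have to live in the pattern itself.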
Answer:
You can try doing the replacement with a regular expression using `re.sub`.
For example:
```python
import re

s = """Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing major escalation in Yemen's devastating war<br/>The war has already killed at least 10,000, displaced 3 million, and. left millions more at risk of famine."" <br/>"""
# Remove each <a ...>...</a> element together with its link text
print(re.sub(r'<a\s[^>]*.*?<\/a>', '', s))
```
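Both answers only remove URLs that sit inside an `<a>` element. If some comments also contain bare URLs in the plain text (an assumption about the data, not something either answer handles), one option is to remove the anchors first and then the remaining bare URLs. The helper name `strip_links` and both patterns below are hypothetical:

```python
import re

# Hypothetical patterns (not from either answer): whole <a>...</a> elements,
# then bare URLs starting with http://, https:// or www.
ANCHOR_RE = re.compile(r'<a\s[^>]*>.*?</a>', flags=re.IGNORECASE | re.DOTALL)
BARE_URL_RE = re.compile(r'(?:https?://|www\.)\S+', flags=re.IGNORECASE)


def strip_links(text: str) -> str:
    """Remove anchor elements and bare URLs, then collapse runs of spaces."""
    text = ANCHOR_RE.sub(' ', text)
    text = BARE_URL_RE.sub(' ', text)
    return re.sub(r' {2,}', ' ', text).strip()


print(strip_links('See <a href="https://example.com/a">link</a> and www.example.org/b too'))
```

The order matters: running the bare-URL pattern first would cut through the `href` and `title` attributes and leave broken markup behind, which is what happened in the question's attempt. Note that `\S+` also removes punctuation attached to the end of a bare URL.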
In your case, you can use `.apply` to achieve this:
```python
import re

import pandas as pd

df = pd.DataFrame({'comment': ["""Drone Strikes Up 432 Percent Under. Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing major escalation in Yemen's devastating war<br/>The war has already killed at least 10,000, displaced 3 million, and. left millions more at risk of famine."" <br/>"""]})
# Run re.sub on every value in the column
df['comment'] = df['comment'].apply(lambda x: re.sub(r'<a\s[^>]*.*?<\/a>', '', x))
print(df)
```
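If the `comment` column comes from a CSV file, it may contain missing values, and `re.sub` raises a `TypeError` when the lambda above receives a `NaN` (a float). A minimal sketch of the vectorized alternative, assuming pandas is installed and using a hypothetical two-row frame: `Series.str.replace` leaves missing values as missing instead of failing.

```python
import re

import pandas as pd

# Same anchor pattern as above, compiled once; DOTALL lets ".*?" span newlines.
ANCHOR_RE = re.compile(r'<a\s[^>]*>.*?</a>', flags=re.IGNORECASE | re.DOTALL)

# Hypothetical frame: one comment with a link, one missing value.
df = pd.DataFrame({'comment': ['x <a href="https://example.com">example</a> y', None]})

# Vectorized replacement; missing values stay missing instead of raising.
df['comment'] = df['comment'].str.replace(ANCHOR_RE, '', regex=True)
print(df)
```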
