I'm trying to scrape the headlines of news articles from https://7news.com.au/news/coronavirus-sa.
After finding that all the headlines are inside <h2> tags, I wrote the following code:
import requests
from bs4 import BeautifulSoup
url = 'https://7news.com.au/news/coronavirus-sa'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
titles = soup.find('body').find_all('h2')
for i in titles:
    print(i.text.strip())
This code prints:
News
Discover
Connect
SA COVID cases surge into triple digit figures for first time
Massive headaches at South Australian testing clinics as COVID cases surge
Revellers forced into isolation after SA teen goes clubbing while infectious with COVID
COVID scare hits Ashes Test in Adelaide after two media members test positive
SA to ease restrictions despite record number of COVID cases
‘We’re going to have cases every day’: SA records biggest COVID spike in 18 MONTHS
Fears for Adelaide nursing homes after COVID infections creep detected
Families in pre-Christmas quarantine after COVID alert for Adelaide school
South Australia records a JUMP in new COVID-19 cases - including infections in children
‘LOCK IT IN’: Mark McGowan to reveal date of WA’s long-awaited reopening to Australia
BOOSTER BOOST-UP: Australia makes change to COVID-19 vaccinations amid Omicron concern
Frydenberg calls for Aussies to ‘keep calm and carry on’ in the face of COVID-19 Omicron strain
News Just In
Our Network
Our Partners
Connect with 7NEWS
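The output includes unwanted text such as "News", "Discover" and "News Just In".
This happens because these texts are also inside <h2> tags. So I added the following line to remove them from the results: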
soup.find('h2', id='css-1oh2gv-StyledHeading.e1fp214b7').decompose()
This raises an AttributeError:
AttributeError: 'NoneType' object has no attribute 'decompose'
I also tried the clear() method, but it didn't give me the result I wanted.
Is there another way to remove the unwanted text?
Answer (from a uj5u.com user):
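What's wrong?
Your selection is too broad: find_all('h2') returns every <h2> on the page, including the navigation and footer headings, and you don't need .decompose() to fix that. The AttributeError itself comes from soup.find() returning None: no <h2> has an id equal to 'css-1oh2gv-StyledHeading.e1fp214b7', which looks like a CSS class selector rather than an id value.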
How to fix it?
Select the headlines more specifically:
soup.select('h2.Card-Headline')
Example
import requests
from bs4 import BeautifulSoup
url = 'https://7news.com.au/news/coronavirus-sa'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
for h2 in soup.select('h2.Card-Headline'):
    print(h2.text)
Output
SA COVID cases surge into triple digit figures for first time
Massive headaches at South Australian testing clinics as COVID cases surge
Revellers forced into isolation after SA teen goes clubbing while infectious with COVID
COVID scare hits Ashes Test in Adelaide after two media members test positive
SA to ease restrictions despite record number of COVID cases
‘We’re going to have cases every day’: SA records biggest COVID spike in 18 MONTHS
Fears for Adelaide nursing homes after COVID infections creep detected
Families in pre-Christmas quarantine after COVID alert for Adelaide school
South Australia records a JUMP in new COVID-19 cases - including infections in children
‘LOCK IT IN’: Mark McGowan to reveal date of WA’s long-awaited reopening to Australia
BOOSTER BOOST-UP: Australia makes change to COVID-19 vaccinations amid Omicron concern
Frydenberg calls for Aussies to ‘keep calm and carry on’ in the face of COVID-19 Omicron strain
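To check the selector without requesting the live page (whose markup may have changed since this answer was written), here is a sketch against a hypothetical HTML fragment. Only the class names mirror the question; the rest of the markup is made up. It also shows find_all(..., class_=...) as an equivalent that avoids CSS selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: headline cards plus navigation/footer headings.
html = """
<body>
  <h2 class="css-1oh2gv-StyledHeading e1fp214b7">News</h2>
  <article><h2 class="Card-Headline">SA to ease restrictions despite record number of COVID cases</h2></article>
  <article><h2 class="Card-Headline">Fears for Adelaide nursing homes after COVID infections creep detected</h2></article>
  <h2 class="css-1oh2gv-StyledHeading e1fp214b7">News Just In</h2>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <h2> that carries the Card-Headline class.
headlines = [h2.get_text(strip=True) for h2 in soup.select("h2.Card-Headline")]

# Same result without CSS selectors: class_ matches a single class name.
same = [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="Card-Headline")]

for title in headlines:
    print(title)
```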
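To answer your original question as well:
if you want to use decompose(), you also need a more specific selection. As mentioned above, though, it isn't necessary: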
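for i in titles:
    # remove the <h2> tags whose class names contain "Heading"; .get() avoids a KeyError on <h2> tags without a class
    if 'Heading' in ' '.join(i.get('class', [])):
        i.decompose()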
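A self-contained sketch of the decompose() route on hypothetical markup. It assumes, as the answer does, that the unwanted headings are the only <h2> tags whose class names contain "Heading". After decomposing, it searches the soup again, because the removed tags are gone from the tree. For contrast, it also shows clear(), mentioned in the question: clear() only empties a tag, so an empty <h2> remains.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names modeled on the question.
html = """
<body>
  <h2 class="css-1oh2gv-StyledHeading e1fp214b7">Discover</h2>
  <h2 class="Card-Headline">SA COVID cases surge into triple digit figures for first time</h2>
  <h2 class="css-1oh2gv-StyledHeading e1fp214b7">Our Network</h2>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

for h2 in soup.find_all("h2"):
    # .get() avoids a KeyError on <h2> tags without a class attribute
    if "Heading" in " ".join(h2.get("class", [])):
        h2.decompose()  # removes the tag and its text from the tree

# Search again: decomposed tags are no longer part of the soup.
remaining = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
print(remaining)

# clear() only removes a tag's contents; the empty <h2> stays in the tree.
other = BeautifulSoup('<h2 class="x-StyledHeading">News</h2>', "html.parser")
other.h2.clear()
print(other)
```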
Please credit the source when reposting. Article link: https://www.uj5u.com/net/387384.html
