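I am trying to scrape a website with Python and get the values from a table. This works fine until I want to extract only the values (that is, without the HTML).
I tried to get the values from the cells with the following code: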
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import requests

req = Request('https://www.formula1.com/en/results.html/2022/drivers.html', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

drivers = soup.find('table', class_='resultsarchive-table').find_all('tr')
for driver in drivers:
    rank = driver.find('td', class_='dark')
    first = driver.find('span', class_='hide-for-tablet')
    last = driver.find('span', class_='hide-for-mobile')
    print(rank)
When I use .text or .get_text(), I get the error AttributeError: 'NoneType' object has no attribute ..., even though the code above does print the values.
What am I doing wrong?
Answer from a uj5u.com user:
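The issue here is that you are also selecting rows (<tr>) that do not contain any <td>, such as the header row. For those rows find() returns None, and accessing .text on None raises the AttributeError. You can simply slice those rows off: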
# Skip the first row (the header), which has no <td> cells
for driver in drivers[1:]:
    rank = driver.find('td', class_='dark').text
    first = driver.find('span', class_='hide-for-tablet').text
    last = driver.find('span', class_='hide-for-mobile').text
    print(rank)
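To see why the header row is the culprit, here is a minimal offline sketch. The HTML in SAMPLE_HTML is made up for illustration and only mimics the class names used above; it is an assumption, not the real markup of the page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that only mimics the class names used above.
SAMPLE_HTML = """
<table class="resultsarchive-table">
  <tr><th>Pos</th><th>Driver</th></tr>
  <tr>
    <td class="dark">1</td>
    <td><span class="hide-for-tablet">Max</span> <span class="hide-for-mobile">Verstappen</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("table", class_="resultsarchive-table").find_all("tr")

# The header row contains only <th> cells, so find('td', ...) returns None.
header_cell = rows[0].find("td", class_="dark")
# The data row does contain the <td class="dark"> cell.
data_cell = rows[1].find("td", class_="dark")

print(header_cell)     # None, so header_cell.text would raise AttributeError
print(data_cell.text)  # 1
```

Printing the element itself works for every row because print(None) is valid, which is why the original loop appeared to run fine until .text was added.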
Or select more specifically, for example with CSS selectors:
# Only match rows that actually contain at least one <td>
drivers = soup.select('table.resultsarchive-table tr:has(td)')
for driver in drivers:
    rank = driver.find('td', class_='dark').text
    first = driver.find('span', class_='hide-for-tablet').text
    last = driver.find('span', class_='hide-for-mobile').text
    print(rank)
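If the table layout might change, a more defensive variant is to check for None before reading the text. The sketch below is one possible way to do that, not the method from the answer: the function name parse_drivers and the sample HTML are hypothetical, and the sample only mimics the class names used above.

```python
from bs4 import BeautifulSoup


def parse_drivers(html):
    """Return a list of dicts with rank, first and last name per data row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("table.resultsarchive-table tr"):
        rank = row.find("td", class_="dark")
        first = row.find("span", class_="hide-for-tablet")
        last = row.find("span", class_="hide-for-mobile")
        # Skip rows (such as the header) where an expected element is missing.
        if rank is None or first is None or last is None:
            continue
        results.append({
            "rank": rank.get_text(strip=True),
            "first": first.get_text(strip=True),
            "last": last.get_text(strip=True),
        })
    return results


# Hypothetical sample markup, used here instead of a live request.
sample = """
<table class="resultsarchive-table">
  <tr><th>Pos</th><th>Driver</th></tr>
  <tr>
    <td class="dark">1</td>
    <td><span class="hide-for-tablet">Max</span> <span class="hide-for-mobile">Verstappen</span></td>
  </tr>
</table>
"""

parsed = parse_drivers(sample)
print(parsed)  # [{'rank': '1', 'first': 'Max', 'last': 'Verstappen'}]
```

With the real page, the html argument would be the webpage bytes returned by urlopen(req).read() in the question.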
