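I am trying to scrape a web page with the Selenium Chrome driver in headless mode, but it raises an error and is also very slow.
When I disable headless mode, it runs very fast!
My code: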
import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup, Tag
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
import ssl
import time
chrome_options = Options()
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--start-maximized")
chrome_options.headless = True
driver = webdriver.Chrome(executable_path='/Users/sarathc/Desktop/costco/chromedriver', options=chrome_options)
def listResponse(url):
    driver.get(url)
    time.sleep(0.2)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    return soup
soup = listResponse("https://www.costco.com.au/Smart-TVs-Audio-Cameras/c/cos_21")
cat = soup.find_all("div", {"class": ["category-node ng-star-inserted"]})
for sk in cat:
    print(sk.find("a").get("href"))
Error:
AttributeError: 'NoneType' object has no attribute 'get'
How can I run this code in headless mode without the error, and as fast as it runs without headless mode?
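Reply from a uj5u.com user:
In some cases, you need to set a User-Agent to get the full page source in headless mode. Headless Chrome typically identifies itself with a "HeadlessChrome" token in its default user agent, and some sites respond to it with different markup.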
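One way to check whether this applies is to look at the user agent that headless Chrome actually reports. The sketch below is an illustration and not part of the original answer: the helper names `strip_headless_marker` and `headless_user_agent` are made up, and it assumes Selenium 4 with a chromedriver that Selenium can locate on its own.

```python
def strip_headless_marker(user_agent: str) -> str:
    """Return the user agent with the 'HeadlessChrome' token renamed to 'Chrome'."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def headless_user_agent() -> str:
    """Start headless Chrome briefly and return the user agent it reports."""
    # Imported here so the string helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # Assumption: chromedriver is on PATH or resolved by Selenium itself.
    driver = webdriver.Chrome(options=options)
    try:
        return driver.execute_script("return navigator.userAgent")
    finally:
        driver.quit()
```

Calling `strip_headless_marker(headless_user_agent())` should give a string that can be passed as `chrome_options.add_argument("user-agent=" + cleaned)`, which keeps the real browser version instead of a hard-coded one.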
Code snippet:
# ChromeDriverManager comes from the third-party webdriver_manager package;
# the remaining imports are the same as in the question.
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.headless = True
# Override the default headless user agent
chrome_options.add_argument("user-agent=Chrome/80.0.3987.132")
chrome_options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)

def listResponse(url):
    driver.get(url)
    time.sleep(0.2)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    return soup

soup = listResponse("https://www.costco.com.au/Smart-TVs-Audio-Cameras/c/cos_21")
cat = soup.find_all("div", {"class": ["category-node ng-star-inserted"]})
for sk in cat:
    print(sk.find("a").get("href"))
Also, you do not need to add chrome_options.add_argument("--start-maximized") when you have already specified the window size.
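The fixed `time.sleep(0.2)` may also contribute to the AttributeError. The `ng-star-inserted` class suggests the page is rendered client-side by Angular (an inference, not something the post confirms), so the links may not exist yet when `page_source` is read. A hedged sketch of an explicit wait follows. It uses Selenium's own locators instead of BeautifulSoup, assumes Selenium 4, and the names `collect_hrefs` and `fetch_category_links`, the CSS selector, and the 10-second timeout are all illustrative choices.

```python
from typing import Iterable, List, Optional


def collect_hrefs(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Drop missing or empty href values so callers never touch None."""
    return [h for h in hrefs if h]


def fetch_category_links(driver, url: str, timeout: float = 10.0) -> List[str]:
    """Load url, wait for category links to appear, and return their hrefs."""
    # Imported lazily so collect_hrefs stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Poll until at least one matching link exists instead of sleeping a fixed time.
    links = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.category-node a"))
    )
    return collect_hrefs(link.get_attribute("href") for link in links)
```

With the `driver` from the snippet above, `fetch_category_links(driver, "https://www.costco.com.au/Smart-TVs-Audio-Cameras/c/cos_21")` would return a list of URLs, or raise Selenium's `TimeoutException` if no matching link appears within the timeout.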
