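I have started learning web scraping. As practice, I am trying to get a list of all the course names that appear for this query: "https://www.udemy.com/courses/search/?src=ukw&q=api python". The problem is that when I start the script, the website does not load, and eventually the window closes. I think Udemy may have some kind of protection against automation.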

Here is my code:
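from selenium import webdriver
from selenium.webdriver.common.by import By
import time

website = "https://www.udemy.com/courses/search/?src=ukw&q=api python"
chrome_options = webdriver.ChromeOptions()
# Suppress ChromeDriver's console logging
chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])
driver = webdriver.Chrome(options=chrome_options)
driver.get(website)
time.sleep(5)  # crude fixed wait for the page to render
# find_elements_by_tag_name was removed in Selenium 4; use find_elements with By
matches = driver.find_elements(By.TAG_NAME, "h3")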
Answer:
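The Udemy website likely fails to load completely because the Chrome browser launched by the Selenium-driven ChromeDriver is detected as a bot, and further navigation is blocked.
Solution
A simpler way to evade detection is to add the following argument: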
--disable-blink-features=AutomationControlled
So your code block would be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
# Pass both switches in one call: a second "excludeSwitches" call would overwrite the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
# Disable the Blink feature that marks the browser as automation-controlled
options.add_argument("--disable-blink-features=AutomationControlled")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://www.udemy.com/courses/search/?src=ukw&q=api python')
# Wait up to 20 seconds for the results heading instead of sleeping for a fixed time
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'results for')]")))
driver.save_screenshot("udemy.png")
[Saved screenshot: udemy.png]
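
To go from the loaded page to the list of course names the question asks for, a minimal sketch could look like the following. It assumes, as the question's code does, that course names are rendered in h3 elements; this has not been verified against Udemy's current markup. The names build_search_url, clean_titles, and collect_course_titles are hypothetical helpers introduced for illustration, not part of Selenium.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Build the search URL with the query URL-encoded (no raw space)."""
    params = urlencode({"src": "ukw", "q": query})
    return "https://www.udemy.com/courses/search/?" + params


def clean_titles(raw_texts):
    """Normalize whitespace, drop empty strings, remove duplicates (order kept)."""
    seen = set()
    titles = []
    for text in raw_texts:
        title = " ".join(text.split())
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


def collect_course_titles(driver, timeout=20):
    """Wait for at least one <h3>, then return the cleaned text of every <h3>.

    Assumption: course names sit in <h3> elements, as in the question's code.
    """
    # Imported here so the two helpers above also work without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.TAG_NAME, "h3"))
    )
    return clean_titles(el.text for el in driver.find_elements(By.TAG_NAME, "h3"))
```

With the driver configured as in the answer's code, calling driver.get(build_search_url("api python")) and then collect_course_titles(driver) should return the titles as a list of strings. If the page uses a different tag for course names, only the locator inside collect_course_titles needs to change.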

