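I am trying to build an application that scrapes some e-commerce sites. I am using Selenium for this and am trying to deploy the application on an EC2 instance running CentOS. Before deploying, I developed the code locally and it works there, but it gives me an error on the remote machine.
The code I am using: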
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
ser = Service(ChromeDriverManager().install())
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
selenium_driver = webdriver.Chrome(service=ser, options=chrome_options)
url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'
selenium_driver.get(url)
title = selenium_driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)
When I run this code on the remote machine, I get the following stack trace:
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2091, in __call__
return self.wsgi_app(environ, start_response)
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2076, in wsgi_app
response = self.handle_exception(e)
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/home/ec2-user/price_tracker/flask_api.py", line 22, in home
title, price, isSizeAvailable, shop = prices.checkInfoByShop(url, size)
File "/home/ec2-user/price_tracker/check_prices.py", line 132, in checkInfoByShop
secondaryPriceXPath=secondaryPriceXPath)
File "/home/ec2-user/price_tracker/check_prices.py", line 61, in checkSelenium
title = self.selenium_driver.find_element(By.XPATH, titleXPath)
File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 1246, in find_element
'value': value})['value']
File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
self.error_handler.check_response(response)
File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span"}
(Session info: headless chrome=96.0.4664.110)
Stacktrace:
#0 0x559979e8dee3 <unknown>
#1 0x55997995b608 <unknown>
#2 0x559979991aa1 <unknown>
#3 0x559979991c61 <unknown>
#4 0x5599799c4714 <unknown>
#5 0x5599799af29d <unknown>
#6 0x5599799c23bc <unknown>
#7 0x5599799af163 <unknown>
#8 0x559979984bfc <unknown>
#9 0x559979985c05 <unknown>
#10 0x559979ebfbaa <unknown>
#11 0x559979ed5651 <unknown>
#12 0x559979ec0b05 <unknown>
#13 0x559979ed6a68 <unknown>
#14 0x559979eb505f <unknown>
#15 0x559979ef1818 <unknown>
#16 0x559979ef1998 <unknown>
#17 0x559979f0ceed <unknown>
#18 0x7ff5dd53b40b <unknown>
For debugging purposes, I tried
body = selenium_driver.find_element(By.XPATH, '/html/body')
print(body.text)
which returns
"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at [email protected].\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"
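I understand that my request is probably being blocked for some reason, but is there a way around this?
I tried adding wait statements in the hope of landing on the redirected page, but nothing has worked so far.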
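A uj5u.com user replied:
The message suggests that the page content has changed, so your code is working as expected. I would have Selenium wait until an element is visible. If you would rather not do that, you can also wait for the page to redirect; how to do that is answered in another Stack Overflow question.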
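A minimal sketch of the "wait for the redirect" idea, assuming the interstitial can be recognised by the phrase "Checking your browser" from the question's output. The helper name wait_for_text_to_disappear, the timeout and the poll interval are hypothetical; the helper only relies on the driver's page_source attribute. It does not help if the challenge never clears for a headless browser.

```python
import time


def wait_for_text_to_disappear(driver, marker, timeout=30.0, poll=0.5):
    """Poll driver.page_source until `marker` is gone; return True on success.

    Returns False if the marker is still present once `timeout` seconds
    have elapsed. `marker` is whatever text identifies the interstitial
    page (an assumption taken from the output shown in the question).
    """
    deadline = time.monotonic() + timeout
    while True:
        if marker not in driver.page_source:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With the question's driver this might be called as wait_for_text_to_disappear(selenium_driver, "Checking your browser") right after selenium_driver.get(url) and before looking up the title. A False return means the page never left the interstitial within the timeout.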
A uj5u.com user replied:
Judging by the message
Checking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.
the site appears to have Cloudflare protection enabled.
See: https://thegeekpage.com/how-to-fix-checking-your-browser-before-accessing-message/
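Before trying to locate product elements, it can help to check whether the browser was served the challenge page at all. The following is only a sketch: the function name looks_like_challenge_page and the marker list are assumptions drawn from the body text printed in the question, not an official Cloudflare signature, and "Ray ID" on its own could match unrelated pages.

```python
# Phrases copied from the body text shown in the question (an assumption,
# not an exhaustive or official list of challenge-page markers).
CHALLENGE_MARKERS = (
    "Checking your browser before accessing",
    "Ray ID",
)


def looks_like_challenge_page(body_text):
    """Return True if the text contains any of the known marker phrases."""
    return any(marker in body_text for marker in CHALLENGE_MARKERS)
```

For example, the text from driver.find_element(By.XPATH, '/html/body').text could be passed to looks_like_challenge_page, and the scraper could log or retry instead of raising NoSuchElementException further down.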
I suggest trying selenium-stealth:
https://pypi.org/project/selenium-stealth/
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium_stealth import stealth
ser = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(service=ser, options=options)
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'
driver.get(url)
title = driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)
Some of these repositories may also help:
- https://github.com/ultrafunkamsterdam/undetected-chromedriver
- https://github.com/VeNoMouS/cloudscraper
- https://github.com/unixfox/pupflare
Or take a look at this topic:
https://github.com/topics/cloudflare-bypass
A uj5u.com user replied:
I suggest using WebDriverWait to wait for the page to load.
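# The second argument is the timeout in seconds; 10 is an assumed value.
wait = WebDriverWait(driver, 10)
# Single quotes around the XPath so the inner double quotes do not end the string.
elem = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')))
print(elem.text)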
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
