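I'm trying to scrape data from multiple pages, but my code only seems to work for the first page. When I use a loop to scrape several pages, for example the first 5, I get the following error: TimeoutException: Message: Stacktrace: Backtrace:
My code is as follows: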
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select

PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
_list=[]
for page_num in range(1,3):
    #print("----")
    url=f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
    driver.find_element_by_class_name("agree").click()
    title=driver.find_elements_by_class_name("snize-overhidden")
    for j in title:
        Pro=j.find_element_by_class_name("snize-title").text
        Price=j.find_element_by_class_name("snize-price-list").text
        Desc=j.find_element_by_class_name("snize-description").text
        prec_item={
            "Product":Pro,
            "Price":Price,
            "Description":Desc
        }
        _list.append(prec_item)
df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")
Please advise! Thanks in advance.

A uj5u.com user replied:
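This block of code

Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
driver.find_element_by_class_name("agree").click()

is only relevant the first time you land on the site.
Once you have selected the year and clicked the Agree button, you can open every results page without selecting the year again. On later pages the year dropdown never appears, so the 20-second wait for it expires and raises the TimeoutException.
So your code could look like this: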
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select

PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
_list=[]
for page_num in range(1,3):
    #print("----")
    url=f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    if page_num == 1:
        # The age gate is shown only on the first visit: pick a birth year and agree.
        Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
        driver.find_element_by_class_name("agree").click()
    else:
        # No age gate on later pages; just give the results time to load.
        time.sleep(2)
    title=driver.find_elements_by_class_name("snize-overhidden")
    for j in title:
        Pro=j.find_element_by_class_name("snize-title").text
        Price=j.find_element_by_class_name("snize-price-list").text
        Desc=j.find_element_by_class_name("snize-description").text
        prec_item={
            "Product":Pro,
            "Price":Price,
            "Description":Desc
        }
        _list.append(prec_item)
# Write everything collected from all pages once the loop has finished.
df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")
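I added a delay for every iteration after the first, so that the page has time to load before you scrape its data.
It would be better to use an explicit wait with an expected condition there instead of a fixed sleep.
I don't know which condition fits best there, so I'll leave that up to you.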