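I have been learning how to build web scrapers with Selenium. One thing I am struggling with is scraping pages that use pagination. I wrote a script that I thought would scrape every page: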
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import getpass
import datetime
import pandas as pd

# Start Chrome with English as the preferred language
custom_options = webdriver.ChromeOptions()
custom_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
driver = webdriver.Chrome(ChromeDriverManager().install(), options=custom_options)
driver.get("https://lr.caa.cz/letecky-rejstrik?lang=en")

data = []
while True:
    try:
        # Wait for the table body and the pagination button to be present
        table_body = WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.TAG_NAME, "tbody")))
        table_body_rows = table_body.find_elements_by_tag_name("tr")
        button = WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.XPATH, '/html/body/app-root/div/main/div/div/app-avreg-list/nav/div/app-pagination/div/a[3]/i')))
        # Collect the cell text of every row on the current page
        for i in table_body_rows:
            row_data = []
            table_data = i.find_elements_by_tag_name("td")
            for j in table_data:
                row_data.append(j.text.strip())
            data.append(row_data)
        # Go to the next page
        button.click()
    except:
        # Any error (including a timeout) ends the loop
        break

df = pd.DataFrame(data)
print(df)
driver.quit()
It scrapes the first page but does not seem to go any further. This is the result I get:
0 1 2 3
0 Glider MDM-1 FOX OK-1213
1 Glider MDM-1 FOX OK-7801
2 Glider A 15 OK-7906
3 Powered glider SZD-45A OK-6902
4 Powered glider SZD-45A OK-8903
5 Hot-air balloon AB OK-9004
6 Hot-air balloon AB OK-4012
7 Hot-air balloon AB OK-4014
8 Hot-air balloon AB OK-7006
9 Hot-air balloon AB OK-7004
10 None None None
I checked the XPath of the pagination button on the website, and it appears to be correct in the script.
Any ideas what might be wrong?
Answer from a uj5u.com user:
Instead of presence_of_element_located(), use element_to_be_clickable(), and identify the element with either of the following CSS selector or XPath expressions.
button = WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'app-pagination a:nth-of-type(3)')))
or
button = WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, "(//app-pagination//a)[3]")))
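To show how the clickable wait could fit into the whole loop, here is a minimal sketch. It has not been verified against the site. The helper names collect_pages and selenium_callbacks, the max_pages cap, and the stop condition (the table text does not change within the timeout after a click) are all assumptions made for illustration. Rows without any cells are skipped, on the guess that such a row is what produced the all-None line in the output above. The sketch uses find_elements(By.TAG_NAME, ...) rather than find_elements_by_tag_name so that it also works on newer Selenium releases.

```python
def collect_pages(read_rows, click_next, max_pages=500):
    """Collect table rows page by page.

    read_rows()  -> list of rows (each a list of cell strings) on the current page
    click_next() -> True if a new page was loaded, False when there is none
    """
    data = []
    for _ in range(max_pages):  # safety cap so a stuck page cannot loop forever
        # Keep only rows that actually contain cells.
        data.extend(row for row in read_rows() if row)
        if not click_next():
            break
    return data


def selenium_callbacks(driver, timeout=15):
    """Build read_rows/click_next for a live Selenium driver (hypothetical helper)."""
    # Imported here so collect_pages can be tried out without Selenium installed.
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(
        driver, timeout, ignored_exceptions=(StaleElementReferenceException,)
    )

    def read_rows():
        body = wait.until(EC.presence_of_element_located((By.TAG_NAME, "tbody")))
        return [
            [td.text.strip() for td in tr.find_elements(By.TAG_NAME, "td")]
            for tr in body.find_elements(By.TAG_NAME, "tr")
        ]

    def click_next():
        old_text = driver.find_element(By.TAG_NAME, "tbody").text
        try:
            button = wait.until(
                EC.element_to_be_clickable((By.XPATH, "(//app-pagination//a)[3]"))
            )
            button.click()
            # Assumption: a page change shows up as different table text.
            wait.until(lambda d: d.find_element(By.TAG_NAME, "tbody").text != old_text)
        except TimeoutException:
            return False  # no clickable button, or the table did not change
        return True

    return read_rows, click_next


# Usage with the driver from the question:
#   read_rows, click_next = selenium_callbacks(driver)
#   df = pd.DataFrame(collect_pages(read_rows, click_next))
```

Catching only TimeoutException, instead of a bare except, means any other error is raised and visible rather than silently ending the loop after the first page.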
