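Hi all,
I'm doing some web scraping and need to download a number of PDFs from the www1.hkexnews.hk website.
However, I've run into a problem getting my Selenium chromedriver to tick the box that appears every time I try to download a PDF on that site. The code runs, but the box remains unclicked.
Please see my source code below. Any suggestions would be much appreciated!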
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
driver.implicitly_wait(10)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = driver.find_element_by_xpath("//a[contains(text(),'Full Version')]")
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = driver.find_element_by_id('warning-statement-accept')
print("Now clicking...", checkbox.text)
checkbox.click
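Edit: Thanks, everyone! The download now works. One small follow-up question: how can I modify the download code so that each PDF is saved under its company name, which is available via all_names = driver.find_elements_by_xpath("//div[@class='applicant-name']")?
At the moment I'm using the automatic download options below, and I assume the download logic has to be adjusted (I would rather download the PDFs with the correct names straight away than use a dirty workaround that renames them with Python after they have been saved...)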
chrome_options.add_experimental_option('prefs', {
"download.default_directory": "/Users/XXX/Downloads", #Change default directory for downloads
"download.prompt_for_download": False, #To auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
Answer:
This should do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get(link)
elem = wait.until(EC.presence_of_element_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]")))
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
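For anyone wondering what the wait.until(...) calls above are doing: they keep polling a condition until it returns something truthy or the timeout runs out. Below is a rough pure-Python model of that idea. It is only a sketch to illustrate the polling, not Selenium's actual implementation (the real WebDriverWait also passes the driver to the condition and ignores certain exceptions), and the names wait_until and ready_on_third_call are made up for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The condition holds: hand its value back to the caller.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Not ready yet: sleep briefly before polling again.
        time.sleep(poll_interval)


# Hypothetical condition that only becomes true on the third poll,
# standing in for "the element has appeared on the page".
attempts = []


def ready_on_third_call():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None


outcome = wait_until(ready_on_third_call, timeout=2.0, poll_interval=0.01)
```

The point is that an explicit wait returns as soon as the condition is met, rather than sleeping for a fixed amount of time.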
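Here is a modified version of the script that closes each newly opened tab. I haven't included the download logic in the script. I think you can do that yourself.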
driver.get(link)
current = driver.current_window_handle
for elem in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]"))):
    handles_before = driver.window_handles  # windows open before the click
    elem.click()
    wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
    wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
    # new_window_is_opened has to be called with the earlier handles to build the condition
    wait.until(EC.new_window_is_opened(handles_before))
    driver.switch_to.window([window for window in driver.window_handles if window != current][0])
    print(driver.current_url)
    driver.close()
    driver.switch_to.window(current)
driver.quit()
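As for the download logic left out above and the follow-up about naming files by company: one possible approach is to fetch the URL printed by driver.current_url yourself and pick the file name at that point, instead of relying on Chrome's automatic download. The sketch below assumes the PDF URL can be fetched without the browser session's cookies, which I have not verified for this site. The helper names pdf_filename and save_pdf are hypothetical, and the company name would have to come from the applicant-name elements mentioned in the question.

```python
import re
from pathlib import Path
from urllib.request import urlopen


def pdf_filename(company_name: str) -> str:
    """Turn a company name into a file name that is safe on common file systems."""
    # Replace characters that are not allowed in file names on Windows, macOS or Linux.
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", company_name)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return (cleaned or "unnamed") + ".pdf"


def save_pdf(url: str, company_name: str, directory: str) -> Path:
    """Download url and store it as '<company name>.pdf' inside directory."""
    # Assumption: the URL is directly reachable without browser cookies.
    target = Path(directory) / pdf_filename(company_name)
    with urlopen(url) as response:
        target.write_bytes(response.read())
    return target
```

Inside the loop you would call something like save_pdf(driver.current_url, name, "/Users/XXX/Downloads") before driver.close(), where name is the company name text matched to that row.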
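Answer:
There are several problems here:
- The "checkbox" locator is wrong.
- Your current code will download only the first PDF file.
It is better to use explicit waits with expected conditions than implicit waits.
This should work better: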
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
wait = WebDriverWait(driver, 20)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[contains(text(),'Full Version')]")))
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[./label[@for='warning-statement-accept']]//input")))
print("Now clicking...", checkbox.text)
checkbox.click()  # the parentheses are required; a bare checkbox.click does nothing
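One more detail worth checking in the question's snippet: it ends with checkbox.click without parentheses. In Python that only references the method and never calls it, so no click is sent and no error is raised. Here is a tiny illustration using a stand-in class (FakeElement is made up for this example and is not part of Selenium):

```python
class FakeElement:
    """Minimal stand-in for a web element that records whether it was clicked."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


element = FakeElement()

element.click  # only looks up the bound method; nothing is executed
clicked_without_call = element.clicked

element.click()  # actually calls the method
clicked_with_call = element.clicked
```

This would explain why the original code ran without errors but left the box unticked.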
