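I have been struggling with this all day and would really appreciate some help. I am trying to scrape this page: https://decisions.scc-csc.ca/scc-csc/en/d/s/index.do?cont=&ref=&d1=2012-01-01&d2=2022-01-31&p=&col=1&su=16&or= I want to collect all 137 hrefs (137 documents). The code I am using: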
import time
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from selenium.webdriver.common.keys import Keys

def test(self):
    final_url = 'https://decisions.scc-csc.ca/scc-csc/en/d/s/index.do?cont=&ref=&d1=2012-01-01&d2=2022-01-31&p=&col=1&su=16&or='
    self.driver.get(final_url)
    soup = BeautifulSoup(self.driver.page_source, 'html.parser')
    # the results live inside an iframe, so load the iframe's own URL
    iframes = soup.find('iframe')
    src = iframes['src']
    base = 'https://decisions.scc-csc.ca/'
    main_url = urljoin(base, src)
    self.driver.get(main_url)
    browser = self.driver
    elem = browser.find_element_by_tag_name("body")
    # press PAGE_DOWN 20 times to try to trigger lazy loading
    no_of_pagedowns = 20
    while no_of_pagedowns:
        elem.send_keys(Keys.PAGE_DOWN)
        time.sleep(0.2)
        no_of_pagedowns -= 1
The problem is that it only loads the first 25 documents (hrefs), and I don't know how to get the rest.
Answer from a uj5u.com user:
This code scrolls down until all the elements are visible, then saves the PDF URLs in the list pdfs. Note that everything is done with Selenium; BeautifulSoup is not used.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
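options = webdriver.ChromeOptions()
# your_chromedriver_path is a placeholder: set it to the location of your chromedriver binary
driver = webdriver.Chrome(options=options, service=Service(your_chromedriver_path))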
driver.get('https://decisions.scc-csc.ca/scc-csc/en/d/s/index.do?cont=&ref=&d1=2012-01-01&d2=2022-01-31&p=&col=1&su=16&or=')
# wait for the iframe to be loaded and then switch to it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "decisia-iframe")))
# in this case number_of_results = 137
number_of_results = int(driver.find_element(By.XPATH, "//h2[contains(., 'result')]").text.split()[0])
pdfs = []
while len(pdfs) < number_of_results:
    pdfs = driver.find_elements(By.CSS_SELECTOR, 'a[title="Download the PDF version"]')
    # scroll down to the last visible row
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', pdfs[-1])
    time.sleep(1)
# once every row is loaded, keep only the href of each link
pdfs = [pdf.get_attribute('href') for pdf in pdfs]
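One caveat with the loop above: if the page ever stops loading new rows before the reported total is reached (for example, if the count in the heading and the number of PDF links differ), it never exits. Below is a minimal sketch of the same scroll-until-complete idea with a stall guard. The helper name collect_until_complete and its parameters are hypothetical, not part of Selenium; the fetching and scrolling steps are passed in as callables so the logic can be tried without a browser.

```python
import time
from typing import Callable, List


def collect_until_complete(
    fetch_items: Callable[[], List[str]],
    scroll_to_last: Callable[[], None],
    expected_total: int,
    max_stalled_rounds: int = 5,
    pause_seconds: float = 1.0,
) -> List[str]:
    """Scroll and re-fetch until expected_total items are seen, or until
    the item count has not grown for max_stalled_rounds rounds in a row."""
    items = fetch_items()
    stalled_rounds = 0
    while len(items) < expected_total and stalled_rounds < max_stalled_rounds:
        scroll_to_last()
        time.sleep(pause_seconds)  # give the page time to load more rows
        refreshed = fetch_items()
        # A round counts as stalled when scrolling revealed nothing new.
        if len(refreshed) <= len(items):
            stalled_rounds += 1
        else:
            stalled_rounds = 0
        items = refreshed
    return items


# Hypothetical wiring with the Selenium objects from the answer above
# (driver, By and number_of_results are assumed to exist already):
#
# selector = 'a[title="Download the PDF version"]'
# hrefs = collect_until_complete(
#     fetch_items=lambda: [a.get_attribute('href')
#                          for a in driver.find_elements(By.CSS_SELECTOR, selector)],
#     scroll_to_last=lambda: driver.execute_script(
#         'arguments[0].scrollIntoView({block: "center"});',
#         driver.find_elements(By.CSS_SELECTOR, selector)[-1]),
#     expected_total=number_of_results,
# )
```

If the function returns fewer items than expected_total, the page stalled, and it is probably worth checking the selector or increasing pause_seconds rather than retrying blindly.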
