So I'm trying to scrape data from tables on a few hundred pages of a website. Here is part of what I have so far:
driver.get("link")
driver.maximize_window()
window_before = driver.window_handles[0]
driver.switch_to.window(window_before)
wait = WebDriverWait(driver, 10)
driver.execute_script("window.scrollTo(0, 350)")
games = driver.find_elements(By.XPATH, '//*[@id="schedule"]/tbody/tr')
This code only works some of the time. If I run this block 10 times, the page actually scrolls down only 5 times. I tried this instead:
for i in range(0, 2): driver.find_element(By.XPATH, '//*[@id="meta"]/div[1]/p[1]/a').send_keys(Keys.DOWN)
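But the same problem occurs. Sometimes it scrolls down by the amount I need, sometimes it does nothing, and sometimes it scrolls through the whole page.
This part of my code navigates to the first link I need to click. On the next page I have to scroll again, and that scroll has the same problem. This is part of a loop that goes through hundreds of pages to read HTML tables, so even if it works for the first 50 iterations, I still won't get all the data I need.
Edit: directly after the snippet above I have this: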
for idx, game in enumerate(games):
    driver.find_element(By.XPATH, '/html/body/div[2]/div[6]/div[3]/div[2]/table/tbody/tr[' + str(idx + 1) + ']/td[6]/a').click()
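This is where I get the "Element is not clickable at point (X, Y)" error.
Am I doing something wrong here, or is there another way to achieve what I want?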
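An "Element is not clickable at point (X, Y)" error often means that, at the moment of the click, another element (for example a sticky header or a cookie banner) overlaps the target, or the target is not in the viewport. Scrolling by a fixed pixel offset or a fixed number of key presses is fragile because the layout can shift between runs. A more robust approach, sketched below under those assumptions, is to scroll the target element itself into view with JavaScript's scrollIntoView and then click it. The helper names (row_link_xpath, scroll_into_view_and_click, click_row_link) are hypothetical, and the XPath is copied from the question.

```python
# Hypothetical helpers; the names are illustrative, not part of Selenium's API.
# The absolute XPath is copied from the question and will break if the layout changes.
ROW_LINK_XPATH = "/html/body/div[2]/div[6]/div[3]/div[2]/table/tbody/tr[{row}]/td[6]/a"


def row_link_xpath(idx):
    """Return the XPath of the link in column 6 of table row `idx` (0-based)."""
    # XPath positions are 1-based, so row 0 from enumerate() is tr[1].
    return ROW_LINK_XPATH.format(row=idx + 1)


def scroll_into_view_and_click(driver, element):
    """Scroll `element` to the middle of the viewport, then click it."""
    # block: 'center' keeps the element away from the top edge, where a
    # sticky header could cover it and receive the click instead.
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    element.click()


def click_row_link(driver, idx):
    """Re-find the row's link by XPath, scroll it into view, and click it."""
    # "xpath" is the string value of Selenium's By.XPATH constant.
    link = driver.find_element("xpath", row_link_xpath(idx))
    scroll_into_view_and_click(driver, link)
```

The link is looked up again on each call rather than reused from the games list, because once a click navigates away and you come back, previously found elements are usually stale.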
Answer:
Here is one way to get the href attribute of every "Box Score" link on that page (based on the OP's clarification in the comments):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)
url = 'https://www.basketball-reference.com/leagues/NBA_2014_games-october.html'
browser.get(url)
# print(browser.page_source)
# browser.maximize_window()
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@]'))).click()
    print('clicked cookie parent')
    wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@mode="primary"]'))).click()
    print('accepted cookies')
except Exception as e:
    print('no cookies')
wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="all_schedule"]'))).location_once_scrolled_into_view
table_with_score_links = wait.until(EC.presence_of_element_located((By.XPATH, '//table[@id="schedule"]')))
# print(table_with_score_links.get_attribute('outerHTML'))
links_from_table = [x.get_attribute('href') for x in table_with_score_links.find_elements(By.TAG_NAME, 'a') if x.text == 'Box Score']
print(links_from_table)
Result printed in the terminal:
clicked cookie parent
accepted cookies
['https://www.basketball-reference.com/boxscores/201310290IND.html', 'https://www.basketball-reference.com/boxscores/201310290MIA.html', 'https://www.basketball-reference.com/boxscores/201310290LAL.html', 'https://www.basketball-reference.com/boxscores/201310300CLE.html', 'https://www.basketball-reference.com/boxscores/201310300TOR.html', 'https://www.basketball-reference.com/boxscores/201310300PHI.html', 'https://www.basketball-reference.com/boxscores/201310300DET.html', 'https://www.basketball-reference.com/boxscores/201310300NYK.html', 'https://www.basketball-reference.com/boxscores/201310300NOP.html', 'https://www.basketball-reference.com/boxscores/201310300MIN.html', 'https://www.basketball-reference.com/boxscores/201310300HOU.html', 'https://www.basketball-reference.com/boxscores/201310300SAS.html', 'https://www.basketball-reference.com/boxscores/201310300DAL.html', 'https://www.basketball-reference.com/boxscores/201310300UTA.html', 'https://www.basketball-reference.com/boxscores/201310300PHO.html', 'https://www.basketball-reference.com/boxscores/201310300SAC.html', 'https://www.basketball-reference.com/boxscores/201310300GSW.html', 'https://www.basketball-reference.com/boxscores/201310310CHI.html', 'https://www.basketball-reference.com/boxscores/201310310LAC.html']
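If only the hrefs are needed, the same list could probably be built without clicking anything, assuming the schedule table is present in the HTML as served rather than rendered later by JavaScript (this sketch does not check that). The version below swaps in the standard library's html.parser; BoxScoreLinkParser and box_score_links are hypothetical names. You could feed it browser.page_source from the code above. Hrefs are resolved with urljoin because the raw attribute may be relative, while the list printed above contains absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BoxScoreLinkParser(HTMLParser):
    """Collect absolute hrefs of <a> tags whose text is exactly 'Box Score'
    inside <table id="schedule">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_schedule = False  # currently inside <table id="schedule">?
        self.table_depth = 0      # nesting depth of tables inside it
        self.current_href = None  # href of the <a> being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            if self.in_schedule:
                self.table_depth += 1
            elif attrs.get("id") == "schedule":
                self.in_schedule = True
                self.table_depth = 1
        elif tag == "a" and self.in_schedule:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "table" and self.in_schedule:
            self.table_depth -= 1
            if self.table_depth == 0:
                self.in_schedule = False
        elif tag == "a" and self.current_href is not None:
            # Same filter as the Selenium version: link text == 'Box Score'.
            if "".join(self.current_text).strip() == "Box Score":
                self.links.append(urljoin(self.base_url, self.current_href))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


def box_score_links(html, base_url):
    """Return absolute 'Box Score' URLs found in the schedule table of `html`."""
    parser = BoxScoreLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```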
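I tried to make the variable names as descriptive as possible, and I also left in some commented-out lines to help follow the thought process that builds up to the final goal.
You can now go through these links one by one, and so on.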
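Going through the links one by one might look like the sketch below, which navigates with driver.get() instead of clicking, so no scrolling is involved. visit_links and its read_page callback are hypothetical; read_page stands for whatever code reads the box-score tables. delay_seconds is an assumed pause to avoid hammering the site, not a documented limit. Only the driver's get() method is used, so browser from the code above can be passed in directly.

```python
import time


def visit_links(driver, links, read_page, delay_seconds=1.0):
    """Open each URL with driver.get() and pass the loaded driver to read_page.

    Returns (results, failures): dicts keyed by URL holding read_page's return
    value or the exception raised for that URL.
    """
    results = {}
    failures = {}
    for url in links:
        try:
            driver.get(url)  # navigate directly; no scrolling or clicking needed
            results[url] = read_page(driver)
        except Exception as exc:  # record the failure and keep going
            failures[url] = exc
        time.sleep(delay_seconds)  # assumed politeness pause between pages
    return results, failures
```

Collecting failures instead of stopping means one bad page does not cost you the remaining hundreds; you can retry the failed URLs afterwards.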
The Selenium documentation can be found here: https://www.selenium.dev/documentation/
