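I know this question has been asked before, but I'm still struggling to get it to work.
The web page has a table with links, and I want to iterate through it, clicking each link in turn.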

Here is my code so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
try:
    # wait until the table container is present
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "table-scroll")))
    table = element.find_elements(By.XPATH, "//table//tbody/tr")
    for row in table[1:]:
        print(row.get_attribute('innerHTML'))
        # link.click()
finally:
    driver.close()
Sample output:
<td>FOUR</td>
<td><a href="/factsheets/4IMPRINT-GROUP/GB0006640972-GBP/?id=GB0006640972GBP&idType=isin&marketCode=&idCurrencyid=" target="_parent">4imprint Group plc</a></td>
<td>Media & Publishing</td>
<td>888</td>
<td><a href="/factsheets/888-HOLDINGS/GI000A0F6407-GBP/?id=GI000A0F6407GBP&idType=isin&marketCode=&idCurrencyid=" target="_parent">888 Holdings</a></td>
<td>Hotels & Entertainment Services</td>
<td>ASL</td>
<td><a href="/factsheets/ABERFORTH-SMALLER-COMPANIES-TRUST/GB0000066554-GBP/?id=GB0000066554GBP&idType=isin&marketCode=&idCurrencyid=" target="_parent">Aberforth Smaller Companies Trust</a></td>
<td>Collective Investments</td>
How do I click an href and then iterate on to the next one?
Many thanks.
Edit: I went with this solution (Prophet's solution with a few minor tweaks):
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)
# close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
# wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
# extra wait so that all the links are loaded
time.sleep(1)
# get the total number of links
links = driver.find_elements(By.XPATH, '//table//tbody/tr/td/a')
for index in range(len(links)):
    try:
        # get the links again after returning to the initial page in the loop
        links = driver.find_elements(By.XPATH, '//table//tbody/tr/td/a')
        # scroll to the n-th link, it may be outside the initially visible area
        actions.move_to_element(links[index]).perform()
        links[index].click()
        # scrape the data on the new page, then go back with the following command
        driver.execute_script("window.history.go(-1)")  # alternatively: driver.back()
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
        time.sleep(2)
    except StaleElementReferenceException:
        # skip a link whose reference went stale between lookup and click
        pass
Answer:
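To do what you want here, you first need to close the cookie banner at the bottom of the page.
You can then iterate over the links in the table.
Because clicking a link opens a new page, after scraping the data there you have to go back to the main page and get the next link. You can't just collect all the link elements into a list and then iterate over that list, because once you navigate to another page, every element Selenium located on the initial page becomes stale.
Your code could look something like this: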
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)
# close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
# wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
# extra wait so that all the links are loaded
time.sleep(1)
# get the total number of links
links = driver.find_elements(By.XPATH, '//table//tbody/tr/td/a')
for index in range(len(links)):
    # get the links again after returning to the initial page in the loop
    links = driver.find_elements(By.XPATH, '//table//tbody/tr/td/a')
    # scroll to the n-th link, it may be outside the initially visible area
    actions.move_to_element(links[index]).perform()
    links[index].click()
    # scrape the data on the new page, then go back with the following command
    driver.execute_script("window.history.go(-1)")  # alternatively: driver.back()
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
    time.sleep(1)
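The staleness problem described above applies to element references, not to plain strings. As an alternative to clicking and going back (a different technique from the one in this answer), one could read every href up front and open each URL directly. The following is only a minimal sketch: the names absolute_urls, visit_all and scrape are hypothetical, and it assumes the target pages can be opened directly by URL. Selenium usually returns an already resolved URL for "href", so the urljoin call is just a safety net.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.fidelity.co.uk/shares/ftse-350/"
LINK_XPATH = "//table//tbody/tr/td/a"


def absolute_urls(hrefs, base_url=BASE_URL):
    """Resolve href values against the page URL, dropping empties and duplicates."""
    seen = set()
    urls = []
    for href in hrefs:
        if not href:
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def visit_all(driver, scrape):
    """Read every href first, then open each URL with driver.get().

    `scrape` is a hypothetical callback that receives the driver once the
    target page is open and returns whatever data it extracts.
    """
    # "xpath" is the string value of selenium's By.XPATH
    anchors = driver.find_elements("xpath", LINK_XPATH)
    # Strings survive navigation, unlike the element references themselves
    hrefs = [a.get_attribute("href") for a in anchors]
    results = []
    for url in absolute_urls(hrefs):
        driver.get(url)
        results.append(scrape(driver))
    return results
```

With a real driver this could be called as visit_all(driver, lambda d: d.title) after the cookie banner has been closed and the table has loaded. No back-navigation or re-lookup is needed in that case.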
Answer:
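Basically, you have to do the following:
- Click the cookie button, if it is present.
- Get all the links on the page.
- Iterate over the list of links: scroll to each link, click it, and then navigate back to the original page.
Code: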
# driver_path is the path to your chromedriver executable (not defined in this snippet)
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.fidelity.co.uk/shares/ftse-350/")
try:
    wait.until(EC.element_to_be_clickable((By.ID, "ensCloseBanner"))).click()
    print('Clicked on the cookies button')
except Exception:
    print('Could not click on the cookies button')
driver.execute_script("window.scrollTo(0, 750)")
try:
    all_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
    print("We have got to deal with", len(all_links), 'links')
    for j in range(len(all_links)):
        # re-locate the links on every pass; the old references are stale after navigating back
        links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
        driver.execute_script("arguments[0].scrollIntoView(true);", links[j])
        time.sleep(1)
        links[j].click()
        # here write the code to scrape something once the click is performed
        time.sleep(1)
        driver.execute_script("window.history.go(-1)")
        print(j + 1)
except Exception:
    print('Bot could not execute all the links properly')
Imports:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PS: To handle stale element references, you have to locate the list of web elements again inside the loop.
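The point about re-locating elements can be wrapped in a small helper that looks the links up afresh on every attempt and retries if the reference goes stale between lookup and click. This is only a sketch under assumptions: click_nth_link, retries and delay are hypothetical names, and the ImportError fallback exists solely so the snippet can be read and exercised without Selenium installed.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in so the sketch can be exercised without selenium installed.
    class StaleElementReferenceException(Exception):
        pass

LINK_XPATH = "//table//tbody/tr/td/a"


def click_nth_link(driver, index, retries=3, delay=0.5):
    """Re-locate the links on every attempt and click the one at `index`.

    Returns True on success, False if the element stayed stale or the
    index was out of range on every attempt.
    """
    for _ in range(retries):
        # Fresh lookup each time: references from an earlier page load are stale.
        # "xpath" is the string value of selenium's By.XPATH
        links = driver.find_elements("xpath", LINK_XPATH)
        if index < len(links):
            try:
                links[index].click()
                return True
            except StaleElementReferenceException:
                pass  # the DOM changed between lookup and click; try again
        time.sleep(delay)
    return False
```

Inside the main loop this would replace the direct links[j].click() call, for example: if click_nth_link(driver, j): scrape the page, then driver.back(). A False return lets the loop skip that link instead of aborting the whole run.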
