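I am trying to extract all the descriptions from the links on this page: https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess
I tried both BeautifulSoup and Selenium, but I could not extract anything. You can see the result I got in the image below.
This is the code I am using: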
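from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
options = Options()
options.add_argument("headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
# Load the page from the question; the posted snippet did not show this step
driver.get("https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess")
ul = driver.find_element(By.ID, "publication-list")
print("Links")
allLi = ul.find_elements(By.TAG_NAME, "li")
# enumerate supplies the counter that the original loop never defined
for count, li in enumerate(allLi, start=1):
    print("Links " + str(count) + " " + li.text)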
Answer from a uj5u.com user:
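You are missing a wait.
You have to wait for the elements to become visible before you access them.
The best way to do that is an explicit wait, using WebDriverWait together with expected_conditions.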
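To illustrate what an explicit wait does, here is a minimal sketch in plain Python. It is a simplified illustration of the polling idea, not Selenium's implementation: the condition is called repeatedly until it returns something truthy or the timeout expires. The names `wait_until` and `list_is_ready` are hypothetical and only serve the example.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose list only shows up on the third check.
attempts = {"count": 0}


def list_is_ready():
    attempts["count"] += 1
    return ["item"] if attempts["count"] >= 3 else None


print(wait_until(list_is_ready, timeout=5.0, poll_interval=0.01))
```

Looking up an element immediately after the page starts loading amounts to checking the condition once, which is why the lookup can fail or return nothing on a page that fills in its content with JavaScript.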
The following code works:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
# Raw string so the backslashes in the Windows path are not treated as escapes
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)  # wait up to 20 seconds for each condition
url = "https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess"
driver.get(url)
# Block until the list container is visible, then collect the li elements on the page
ul = wait.until(EC.visibility_of_element_located((By.ID, "publication-list")))
allLi = wait.until(EC.presence_of_all_elements_located((By.TAG_NAME, "li")))
print(len(allLi))
Output:
167
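The code above only prints a count, and its `li` locator matches every `li` on the page, not just the ones inside the list. If the goal is the text of each entry, as in the question, something along these lines may help. This is a sketch under assumptions: the CSS selector `#publication-list li` is built from the ID used above, `webdriver.Chrome()` without arguments assumes a Chrome driver can be located automatically, and the function names `format_links` and `scrape_publication_texts` are hypothetical.

```python
def format_links(texts):
    """Number each non-empty item text the way the question's loop intended."""
    cleaned = [t.strip() for t in texts if t and t.strip()]
    return [f"Links {i} {t}" for i, t in enumerate(cleaned, start=1)]


def scrape_publication_texts(url, timeout=20):
    """Open `url`, wait for the list items to be present, and return their text."""
    # Imported here so format_links can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: a Chrome driver is discoverable
    try:
        driver.get(url)
        # Only li elements inside the element with id "publication-list"
        items = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, "#publication-list li")
            )
        )
        return [item.text for item in items]
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

With the URL from the question stored in `url`, `for line in format_links(scrape_publication_texts(url)): print(line)` would print one numbered line per entry. The number of entries may differ from the 167 reported above, because this selector is narrower.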
