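I am trying to scrape news headlines with Python, Beautiful Soup, and Selenium from a website that has a "Show More" button. I can load the page with Selenium, click the button to reveal more headlines, and print the headlines, all without any error messages. My problem is that after the "Show More" button is clicked, Beautiful Soup does not read the driver's updated content; it only reads the headlines that were on the page before the button was clicked. How can I make sure the headlines are read and printed only after the "Show More" button has been clicked a set number of times? I use a for loop rather than a while loop so that I can click the button n times.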
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup

s = Service('/Users/comp/Desktop/chromedriver')
driver = webdriver.Chrome(service=s)
url = 'https://www.foxnews.com/politics'
driver.get(url)

for x in range(10):
    try:
        loadMoreButton = driver.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[2]/div/main/section[4]/footer/div/a")
        time.sleep(3)
        loadMoreButton.click()
        time.sleep(3)
    except Exception as e:
        print(e)
        break

time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'lxml')
headlines = soup.find('body').find_all('h4')
for x in headlines:
    print(x.text.strip())
time.sleep(3)
driver.quit()
A uj5u.com user replied:
Try the code below. It works now.
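from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup

s = Service('./chromedriver')
driver = webdriver.Chrome(service=s)
url = 'https://www.foxnews.com/politics'
driver.get(url)
time.sleep(3)  # give the first batch of headlines time to render

for x in range(10):
    try:
        # Re-parse the live page source on every pass so newly loaded headlines are picked up.
        # Headlines from earlier passes are printed again each time.
        soup = BeautifulSoup(driver.page_source, 'lxml')
        headlines = soup.find('body').find_all('h4')
        for headline in headlines:
            print(headline.text.strip())
        loadMoreButton = driver.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[2]/div/main/section[4]/footer/div/a")
        # find_element raises if the button is missing, so this check is only a safeguard.
        if loadMoreButton:
            loadMoreButton.click()
            time.sleep(3)
    except Exception as e:
        print(e)
        break

driver.quit()  # close the browser once scraping is finished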
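
The reply above re-parses the page inside the loop, so earlier headlines are printed again on every pass. As a possible alternative (not the answerer's method), the sketch below clicks the button up to n times, waits after each click until the number of h4 elements grows, and only then parses the page once. The names unique_in_order and scrape_headlines, the clicks, timeout, and driver_path parameters, and the assumption that each click adds new h4 elements are all illustrative and not from the original post. The third-party imports sit inside the function only so that the small helper can be used on its own.

```python
def unique_in_order(items):
    """Strip each string, drop blanks and repeats, and keep first-seen order."""
    seen = set()
    result = []
    for item in items:
        text = item.strip()
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result


def scrape_headlines(url, clicks=10, timeout=10, driver_path="./chromedriver"):
    """Click "Show More" up to `clicks` times, then parse the page once.

    Hypothetical helper: assumes every successful click adds <h4> headlines.
    """
    # Third-party imports are local so unique_in_order works without a browser.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Same absolute XPath as in the question; it may break if the page layout changes.
    button_xpath = "/html/body/div[2]/div/div/div/div[2]/div/main/section[4]/footer/div/a"

    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        for _ in range(clicks):
            before = len(driver.find_elements(By.TAG_NAME, "h4"))
            try:
                # Wait for the button instead of sleeping a fixed three seconds.
                button = wait.until(
                    EC.element_to_be_clickable((By.XPATH, button_xpath))
                )
                button.click()
                # Block until the page shows more <h4> elements than before the click.
                wait.until(
                    lambda d: len(d.find_elements(By.TAG_NAME, "h4")) > before
                )
            except WebDriverException as exc:
                # Covers timeouts and failed clicks; stop and parse what is loaded.
                print(exc)
                break
        # Parse only after all clicks, so the soup reflects the expanded page.
        soup = BeautifulSoup(driver.page_source, "lxml")
        return unique_in_order(h4.get_text() for h4 in soup.find_all("h4"))
    finally:
        driver.quit()
```

Calling scrape_headlines('https://www.foxnews.com/politics') and printing each returned string would then list every headline once, after the clicks have finished. Explicit waits are used here in place of time.sleep so that the parse does not run before the new headlines arrive; whether the h4 count is a reliable signal for this particular site is an assumption that would need checking.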
