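I'm really new to scraping data, and I'm having trouble scraping multiple pages. I'm trying to get the title of each episode along with that episode's rating.
I only managed to scrape the first page; after that it stops working.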
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.imdb.com/title/tt0386676/episodes?season=1'
next_season = "//*[@id='load_next_episodes']"

browser = webdriver.Chrome()
browser.get(url)

for season in range(1,10):
    i = 1
    episodes = browser.find_elements_by_class_name('info')
    for episode in episodes:
        title = episode.find_element_by_xpath(f'//*[@id="episodes_content"]/div[2]/div[2]/div[{i}]/div[2]/strong/a').text
        rating = episode.find_element_by_class_name('ipl-rating-star__rating').text
        print(title, rating)
        i += 1
    browser.find_element_by_xpath(next_season).click()

browser.close()
My output looks like this:
Pilot 7.4
Diversity Day 8.2
Health Care 7.7
The Alliance 7.9
Basketball 8.3
Hot Girl 7.6
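Answer from a uj5u.com user:
You can also get the page details without clicking the season button. First read all the season numbers from the dropdown box, then iterate over them and load each season's URL directly. You can create lists and append the data to them, then either iterate over the lists at the end or load them into a DataFrame and export it to a CSV file.
Code: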
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("https://www.imdb.com/title/tt0386676/episodes?season=1")
wait = WebDriverWait(driver, 10)

# Read every season number from the season dropdown on the first page.
selectSeason = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#bySeason')))
select = Select(selectSeason)
allSeasons = [option.get_attribute('value') for option in select.options]  # get all season numbers
print(allSeasons)

title = []
ratings = []
for season in allSeasons:
    # Load each season's page directly instead of clicking the "next season" button.
    url = "https://www.imdb.com/title/tt0386676/episodes?season={}".format(season)
    print(url)
    driver.get(url)
    # Wait until the episode blocks are visible, then search relative to each block.
    for e in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".info"))):
        title.append(e.find_element(By.CSS_SELECTOR, "a[itemprop='name']").text)
        ratings.append(e.find_element(By.CSS_SELECTOR, ".ipl-rating-star.small .ipl-rating-star__rating").text)

# Print the collected titles alongside their ratings.
for t, r in zip(title, ratings):
    print(t + " --- " + r)
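The answer mentions exporting the collected lists to a CSV file but does not show that step. Below is a minimal sketch of one way to do it, using Python's standard `csv` module rather than a DataFrame so that it runs without extra dependencies. The function name `episodes_to_csv`, the column names, and the file name `episodes.csv` are assumptions made for illustration, and the sample values are copied from the question's output in place of live scraped data.

```python
import csv
import io


def episodes_to_csv(titles, ratings):
    """Return CSV text with a header and one row per (title, rating) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "rating"])  # hypothetical column names
    # zip stops at the shorter list, so any unmatched trailing entries are dropped.
    writer.writerows(zip(titles, ratings))
    return buffer.getvalue()


# Sample values from the question's output, standing in for the scraped lists.
sample_titles = ["Pilot", "Diversity Day", "Health Care"]
sample_ratings = ["7.4", "8.2", "7.7"]

csv_text = episodes_to_csv(sample_titles, sample_ratings)
print(csv_text)

# To save the result to disk (the file name is an assumption):
# with open("episodes.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

In the answer's script, the `title` and `ratings` lists could be passed to this function once the season loop finishes. If pandas is available, building a DataFrame from the two lists and calling `to_csv` would be the route the answer describes.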

