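I'm trying to scrape some items from an HTML page. I have to select an option from a dropdown list and then iterate over the results, but I always get the items for the first option in the dropdown. I suspect my click isn't working properly. How can I loop through all the options and collect the items to build the data?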
import pandas as pd
from selenium import webdriver
import re
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
service = Service("/home/ubuntu/selenium_drivers/chromedriver")
base_url = 'https://www.crave.ca/en/tv-shows/16-and-pregnant'
page_one = True
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(service=service, options=options)
driver.get(base_url)
driver.find_element(By.XPATH,'//*[@id="dropdown-basic"]').click()
time.sleep(5)
total_seasons = driver.find_elements(By.CSS_SELECTOR,'button.dropdown-item')
driver.find_element(By.XPATH,'//*[@id="dropdown-basic"]').click()
print(len(total_seasons))
d=[]
for i in range(0,len(total_seasons)):
    alleps = driver.find_elements(By.XPATH,'//*[@id="episodes"]/div/ul/li')
    for j in range(1,len(alleps)+1):
        d.append({
            'Duration ': driver.find_element(By.XPATH,f'//*[@id="episodes"]/div/ul/li[{j}]/div[1]/div[2]/span/span[1]').text,
            'Episode_Number ': j,
            'Episode_Synopsis ': driver.find_element(By.XPATH,f'//*[@id="episodes"]/div/ul/li[{j}]/div[1]/div[2]/p').text,
            'Episode_Title ': re.sub(r'[^a-zA-Z ]+','',driver.find_element(By.XPATH,f'//*[@id="episodes"]/div/ul/li[{j}]/div[1]/div[2]/h3').text).strip(),
        })
data = pd.DataFrame.from_dict(d)
Answer:
You are clicking this element twice:
driver.find_element(By.XPATH,'//*[@id="dropdown-basic"]').click()
So you open the dropdown menu and then close it again. You never select any other season.
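To make the open-then-close effect concrete, here is a toy model of a toggle-style dropdown. It is not Selenium code: the `Dropdown` class, its method names, and the season labels are all hypothetical stand-ins, and it assumes the real menu behaves like a simple toggle.

```python
class Dropdown:
    """Toy stand-in for a toggle-style dropdown (hypothetical, not Selenium)."""

    def __init__(self, options):
        self.options = options
        self.is_open = False
        self.selected = options[0]  # assume the page shows the first option by default

    def click_toggle(self):
        # Clicking the toggle button flips the menu between open and closed.
        self.is_open = not self.is_open

    def click_option(self, index):
        # An option can only be clicked while the menu is open.
        if not self.is_open:
            raise RuntimeError("menu is closed")
        self.selected = self.options[index]
        self.is_open = False


menu = Dropdown(["Season 1", "Season 2", "Season 3"])  # placeholder labels
menu.click_toggle()  # opens the menu
menu.click_toggle()  # closes it again; nothing was chosen
after_two_toggles = menu.selected

menu.click_toggle()   # open once more
menu.click_option(1)  # the step the question's loop never performs
after_choice = menu.selected
```

In this model, two toggle clicks leave the selection on the first option, which matches the symptom described in the question.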
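To make your code work, first scrape the Season 1 data without selecting anything, then iterate over the remaining seasons, selecting each one in turn and scraping its data.
Your code could look like this: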
import pandas as pd
from selenium import webdriver
import re
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

service = Service("/home/ubuntu/selenium_drivers/chromedriver")
base_url = 'https://www.crave.ca/en/tv-shows/16-and-pregnant'
page_one = True
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(service=service, options=options)
driver.get(base_url)

# Open the dropdown once to count the seasons, then close it again.
driver.find_element(By.XPATH,'//*[@id="dropdown-basic"]').click()
time.sleep(1)
total_seasons = driver.find_elements(By.CSS_SELECTOR,'button.dropdown-item')
driver.find_element(By.XPATH,'//*[@id="dropdown-basic"]').click()
print(len(total_seasons))

d=[]
for i in range(len(total_seasons)):
    # Scrape every episode of the season that is currently displayed.
    alleps = driver.find_elements(By.XPATH,'//*[@id="episodes"]/div/ul/li')
    for j in range(1,len(alleps)+1):
        d.append({
            'Duration ': driver.find_element(By.XPATH,f'//*[@id="episodes"]/div/ul/li[{j}]/div[1]/div[2]/span/span[1]').text,
            'Episode_Number ': j,
            'Episode_Synopsis ': driver.find_element(By.XPATH,f'//*[@id="episodes"]/div/ul/li[{j}]/div[1]/div[2]/p').text,
            'Episode_Title ': re.sub(r'[^a-zA-Z ]+','',driver.find_element(By.XPATH,f'//*[@id="episodes"]/div/ul/li[{j}]/div[1]/div[2]/h3').text).strip(),
        })
    # Reopen the dropdown and click entry i; the buttons are looked up again on every pass.
    driver.find_element(By.XPATH,'//*[@id="dropdown-basic"]').click()
    seasons = driver.find_elements(By.CSS_SELECTOR,'button.dropdown-item')
    seasons[i].click()
    time.sleep(1)
data = pd.DataFrame.from_dict(d)
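One thing to check against the real page: if the dropdown lists every season, including the one shown by default, a loop that scrapes first and clicks entry `i` afterwards would scrape the default season twice and never reach the last entry. A possible variant selects first and scrapes second. The sketch below is only an outline under that assumption; `scrape_all_seasons`, `scrape_season`, and `pause` are hypothetical names, and the locator strings are written out as plain strings with the same values as `By.XPATH` and `By.CSS_SELECTOR` so the control flow can be tried without a browser.

```python
import time

XPATH = "xpath"                # same string value as selenium's By.XPATH
CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR

TOGGLE = '//*[@id="dropdown-basic"]'
ITEMS = 'button.dropdown-item'


def scrape_all_seasons(driver, scrape_season, pause=1.0):
    """Select each dropdown entry in turn and collect scrape_season(driver) rows.

    Assumes the dropdown lists every season, including the default one.
    scrape_season is a caller-supplied function that returns a list of dicts
    for whichever season is currently displayed.
    """
    rows = []

    # Open the menu once to count the entries, then close it again.
    driver.find_element(XPATH, TOGGLE).click()
    season_count = len(driver.find_elements(CSS_SELECTOR, ITEMS))
    driver.find_element(XPATH, TOGGLE).click()

    for i in range(season_count):
        # Open the menu and look the buttons up again: references found
        # before the page re-rendered may no longer be valid.
        driver.find_element(XPATH, TOGGLE).click()
        driver.find_elements(CSS_SELECTOR, ITEMS)[i].click()
        time.sleep(pause)  # crude fixed wait, as in the code above
        rows.extend(scrape_season(driver))

    return rows
```

With a real driver, the episode loop from the code above would go into a function passed as `scrape_season`, and the result could be handed to `pd.DataFrame.from_dict` as before. The fixed `time.sleep` is kept for parity with the original; Selenium's explicit waits are usually a more robust choice.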