Searching for a class element under another element

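I'm scraping daily lineups and need to find out whether a team has not posted its lineup. In that case, there is an element with the class lineup__no. I want to go through each team and check whether its lineup has been posted, and if not, add that team's index to a list. For example, if 4 teams are playing and the first and third have not posted lineups, I want to return the list [0, 2]. I'm guessing some kind of list comprehension could get me there, but I'm struggling to come up with what I need. So far I've tried a for loop to get every item under the main header. I also tried adding the text of each li item to a list and searching for "Unknown Lineup", but that didn't work.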

from selenium import webdriver

from selenium.common.exceptions import NoSuchElementException

from bs4 import BeautifulSoup
import requests
import pandas as pd

#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'

##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

games = soup.select('.lineup.is-mlb')
for game in games:
    initial_list = game.find_all('li')
    print(initial_list)
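
The target output described in the question can be sketched without any scraping. This is only an illustration: the lineup_missing flags below are hypothetical stand-ins for whatever check ends up being run against each team's block.

```python
# Hypothetical per-team flags: True means the team has NOT posted a lineup.
# Four teams are playing; the first and third have no lineup yet.
lineup_missing = [True, False, True, False]

# enumerate() pairs each flag with its position, and the comprehension
# keeps only the positions whose flag is set.
missing_indexes = [idx for idx, missing in enumerate(lineup_missing) if missing]

print(missing_indexes)  # [0, 2]
```

The remaining work is producing one flag per team from the page, which is what the answers below deal with.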

Answer from a uj5u.com user:

Since I'm more familiar with Selenium, I'll give you a Selenium solution.
See my explanations in the comments in the code.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.rotowire.com/baseball/daily-lineups.php")
#wait for at least 1 game element to be visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lineup.is-mlb")))
#add a short delay so that all the other games are loaded
time.sleep(0.5)
#get all the games blocks
games = driver.find_elements(By.CSS_SELECTOR,".lineup.is-mlb")
#iterate over the games elements with their indexes in a list comprehension
no_lineup = [j for idx, game in enumerate(games) for j in [idx*2, idx*2 + 1] if game.find_elements(By.XPATH, ".//li[@class='lineup__no']")]


#print the collected results
print(no_lineup)
#quit the driver
driver.quit()
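
The index arithmetic in that comprehension is easier to follow in isolation. The sketch below assumes, as the code above appears to, that game number idx holds the teams at positions idx*2 and idx*2 + 1. The function name and the sample flags are hypothetical.

```python
def team_indexes_for_flagged_games(game_flags):
    """Return both team indexes for every game whose flag is True.

    game_flags[i] is True when game i contains at least one
    'lineup__no' element (hypothetical input for this sketch).
    """
    return [
        team_idx
        for game_idx, flagged in enumerate(game_flags)
        for team_idx in (game_idx * 2, game_idx * 2 + 1)
        if flagged
    ]


# Three games; the first and third contain a missing lineup.
print(team_indexes_for_flagged_games([True, False, True]))  # [0, 1, 4, 5]
```

Note that this records both teams of a game as soon as either one is missing a lineup. If only the specific team is wanted, each team's list would probably have to be checked separately, and how to do that depends on the page's actual markup.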

Answer from a uj5u.com user:

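Just look under the <li> tags whose class starts with lineup__status, then use enumerate while iterating to keep track of the index in the list. I don't have an example where some teams have lineups (I'll have to check later, once lineups are populated here), so I may change the if lineupStatus.text.strip() == 'Unknown Lineup' logic to be more robust. Until I can see exactly what the HTML looks like at that point, I have to assume the "lineup__no" class is always present. As I said, once I see some lineups on this page, I'll adjust it.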

By the way,

The Guardians lineup has not been posted yet.

threw me for a second... I had completely forgotten about that!

from bs4 import BeautifulSoup
import requests
import re

#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'

##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

lineupStatuses = soup.find_all('li', {'class':re.compile('^lineup__status')})


noLineupIndex = []
for idx, lineupStatus in enumerate(lineupStatuses):
    if 'is-confirmed' not in lineupStatus['class']:
        noLineupIndex.append(idx)
        
# Or use list comprehension        
#noLineupIndex = [idx for idx, lineupStatus in enumerate(lineupStatuses) if 'is-confirmed' not in lineupStatus['class']]

Output:

print(noLineupIndex)
[0, 3, 6, 7, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
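
The same class-based check can be tried offline. The sketch below is not the answer's code: it swaps BeautifulSoup for the standard library's html.parser so it runs without a network connection or extra packages, and the sample markup is hypothetical, modeled only on the class names lineup__status and is-confirmed used above.

```python
from html.parser import HTMLParser

# Hypothetical markup; 'is-expected' is an invented class for illustration.
SAMPLE_HTML = """
<ul>
  <li class="lineup__status is-confirmed">Confirmed Lineup</li>
  <li class="lineup__status">Unknown Lineup</li>
  <li class="lineup__status is-confirmed">Confirmed Lineup</li>
  <li class="lineup__status is-expected">Expected Lineup</li>
</ul>
"""


class StatusCollector(HTMLParser):
    """Collects the class list of every <li> that carries a
    class starting with 'lineup__status'."""

    def __init__(self):
        super().__init__()
        self.statuses = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(c.startswith("lineup__status") for c in classes):
            self.statuses.append(classes)


def unconfirmed_indexes(html):
    """Return the positions of status items lacking 'is-confirmed'."""
    collector = StatusCollector()
    collector.feed(html)
    return [
        idx
        for idx, classes in enumerate(collector.statuses)
        if "is-confirmed" not in classes
    ]


print(unconfirmed_indexes(SAMPLE_HTML))  # [1, 3]
```

Checking for the absence of is-confirmed treats every other state as "not posted", which may or may not be what is wanted once the real page shows intermediate states.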
