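I scrape daily lineups and need to find out whether a team has not posted its lineup. When that happens, there is an element with the class lineup__no. I want to go through each team, check whether its lineup has been posted, and if not, add that team's index to a list. For example, if 4 teams are playing and the first and third teams have not posted lineups, I want to return the list [0,2]. I'm guessing some kind of list comprehension could get me there, but I'm struggling to work out what I need. So far I've tried a for loop to get every item under the main header. I also tried adding the text of each li item to a list and searching for "Unknown Lineup", but that didn't work.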
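The core idea described here (keep the index of every team whose lineup is missing) maps directly onto a list comprehension over `enumerate`. A minimal sketch on hypothetical, already-extracted status strings, assuming the text "Unknown Lineup" marks a team with no posted lineup:

```python
# Hypothetical status text for 4 teams, in page order (not real scraped data).
statuses = ["Unknown Lineup", "Confirmed Lineup", "Unknown Lineup", "Expected Lineup"]

# Keep the index of every team whose status says the lineup is unknown.
missing = [idx for idx, status in enumerate(statuses) if status.strip() == "Unknown Lineup"]
print(missing)  # [0, 2]
```

With real pages, the strings would come from the scraped `li` elements; the filtering step stays the same.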
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import requests
import pandas as pd
#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'
##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
games = soup.select('.lineup.is-mlb')
for game in games:
    initial_list = game.find_all('li')
    print(initial_list)
Answer:
Since I'm more familiar with Selenium, I'll give you a Selenium solution.
See my explanations in the comments in the code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.rotowire.com/baseball/daily-lineups.php")
#wait for at least 1 game element to be visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lineup.is-mlb")))
#add a short delay so that all the other games are loaded
time.sleep(0.5)
#get all the games blocks
games = driver.find_elements(By.CSS_SELECTOR,".lineup.is-mlb")
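#iterate over the games with their indexes; each game contributes two team indices (idx*2 and idx*2 + 1)
#note: if either team in a game has a 'lineup__no' element, both of that game's team indices are added
no_lineup = [j for idx, game in enumerate(games) for j in [idx*2, idx*2 + 1] if game.find_elements(By.XPATH, ".//li[@class='lineup__no']")]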
#print the collected results
print(no_lineup)
#quit the driver
driver.quit()
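Because the comprehension above checks a whole game at once, a game where only one team is missing its lineup flags both teams. A sketch of a per-team version of the same index arithmetic, using hypothetical boolean flags (one pair per game: first team, then second team) instead of live Selenium elements:

```python
# Hypothetical data: for each game, whether the first and second team lack a lineup.
games = [
    (True, False),   # game 0: only the first team is missing
    (False, False),  # game 1: both lineups posted
    (False, True),   # game 2: only the second team is missing
]

# Team `side` (0 or 1) of game `idx` maps to the flat team index idx*2 + side.
no_lineup = [
    idx * 2 + side
    for idx, pair in enumerate(games)
    for side, missing in enumerate(pair)
    if missing
]
print(no_lineup)  # [0, 5]
```

With Selenium, each flag would come from checking that team's own list element for `lineup__no`; the per-team selector is an assumption not covered by the answer.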
Answer:
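Just look under the <li> tags with the lineup__status class. Then use enumerate while iterating to keep track of the list's index. I don't have an example of some teams having lineups yet (I'll have to check later, as lineups get populated here), so I may change the logic of if lineupStatus.text.strip() == 'Unknown Lineup' to make it more robust. Until I can see exactly what the HTML looks like at that point, I have to assume the "lineup__no" class is always there. Like I said, once I see some lineups on this page, I'll adjust it.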
By the way, "The Guardians lineup has not been posted yet." threw me for a second there... I completely forgot about that!
from bs4 import BeautifulSoup
import requests
import re
#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'
##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
lineupStatuses = soup.find_all('li', {'class':re.compile('^lineup__status')})
noLineupIndex = []
for idx, lineupStatus in enumerate(lineupStatuses):
    if 'is-confirmed' not in lineupStatus['class']:
        noLineupIndex.append(idx)
# Or use list comprehension
#noLineupIndex = [idx for idx, lineupStatus in enumerate(lineupStatuses) if 'is-confirmed' not in lineupStatus['class']]
print(noLineupIndex)
Output:
[0, 3, 6, 7, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
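To try the class-based filter without requesting the live page, here is a stdlib-only sketch that uses `html.parser.HTMLParser` in place of BeautifulSoup. The HTML snippet and the `is-expected` class are hypothetical; the sketch only assumes what the answer's code relies on: one `<li>` per team with a class starting with `lineup__status`, plus `is-confirmed` once the lineup is confirmed.

```python
from html.parser import HTMLParser

# Hypothetical markup: one status <li> per team (not copied from rotowire).
SAMPLE = """
<div class="lineup is-mlb">
  <ul><li class="lineup__status is-expected">Expected Lineup</li></ul>
  <ul><li class="lineup__status is-confirmed">Confirmed Lineup</li></ul>
</div>
<div class="lineup is-mlb">
  <ul><li class="lineup__status is-confirmed">Confirmed Lineup</li></ul>
  <ul><li class="lineup__status is-expected">Expected Lineup</li></ul>
</div>
"""


class StatusCollector(HTMLParser):
    """Collect the class list of every <li> that has a class starting with 'lineup__status'."""

    def __init__(self):
        super().__init__()
        self.statuses = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Roughly mirrors matching re.compile('^lineup__status') against each class value.
        if any(c.startswith("lineup__status") for c in classes):
            self.statuses.append(classes)


parser = StatusCollector()
parser.feed(SAMPLE)

# Same rule as the answer: any status without 'is-confirmed' counts as "no lineup yet".
no_lineup_index = [idx for idx, classes in enumerate(parser.statuses) if "is-confirmed" not in classes]
print(no_lineup_index)  # [0, 3]
```

Note that this rule treats "expected" lineups as not posted; whether that matches the question's intent depends on how the page marks teams that have no lineup at all.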
