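I am trying to scrape https://www.livescore.com/en/, but I am running into problems, mainly because its structure differs from the other sites I have worked with.
I can see a dynamic ID whose number increases as I scroll down the page, and the IDs in the markup correspond only to the matches currently visible on the page. Also, inside each match row, the markup for the home team appears to be identical to the markup for the away team.
Here is what I have tried: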
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.maximize_window()
wait=WebDriverWait(driver,30)
driver.get('https://www.livescore.com/en/football/live/')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#onetrust-accept-btn-handler"))).click()
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        'Home': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        # Same locator as 'Home': find_element returns the first match, so this repeats the home team
        'Away': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Time': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRowTime_time__2Fkd2 MatchRowTime_isLive__2qWag"]').text
    })
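The idea is to end up with a DataFrame of live matches containing the home team name, the away team name, and the current minute of play.
Can anyone help me?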
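A uj5u.com user replied:
AFAIK, the clearest and simplest way to locate an element inside another element is to use an XPath that starts with a dot (.).
The Home and Away team names and the match Time field can be located unambiguously with the following locators: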
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        # The leading dot restricts each XPath search to the current match row
        'Home': game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_home")]').text,
        'Away': game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_away")]').text,
        'Time': game1.find_element(By.XPATH, './/span[contains(@id,"match-row")]').text
    })
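To see why using the same locator for both fields yields the home team twice, and why distinct locators scoped to each row avoid that, here is a minimal sketch that runs without a browser. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Selenium, and the markup and class names (row, home, away, teamName) are simplified assumptions, not the real livescore.com page.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup: two match rows, each with a home and an away name.
PAGE = """
<body>
  <div class="row">
    <div class="home"><div class="teamName">Liverpool</div></div>
    <div class="away"><div class="teamName">Inter</div></div>
  </div>
  <div class="row">
    <div class="home"><div class="teamName">FC Porto</div></div>
    <div class="away"><div class="teamName">Lyon</div></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)
rows = root.findall(".//div[@class='row']")

# Same locator for both fields: find() returns the first match inside the row,
# so the "away" value is just the home name again.
same_locator = [
    (row.find(".//div[@class='teamName']").text,
     row.find(".//div[@class='teamName']").text)
    for row in rows
]

# Distinct locators, each searched relative to the current row (leading dot).
scoped = [
    (row.find(".//div[@class='home']/div").text,
     row.find(".//div[@class='away']/div").text)
    for row in rows
]

print(same_locator)
print(scoped)
```

The same idea carries over to Selenium: calling find_element on a row element with an XPath that starts with a dot searches only inside that row, and separate home and away locators keep the two names apart.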
A uj5u.com user replied:
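To create a DataFrame with Pandas from the home team names and away team names on the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:
Using CSS_SELECTOR: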
driver.get('https://www.livescore.com/en/')
# Accept the cookie banner before reading the page
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
# Collect the text of every visible element whose id ends with the given suffix
Home_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='home-team-name']")))]
Away_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='away-team-name']")))]
df = pd.DataFrame(data=list(zip(Home_team_name, Away_team_name)), columns=['Home Team Name', 'Away Team Name'])
print(df)

Note: You have to add the following imports:

import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console output:

   Home Team Name       Away Team Name
0   Bayern Munich          FC Salzburg
1       Liverpool                Inter
2        FC Porto                 Lyon
3      Real Betis  Eintracht Frankfurt
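One thing to keep in mind with this approach: the home and away names are collected in two separate queries and then paired with zip(), which stops at the shorter list. If the two lists ever differ in length (for example, if the page changes between the two queries, which is an assumption here and not something observed above), rows are dropped without any error. The sketch below uses made-up lists and a hypothetical helper, paired_rows, to show the behavior and one way to guard against it.

```python
from itertools import zip_longest

# Hypothetical scraped values; the third away name is missing on purpose.
home_names = ["Bayern Munich", "Liverpool", "FC Porto"]
away_names = ["FC Salzburg", "Inter"]

# zip() stops at the shorter list, so the FC Porto row is silently dropped.
pairs = list(zip(home_names, away_names))

# zip_longest() keeps every row and marks the gap explicitly with None.
padded = list(zip_longest(home_names, away_names, fillvalue=None))


def paired_rows(home, away):
    """Return (home, away) tuples, refusing lists of different lengths."""
    if len(home) != len(away):
        raise ValueError(f"{len(home)} home names but {len(away)} away names")
    return list(zip(home, away))


print(pairs)
print(padded)
```

Reading each field from inside its own match row, as in the other reply, sidesteps the problem because every row is built from a single parent element.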
