Multiple errors when scraping the Premier League table

I'm learning web scraping.

Using this as a reference, I successfully scraped the top YouTuber rankings.

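I used the same logic to scrape the PL standings, but there are two problems:

It only collects up to 5th place.
It only gets the first place from the result.
Then it raises an AttributeError.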


    from bs4 import BeautifulSoup
    import requests
    import csv


    url = 'https://www.premierleague.com/tables'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    
    standings = soup.find('div', attrs={'data-ui-tab': 'First Team'}).find_all('tr')[1:]
    print(standings)
    
    file = open("pl_standings.csv", 'w')
    writer = csv.writer(file)
    
    writer.writerow(['position', 'club_name', 'points'])
    
    for standing in standings:
        position = standing.find('span', attrs={'class': 'value'}).text.strip()
        club_name = standing.find('span', {'class': 'long'}).text
        points = standing.find('td', {'class': 'points'}).text
    
        print(position, club_name, points)
    
        writer.writerow([position, club_name, points])
    
    file.close()

Answer:

The problem is that html.parser doesn't parse the page correctly (try the lxml parser instead). Also, take every second <tr> to get the correct results:

    import requests
    from bs4 import BeautifulSoup


    url = "https://www.premierleague.com/tables"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "lxml")  # <-- use lxml (a separate package: pip install lxml)

    standings = soup.find("div", attrs={"data-ui-tab": "First Team"}).find_all(
        "tr"
    )[1::2]  # <-- get every second <tr>

    for standing in standings:
        position = standing.find("span", attrs={"class": "value"}).text.strip()
        club_name = standing.find("span", {"class": "long"}).text
        points = standing.find("td", {"class": "points"}).text
        print(position, club_name, points)

Prints:

1 Manchester City 77
2 Liverpool 76
3 Chelsea 62
4 Tottenham Hotspur 57
5 Arsenal 57
6 Manchester United 54
7 West Ham United 52
8 Wolverhampton Wanderers 49
9 Leicester City 41
10 Brighton and Hove Albion 40
11 Newcastle United 40
12 Brentford 39
13 Southampton 39
14 Crystal Palace 37
15 Aston Villa 36
16 Leeds United 33
17 Everton 29
18 Burnley 28
19 Watford 22
20 Norwich City 21
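A stdlib-only sketch of the two remaining steps, under stated assumptions. The `rows` list is a hypothetical stand-in for `find_all("tr")`: the answer's `[1::2]` implies the header row comes first and each club row is followed by one extra `<tr>` (assumed here to be a detail row). `write_standings` is a hypothetical helper that restores the CSV output from the question, which the answer's code leaves out.

```python
import csv
import io


def select_club_rows(rows):
    """Skip the header row (index 0), then keep every second row (1, 3, 5, ...)."""
    return rows[1::2]


def write_standings(standings, fileobj):
    """Hypothetical helper: write (position, club_name, points) tuples as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["position", "club_name", "points"])  # header from the question
    writer.writerows(standings)


# Hypothetical stand-in for find_all("tr"): a header row, then each club row
# followed by one extra row (assumed to be a detail row, not real page data).
rows = [
    "header",
    ("1", "Manchester City", "77"),
    "detail row",
    ("2", "Liverpool", "76"),
    "detail row",
    ("3", "Chelsea", "62"),
    "detail row",
]

standings = select_club_rows(rows)
print(standings)

# An in-memory buffer keeps the sketch self-contained; a real file works the same way.
buffer = io.StringIO()
write_standings(standings, buffer)
print(buffer.getvalue())
```

Against the live page, the tuples would come from the answer's loop (`position`, `club_name`, `points`), and the file would be opened with `open("pl_standings.csv", "w", newline="")`, since the `csv` module documentation asks for `newline=""` on file objects passed to `csv.writer`.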
