import requests
from bs4 import BeautifulSoup
URL = "https://www.hockey-reference.com/leagues/NHL_2021_games.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="all_games")
table = soup.find('div', attrs = {'id':'div_games'})
print(table.prettify())
Reply from a uj5u.com user:
Select the table element rather than the div to print the table:
table = soup.find('table', attrs = {'id':'games'})
print(table.prettify())
Or use pandas.read_html() to fetch the table and convert it to a DataFrame:
import pandas as pd
pd.read_html('https://www.hockey-reference.com/leagues/NHL_2021_games.html', attrs={'id':'games'})[0].iloc[:,:5]
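Output:
| Date | Visitor | G | Home | G.1 |
|---|---|---|---|---|
| 2021-01-13 | St. Louis Blues | 4 | Colorado Avalanche | 1 |
| 2021-01-13 | Vancouver Canucks | 5 | Edmonton Oilers | 3 |
| 2021-01-13 | Pittsburgh Penguins | 3 | Philadelphia Flyers | 6 |
| 2021-01-13 | Chicago Blackhawks | 1 | Tampa Bay Lightning | 5 |
| 2021-01-13 | Montreal Canadiens | 4 | Toronto Maple Leafs | 5 |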
| ... | ... | ... | ... | ... |
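To try the pandas approach without hitting the live site, here is a minimal sketch that runs pandas.read_html() on an inline HTML string. SAMPLE_HTML is a made-up miniature of the schedule page (the "LOG" column and the second table are assumptions for illustration only), and read_html() needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of the schedule page, not the real markup.
SAMPLE_HTML = """
<div id="div_games">
  <table id="games">
    <thead>
      <tr><th>Date</th><th>Visitor</th><th>G</th><th>Home</th><th>G</th><th>LOG</th></tr>
    </thead>
    <tbody>
      <tr><th>2021-01-13</th><td>St. Louis Blues</td><td>4</td>
          <td>Colorado Avalanche</td><td>1</td><td>2:31</td></tr>
      <tr><th>2021-01-13</th><td>Vancouver Canucks</td><td>5</td>
          <td>Edmonton Oilers</td><td>3</td><td>2:30</td></tr>
    </tbody>
  </table>
  <table id="other">
    <tr><th>x</th></tr>
    <tr><td>1</td></tr>
  </table>
</div>
"""

# attrs={"id": "games"} keeps only the table whose id is "games";
# read_html() always returns a list of DataFrames, hence the [0].
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"id": "games"})
games = tables[0].iloc[:, :5]  # first five columns, as in the answer above
print(games)
```

Because read_html() returns a list, indexing with [0] is required even when attrs matches a single table.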
Reply from a uj5u.com user:
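# Reuses the soup object built in the question.
table = soup.find('div', attrs={'id': 'div_games'})
trs = table.find_all('tr')
gamestats = []
for tr in trs:
    home = tr.find('td', attrs={'data-stat': 'home_team_name'})
    visit = tr.find('td', attrs={'data-stat': 'visit_team_name'})
    if home is None or visit is None:
        continue  # header rows have no <td> team cells
    gamestat = {}
    gamestat['home_team_name'] = home.get_text(strip=True)
    gamestat['visit_team_name'] = visit.get_text(strip=True)
    gamestats.append(gamestat)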
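The row-by-row approach can be checked offline with a small, self-contained sketch. SAMPLE_HTML and the helper name extract_games are hypothetical. The markup only assumes, as the answer does, that team cells are td elements carrying data-stat attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the schedule page, not the real markup.
SAMPLE_HTML = """
<div id="div_games">
  <table id="games">
    <thead>
      <tr>
        <th data-stat="date_game">Date</th>
        <th data-stat="visit_team_name">Visitor</th>
        <th data-stat="home_team_name">Home</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th data-stat="date_game">2021-01-13</th>
        <td data-stat="visit_team_name"><a href="#">St. Louis Blues</a></td>
        <td data-stat="home_team_name"><a href="#">Colorado Avalanche</a></td>
      </tr>
      <tr>
        <th data-stat="date_game">2021-01-13</th>
        <td data-stat="visit_team_name"><a href="#">Vancouver Canucks</a></td>
        <td data-stat="home_team_name"><a href="#">Edmonton Oilers</a></td>
      </tr>
    </tbody>
  </table>
</div>
"""


def extract_games(html):
    """Return one dict of team names per data row of the div_games table."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": "div_games"})
    games = []
    for tr in container.find_all("tr"):
        home = tr.find("td", attrs={"data-stat": "home_team_name"})
        visit = tr.find("td", attrs={"data-stat": "visit_team_name"})
        if home is None or visit is None:
            # The header row uses <th> cells, so find('td', ...) returns None.
            continue
        games.append({
            "home_team_name": home.get_text(strip=True),
            "visit_team_name": visit.get_text(strip=True),
        })
    return games


gamestats = extract_games(SAMPLE_HTML)
print(gamestats)
```

Skipping rows where find() returns None keeps the header row out of the result, and get_text(strip=True) stores plain team names instead of Tag objects.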
