How to extract an HTML element using Beautiful Soup's find function

I am trying to use Beautiful Soup to pull out the table that corresponds to the HTML below

<table class="sortable stats_table now_sortable" id="team_pitching" data-cols-to-freeze=",2">
    <caption>Team Pitching</caption>

It comes from https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2. This is a screenshot of the site layout and the HTML I am trying to extract it from.

I am using this code

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
res = requests.get(url)
soup1 = BS(res.content, 'html.parser')
table1 = soup1.find('table', {'id': 'team_pitching'})
table1

I can't figure out how to make this work. The table above it on the page can be extracted with the following line

table1  = soup1.find('table',{'id':'team_batting'})

I would expect similar code to work for the table below it. Also, is there a way to extract it using the table class "sortable stats_table now_sortable" instead of the id?

Reply from a uj5u.com user:

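The problem is that if you open the page normally it shows all of the tables, but if you load the page with the developer tools open, only the first table is shown. So when you make the request, the remaining tables are not included in the HTML you get back. The table you are looking for is not displayed until the "Show Team Pitching" button is pressed. To handle this you can use Selenium and get the full HTML response.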

Reply from a uj5u.com user:

That is because the table you are looking for, i.e. the <table> with id="team_pitching", is present in the soup as a comment. You can check this for yourself.
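One way to check this without hitting the site is sketched below. It is only an illustration: SAMPLE_HTML is a made-up, trimmed-down stand-in for the real page, assuming the table sits inside an HTML comment within the div with id="all_team_pitching".

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, trimmed-down stand-in for the real page: the table is
# wrapped in an HTML comment inside its container <div>.
SAMPLE_HTML = """
<div id="all_team_pitching">
<!--
<table class="sortable stats_table" id="team_pitching">
<caption>Team Pitching</caption>
<tr><th>Rk</th><th>Name</th></tr>
<tr><td>1</td><td>Tyler Ahearn</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A direct lookup fails because commented-out markup is not parsed into tags.
direct_hit = soup.find("table", {"id": "team_pitching"})

# The markup is still there, but only as the text of a Comment node.
comments = soup.find_all(string=lambda node: isinstance(node, Comment))
in_comment = any('id="team_pitching"' in comment for comment in comments)

print(direct_hit)   # None
print(in_comment)   # True
```

On the real response, the same two checks can be run against the soup built from requests.get(url).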

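You need to

Extract that comment from the soup
Convert it into a soup object
Extract the table data from that soup object.

Here is the complete code that carries out the steps above.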

from bs4 import BeautifulSoup, Comment
import requests

url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

main_div = soup.find('div', {'id': 'all_team_pitching'})

# Pick the comment inside the selected <div> that holds the table markup
comments = main_div.find_all(string=lambda x: isinstance(x, Comment))
temp = next(c for c in comments if 'team_pitching' in c)

# Converting the above extracted comment to a soup object
s = BeautifulSoup(temp, 'lxml')
trs = s.find('table', {'id': 'team_pitching'}).find_all('tr')

# Printing the first four data rows of the table (trs[0] is the header row)
for tr in trs[1:5]:
    print(list(tr.stripped_strings))

The first four entries of the table:

['1', 'Tyler Ahearn', '21', '1', '0', '1.000', '1.93', '6', '0', '0', '1', '9.1', '8', '5', '2', '0', '4', '14', '0', '0', '0', '42', '1.286', '7.7', '0.0', '3.9', '13.5', '3.50']
['2', 'Jack Anderson', '20', '2', '0', '1.000', '0.79', '4', '1', '0', '0', '11.1', '6', '4', '1', '0', '3', '11', '1', '0', '0', '45', '0.794', '4.8', '0.0', '2.4', '8.7', '3.67']
['3', 'Shane Drohan', '*', '21', '0', '1', '.000', '4.08', '4', '4', '0', '0', '17.2', '15', '12', '8', '0', '11', '27', '1', '0', '2', '82', '1.472', '7.6', '0.0', '5.6', '13.8', '2.45']
['4', 'Conor Grady', '21', '2', '0', '1.000', '3.00', '4', '4', '0', '0', '15.0', '10', '5', '5', '3', '8', '15', '1', '0', '2', '68', '1.200', '6.0', '1.8', '4.8', '9.0', '1.88']
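As for the second part of the question, looking the table up by class instead of id, here is a minimal sketch on made-up sample HTML. It assumes that more than one table on the page carries the same classes, so the class alone does not single out the pitching table and the caption is used to tell them apart. Classes such as now_sortable may be added by the page's scripts in the browser and might be missing from the raw HTML, so the sketch matches only on sortable and stats_table; that is an assumption worth checking against the actual response.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two tables sharing the same classes.
SAMPLE_HTML = """
<table class="sortable stats_table" id="team_batting"><caption>Team Batting</caption></table>
<table class="sortable stats_table" id="team_pitching"><caption>Team Pitching</caption></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any tag that has this class among its classes.
by_class = soup.find_all("table", class_="stats_table")

# A CSS selector can require several classes at once, in any order.
by_selector = soup.select("table.sortable.stats_table")

# The classes are shared, so the caption is used to pick the right table.
pitching = next(
    t for t in by_selector
    if t.caption.get_text(strip=True) == "Team Pitching"
)

print([t["id"] for t in by_class])  # ['team_batting', 'team_pitching']
print(pitching["id"])               # team_pitching
```

For the real page, the same lookups would have to run on the soup object built from the extracted comment (s in the code above), since the pitching table is not in the main soup.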
