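I'm trying to find out whether I can scrape the HTML tables from the following site using only the pandas read_html function: https://www.baseball-reference.com/teams/ATL/2021.shtml
I could use selenium/bs for this, but I'd like to know whether pd.read_html alone can scrape the tables on this page.
Currently, pd.read_html returns the first two tables but cannot reach any table after the second one.
Here is an example of the "id" of a table I'm trying to access: "the40man"
And here is my code, which raises "ValueError: No tables found":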
pd.read_html("https://www.baseball-reference.com/teams/ATL/2021.shtml", attrs = {'id': 'the40man'})
The following code returns only the first two tables, {'id': ['team_batting', 'team_pitching']}, and nothing else:
pd.read_html("https://www.baseball-reference.com/teams/ATL/2021.shtml")
I'm asking out of curiosity, in case I'm missing something. If not, this is most likely a limitation of pd.read_html.
Thanks in advance for any input or pd.read_html tips!
Answer:
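The baseball-reference.com site places some of these tables inside HTML comments, so an HTML parser never sees them as table elements. To pull those tables out, you first need to extract the comments. You can then loop over the comments to get the table you want: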
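import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
import pandas as pd

url = 'https://www.baseball-reference.com/teams/ATL/2021.shtml'
result = requests.get(url).text
data = BeautifulSoup(result, 'html.parser')

# Collect every HTML comment node in the page
comments = data.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in str(each):
        try:
            # Parse the comment body as HTML; StringIO avoids passing a literal HTML string,
            # which newer pandas versions deprecate
            tables.append(pd.read_html(StringIO(str(each)), attrs={'id': 'the40man'})[0])
            break
        except Exception:
            # This comment has no table with that id (typically a ValueError); try the next one
            continue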
Output:
print(tables[0])
Rk Uni Name Unnamed: 3 ... Ht Wt DoB 1stYr
0 1 30 Kyle Wright us US ... 6' 4" 215 Oct 2, 1995 2015
1 2 0 William Woods us US ... 6' 3" 190 Dec 29, 1998 2018
2 3 51 Will Smith us US ... 6' 5" 255 Jul 10, 1989 2008
3 4 68 Tyler Matzek us US ... 6' 3" 230 Oct 19, 1990 2010
4 5 64 Tucker Davidson us US ... 6' 2" 215 Mar 25, 1996 2016
5 6 62 Touki Toussaint us US ... 6' 3" 215 Jun 20, 1996 2014
6 7 65 Spencer Strider us US ... 6' 0" 195 Oct 28, 1998 2018
7 8 15 Sean Newcomb us US ... 6' 5" 255 Jun 12, 1993 2012
8 9 40 Mike Soroka ca CA ... 6' 5" 225 Aug 4, 1997 2015
9 10 54 Max Fried us US ... 6' 4" 190 Jan 18, 1994 2012
10 11 77 Luke Jackson us US ... 6' 2" 210 Aug 24, 1991 2011
11 12 33 A.J. Minter us US ... 6' 0" 215 Sep 2, 1993 2013
12 13 0 Kirby Yates us US ... 5' 10" 205 Mar 25, 1987 2009
13 14 0 Jay Jackson us US ... 6' 1" 195 Oct 27, 1987 2008
14 15 71 Jacob Webb us US ... 6' 2" 210 Aug 15, 1993 2014
15 16 19 Huascar Ynoa do DO ... 6' 2" 220 May 28, 1998 2015
16 17 36 Ian Anderson us US ... 6' 3" 170 May 2, 1998 2016
17 18 0 Freddy Tarnok us US ... 6' 3" 185 Nov 24, 1998 2017
18 19 74 Dylan Lee us US ... 6' 3" 214 Aug 1, 1994 2015
19 20 0 Alan Rangel mx MX ... 6' 2" 170 Aug 21, 1997 2015
20 21 0 Brooks Wilson us US ... 6' 2" 205 Mar 15, 1996 2015
21 22 50 Charlie Morton us US ... 6' 5" 215 Nov 12, 1983 2002
22 23 14 Adam Duvall us US ... 6' 1" 215 Sep 4, 1988 2010
23 24 24 William Contreras ve VE ... 6' 0" 180 Dec 24, 1997 2015
24 25 27 Austin Riley us US ... 6' 3" 240 Apr 2, 1997 2015
25 26 16 Travis d'Arnaud us US ... 6' 2" 210 Feb 10, 1989 2007
26 27 0 Travis Demeritte us US ... 6' 0" 180 Sep 30, 1994 2013
27 28 0 Chadwick Tromp aw AW ... 5' 8" 221 Mar 21, 1995 2013
28 29 25 Cristian Pache do DO ... 6' 2" 215 Nov 19, 1998 2016
29 30 13 Ronald Acuna Jr. ve VE ... 6' 0" 205 Dec 18, 1997 2015
30 31 1 Ozzie Albies cw CW ... 5' 8" 165 Jan 7, 1997 2014
31 32 9 Orlando Arcia ve VE ... 6' 0" 187 Aug 4, 1994 2011
32 33 7 Dansby Swanson us US ... 6' 1" 190 Feb 11, 1994 2013
33 34 0 Drew Waters us US ... 6' 2" 185 Dec 30, 1998 2017
34 35 20 Marcell Ozuna do DO ... 6' 1" 225 Nov 12, 1990 2008
35 36 0 Manny Pina ve VE ... 6' 0" 222 Jun 5, 1987 2005
36 37 38 Guillermo Heredia cu CU ... 5' 10" 195 Jan 31, 1991 2009
37 38 66 Kyle Muller us US ... 6' 7" 250 Oct 7, 1997 2016
38 Rk Uni Name NaN ... Ht Wt DoB 1stYr
[39 rows x 14 columns]
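If the goal is to stay as close as possible to pd.read_html alone, one alternative is to strip the comment delimiters from the page source before parsing, so that the commented-out tables become ordinary markup. The following is only a minimal offline sketch of that idea, not a tested recipe for the real page: SAMPLE_HTML, uncomment and read_table_by_id are hypothetical names invented for illustration, and the sample data is made up to mimic one visible table plus one table hidden in a comment.

```python
import re
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real page: one visible table and one table
# wrapped in an HTML comment, mimicking the situation described above.
SAMPLE_HTML = """
<html><body>
<table id="team_batting">
  <tr><th>Name</th><th>HR</th></tr>
  <tr><td>Player A</td><td>10</td></tr>
</table>
<!--
<table id="the40man">
  <tr><th>Name</th><th>Wt</th></tr>
  <tr><td>Player B</td><td>215</td></tr>
</table>
-->
</body></html>
"""


def uncomment(html: str) -> str:
    """Remove comment delimiters so the parser sees commented-out markup."""
    return re.sub(r"<!--|-->", "", html)


def read_table_by_id(html: str, table_id: str) -> pd.DataFrame:
    """Return the first table with the given id, including commented-out ones."""
    return pd.read_html(StringIO(uncomment(html)), attrs={"id": table_id})[0]


# Parsing the raw source only finds the table outside the comment.
visible_only = pd.read_html(StringIO(SAMPLE_HTML))

# After stripping the delimiters, the hidden table can be selected by id.
roster = read_table_by_id(SAMPLE_HTML, "the40man")

print(len(visible_only))
print(roster)
```

For a live page you would still need to download the source first (for example with requests.get(url).text, as in the answer's code) and pass it to read_table_by_id. Note that this blunt approach also exposes any genuine comments as page text, so the BeautifulSoup loop is the more targeted option when that matters.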
