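I am trying to build a list of all the football teams and their links from any one of the several tables at this base URL: https://fbref.com/en/comps/10/stats/Championship-Stats
I will then use the links from the href to scrape each team's data. The href is embedded in a th tag, as shown below.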
<th scope="row" class="left" data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>
<a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a>
The code below gives me a list of the th tags that contain the a tags, rather than the links themselves:
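import requests
from bs4 import BeautifulSoup
page = "https://fbref.com/en/comps/10/Championship-Stats"
pageTree = requests.get(page)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
# Every th cell with class "left" (the cells that hold the team links)
Teams = pageSoup.find_all("th", {"class": "left"})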
Output (one entry for each th with class 'left'):
<th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>,
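Since each th in that list wraps the a element, the link can be read from the cell's first a child. The following is only a minimal sketch, assuming the page markup matches the snippet quoted above. SAMPLE_HTML and extract_hrefs are illustrative names that are not in the original code, and the header row in the sample is a made-up cell included to show the guard against cells without a link.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the markup quoted in the question.
SAMPLE_HTML = """
<table>
  <tr><th class="left" data-stat="squad" scope="col">Squad</th></tr>
  <tr><th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
</table>
"""

def extract_hrefs(html):
    """Return the href of the first link inside each th with class 'left'."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for th in soup.find_all("th", {"class": "left"}):
        link = th.find("a")  # None when the cell holds no link (e.g. a header cell)
        if link is not None and link.has_attr("href"):
            hrefs.append(link["href"])
    return hrefs

print(extract_hrefs(SAMPLE_HTML))
```

On the real page the same function could be fed pageTree.content from the code above, although the result would depend on the markup the site actually serves.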
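I have tried the guidance from an earlier Stack question (extracting the link after a th in BeautifulSoup). However, the following code, based on that thread, raises an error: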
AttributeError: 'NoneType' object has no attribute 'find_parent'
def import_TeamList():
    BASE_URL = "https://fbref.com/en/comps/10/Championship-Stats"
    r = requests.get(BASE_URL)
    soup = BeautifulSoup(r.text, 'lxml')
    team_list = []
    # The AttributeError is raised on this line
    team_tr = soup.find('a', {'data-stat': 'squad'}).find_parent('tr')
    for tr in team_tr.find_next_siblings('tr'):
        if tr.find('a').text != 'squad':
            break
        team_list.append(BASE_URL + tr.find('a')['href'])
    return team_list
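A likely cause of that error, judging by the markup quoted earlier: data-stat="squad" is an attribute of the th, not of the a, so soup.find('a', {'data-stat': 'squad'}) matches nothing and returns None, and calling find_parent on None fails. The sketch below reproduces this on a small sample and searches the th cells instead. SAMPLE_HTML and squad_hrefs are hypothetical names used only for illustration, and the header row is a made-up cell.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the markup quoted in the question.
SAMPLE_HTML = """
<table>
  <tr><th class="left" data-stat="squad" scope="col">Squad</th></tr>
  <tr><th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No a element carries data-stat, so this lookup finds nothing and returns None.
missing = soup.find("a", {"data-stat": "squad"})

def squad_hrefs(soup):
    """Collect hrefs from th cells marked data-stat="squad" (illustrative helper)."""
    hrefs = []
    for th in soup.find_all("th", {"data-stat": "squad"}):
        link = th.find("a")
        if link is None:  # header cells have no link; skip instead of crashing
            continue
        hrefs.append(link["href"])
    return hrefs

print(missing)
print(squad_hrefs(soup))
```

Iterating over the th cells directly also avoids the sibling walk, whose check tr.find('a').text != 'squad' would break out of the loop on the first team row, because the link text is the team name.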
Answer from a uj5u.com user:
Here is a version that uses a CSS selector, which I find simpler than most other approaches.
import requests
from bs4 import BeautifulSoup
url = 'https://fbref.com/en/comps/10/stats/Championship-Stats'
data = requests.get(url).text
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(data, 'html.parser')
# CSS selector: every a element nested inside a th
links = soup.select('th a')
urls = [link['href'] for link in links]
print(urls)
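The hrefs collected this way are site-relative (for example /en/squads/293cb36b/Barnsley-Stats), so they need to be resolved against the page URL before they can be requested. Because the question mentions several tables on the page, the same team link may also turn up more than once. A small sketch of both steps using only the standard library follows; to_absolute and PAGE_URL are illustrative names, not part of the answer above.

```python
from urllib.parse import urljoin

PAGE_URL = "https://fbref.com/en/comps/10/stats/Championship-Stats"

def to_absolute(hrefs, page_url=PAGE_URL):
    """Resolve hrefs against the page URL, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(page_url, href)  # a leading "/" replaces the whole path
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result

hrefs = [
    "/en/squads/293cb36b/Barnsley-Stats",
    "/en/squads/293cb36b/Barnsley-Stats",  # duplicate, as if found in a second table
]
print(to_absolute(hrefs))
# ['https://fbref.com/en/squads/293cb36b/Barnsley-Stats']
```

urljoin is preferable to plain string concatenation here: appending the href to the full page URL would produce a path that still contains /en/comps/10/... in front of /en/squads/....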
Answer from a uj5u.com user:
Is this what you are looking for?
import requests
from bs4 import BeautifulSoup as BS
from lxml import etree
with requests.Session() as session:
    r = session.get('https://fbref.com/en/comps/10/stats/Championship-Stats')
    r.raise_for_status()
    # Parse with BeautifulSoup first, then hand the normalized HTML to lxml for XPath
    dom = etree.HTML(str(BS(r.text, 'lxml')))
    # Matches only th elements whose class attribute is exactly "left"
    for a in dom.xpath('//th[@class="left"]/a'):
        print(a.attrib['href'])
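One caveat with @class="left" is that XPath compares the whole attribute string, so a th carrying additional classes would not match. Whether the page has such cells is not something I have checked, but selecting on data-stat="squad" (the attribute shown in the question's markup) sidesteps the issue. The sketch below also lets lxml parse the HTML directly and returns the href values via /@href. SAMPLE_HTML and squad_hrefs are hypothetical names, and the header row is a made-up cell.

```python
from lxml import etree

# Hypothetical sample modelled on the markup quoted in the question.
SAMPLE_HTML = """
<table>
  <tr><th class="left" data-stat="squad" scope="col">Squad</th></tr>
  <tr><th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
</table>
"""

def squad_hrefs(html):
    """Return the href of every link directly inside a th with data-stat="squad"."""
    dom = etree.HTML(html)  # lxml's HTML parser, no BeautifulSoup round trip
    # Ending the XPath in /@href yields attribute values instead of elements.
    return [str(h) for h in dom.xpath('//th[@data-stat="squad"]/a/@href')]

print(squad_hrefs(SAMPLE_HTML))
```

With a live response, r.text from the session code above could be passed in place of SAMPLE_HTML.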
