How do I extract a specific "data-stat" value? (Python)

So far, my code pulls a page from https://www.basketball-reference.com and uses the data-stat attribute (class?) to print whatever data is in tr_body.

I need a way to extract the value of one specific data-stat, for example to output PG for data-stat="pos" on https://www.basketball-reference.com/players/l/lowryky01.html.

from bs4 import BeautifulSoup
import requests

first = ""
last = ""


def askname():
    # asks user for player name and stores it in the module-level globals
    global first, last
    first = input("First Name of Player? ")
    last = input("Last Name of Player? ")
    print("Confirmed, loading up " + first + " " + last)

askname()


first_slice_result = (first[:2])
last_slice_result = (last[:5])
print(first_slice_result)
print(last_slice_result)
# slices player's name so it can match the format bref uses
first_slice_resultA = str(first_slice_result)
last_slice_resultA = str(last_slice_result)

first_last_slice = last_slice_resultA + first_slice_resultA

lower = first_last_slice.lower() + "01"

start_letter = (last[:1])
lower_letter = (start_letter.lower())
# grabs the letter bref uses for organization

print(lower)
source = requests.get('https://www.basketball-reference.com/players/' + lower_letter + '/' + lower + '.html').text

soup = BeautifulSoup(source, 'lxml')
tbody = soup.find('tbody')
pergame = tbody.find(class_="full_table")
classrite = pergame.find(class_="right")
tr_body = tbody.find_all('tr')
print(pergame)


# separates data-stat; .get() reads any attribute of a tag (data-stat is an attribute, not a class)
for trb in tr_body:
    print(trb.get('id'))

    th = trb.find('th')
    print(th.get_text())
    print(th.get('data-stat'))


    for td in trb.find_all('td'):
        print(td.get_text())
        print(td.get('data-stat'))

I started this project about 4 months ago, and I can't remember how to split out and extract one specific data-stat value.
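
The name-slicing steps in the code above can be collected into one small helper, which makes the URL logic easy to check without touching the network. This is only a sketch: `bref_player_url` and `BASE_URL` are hypothetical names, and the fixed "01" suffix is the same assumption the original code makes, so it will point to the wrong page when two players share a slug.

```python
BASE_URL = "https://www.basketball-reference.com/players/"


def bref_player_url(first, last):
    """Build a player-page URL from a first and last name.

    Hypothetical helper that mirrors the slicing in the question:
    first 5 letters of the last name + first 2 of the first name + "01".
    """
    first = first.strip().lower()
    last = last.strip().lower()
    slug = last[:5] + first[:2] + "01"  # e.g. "lowry" + "ky" + "01"
    # The site groups players under the first letter of the last name.
    return BASE_URL + last[:1] + "/" + slug + ".html"


print(bref_player_url("Kyle", "Lowry"))
# https://www.basketball-reference.com/players/l/lowryky01.html
```

Keeping this in a function also removes the need for the global variables, since the names are passed in as arguments.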

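Answer from a uj5u.com user:

Well, as far as I can tell, you have basically already done what you want.

From this point, just organize the information you extracted into a dictionary, and then you can pull out values by their keys.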

for trb in tr_body:
    print(trb.get('id'))

    th = trb.find('th')
    print(th.get_text())
    print(th.get('data-stat'))
    
    row = {}
    for td in trb.find_all('td'):
        row[td.get('data-stat')] = td.get_text()
    
    # .get() avoids a KeyError on rows that lack one of these cells
    print(row.get('pos'), row.get('team_id'), row.get('fg_pct'))
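
To look up one stat for one season instead of printing every row, the per-row dictionaries can themselves be stored in a dictionary keyed by the season label from the `th` cell. The following is a minimal sketch on a hand-written HTML fragment: the sample rows and values are made up for illustration, `stats_by_season` is a hypothetical name, and `html.parser` is used only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up fragment that imitates the layout of a per-game table.
SAMPLE_HTML = """
<table><tbody>
<tr id="per_game.2019" class="full_table">
  <th data-stat="season">2018-19</th>
  <td data-stat="pos">PG</td><td data-stat="team_id">TOR</td>
</tr>
<tr id="per_game.2020" class="full_table">
  <th data-stat="season">2019-20</th>
  <td data-stat="pos">PG</td><td data-stat="team_id">TOR</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

stats_by_season = {}
for tr in soup.find("tbody").find_all("tr"):
    th = tr.find("th")
    if th is None:  # skip rows without a season header cell
        continue
    # Map each cell's data-stat attribute to its text.
    row = {td.get("data-stat"): td.get_text(strip=True) for td in tr.find_all("td")}
    stats_by_season[th.get_text(strip=True)] = row

print(stats_by_season["2019-20"]["pos"])         # PG
print(stats_by_season["2018-19"].get("fg_pct"))  # None (cell absent in the sample)
```
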

Hope this helps.
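
If only a single value is needed, the loop can be skipped by searching on the attribute directly. Both `find(..., attrs={...})` and CSS attribute selectors through `select_one` are standard Beautiful Soup calls. The HTML below is a made-up fragment, not the live page, and `html.parser` is chosen only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up single-row fragment for illustration.
SAMPLE_HTML = """
<table><tbody>
<tr class="full_table">
  <th data-stat="season">2019-20</th>
  <td data-stat="pos">PG</td>
  <td data-stat="fg_pct">.416</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 1: match the first <td> whose data-stat attribute equals "pos".
pos_cell = soup.find("td", attrs={"data-stat": "pos"})
position = pos_cell.get_text(strip=True) if pos_cell else None

# Option 2: the same kind of lookup written as a CSS attribute selector.
fg_cell = soup.select_one('td[data-stat="fg_pct"]')
fg_pct = fg_cell.get_text(strip=True) if fg_cell else None

print(position, fg_pct)  # PG .416
```

Both calls return only the first match in document order, so on a full page they should be run on the specific row (`tr`) of interest rather than on the whole soup.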

