I'm new to Python and I'm trying to build a web-scraping script.
I'm trying to scrape the "href" URLs:

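My code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

URL = 'https://www.rotowire.com/basketball/team.php?team=UTA'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
service = Service(ChromeDriverManager().install())
for link in soup.find_all({"aria-colindex" : "3"}):
    print(link.get('href'))
driver = webdriver.Chrome(service = service)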
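But this returns nothing. I also tried {'style' : "width: 96px; left: 190px; top: 0px;"} instead of {"aria-colindex" : "3"}, but that returned 'None' as well. I'm not sure what I'm doing wrong, so any help would be greatly appreciated :)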
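A side note on the lookup itself, separate from how the page loads: `find_all` treats its first positional argument as a tag-name filter, so attribute filters are normally passed through the `attrs` keyword. The sketch below uses a small hand-written HTML fragment (hypothetical, not copied from the site) to show the `attrs` form; `hrefs_in_column` is an illustrative helper name. On the live page this alone would probably still find nothing if the table is filled in by JavaScript after the initial response.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates two grid cells; the real markup may differ.
SAMPLE_HTML = """
<div aria-colindex="2"><a href="/ignored">other column</a></div>
<div aria-colindex="3"><a href="/basketball/box-score.php?gid=2347768">box score</a></div>
"""

def hrefs_in_column(html, colindex):
    """Return the href of every <a> inside elements whose aria-colindex matches."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    # attrs= is the keyword for attribute filters; the first positional
    # argument of find_all is the tag name.
    for cell in soup.find_all(attrs={"aria-colindex": colindex}):
        for anchor in cell.find_all("a", href=True):
            hrefs.append(anchor["href"])
    return hrefs

print(hrefs_in_column(SAMPLE_HTML, "3"))
```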
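Answer from a uj5u.com user:
The data is loaded dynamically from an API, so it is easier to retrieve the links directly from the API. Here is a pandas implementation: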
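import pandas as pd
from bs4 import BeautifulSoup

# Read the schedule data that the page loads from its JSON endpoint
df = pd.read_json('https://www.rotowire.com/basketball/tables/team-schedule.php?team=UTA')
# Each 'score' cell is an HTML fragment; pull the href out of its <a> tag
df['url'] = df['score'].apply(lambda x: BeautifulSoup(x, 'html.parser').find('a')['href'])
df.to_csv('output.csv')  # export to CSV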
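To see what the `apply` step is doing without touching the network, here is a minimal sketch on a hand-made payload. It assumes, as the answer's code implies, that each record's `score` field holds an HTML fragment containing a link. The sample records and the `extract_href` helper are hypothetical, and the real endpoint's fields may differ. The helper returns `None` for a row with no link instead of raising, which the one-line lambda would not do.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical payload: a list of records with an HTML "score" field.
SAMPLE_JSON = json.dumps([
    {"score": '<a href="/basketball/box-score.php?gid=2347768">box score</a>'},
    {"score": ""},  # a row without a link
])

def extract_href(fragment):
    """Return the first link's href in an HTML fragment, or None if there is none."""
    anchor = BeautifulSoup(fragment or "", "html.parser").find("a", href=True)
    return anchor["href"] if anchor else None

records = json.loads(SAMPLE_JSON)
urls = [extract_href(record.get("score")) for record in records]
print(urls)
```

With pandas, the same helper could be passed straight to `apply`, as in `df['score'].apply(extract_href)`.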
Answer from a uj5u.com user:
Based on your question, here is a working solution.
Code:
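import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
url = "https://www.rotowire.com/basketball/team.php?team=UTA"
driver.get(url)
time.sleep(8)  # crude fixed wait for the JavaScript-rendered table
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # close the browser once the page source is captured
links = soup.select('div.webix_column.align-c div a')
for link in links:
    print('href_url:' + link['href'])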
Output:
href_url:/basketball/box-score.php?gid=2347768
href_url:/basketball/box-score.php?gid=2347767
href_url:/basketball/box-score.php?gid=2347765
href_url:/basketball/box-score.php?gid=2347764
href_url:/basketball/box-score.php?gid=2347762
href_url:/basketball/box-score.php?gid=2347760
href_url:/basketball/box-score.php?gid=2346563
href_url:/basketball/box-score.php?gid=2346562
href_url:/basketball/box-score.php?gid=2346561
href_url:/basketball/box-score.php?gid=2346420
href_url:/basketball/box-score.php?gid=2346295
href_url:/basketball/box-score.php?gid=2314246
href_url:/basketball/box-score.php?gid=2314315
href_url:/basketball/box-score.php?gid=2314159
href_url:/basketball/box-score.php?gid=2314155
href_url:/basketball/box-score.php?gid=2314153
href_url:/basketball/box-score.php?gid=2314144
href_url:/basketball/box-score.php?gid=2314220
href_url:/basketball/box-score.php?gid=2314333
href_url:/basketball/box-score.php?gid=2314142
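The hrefs printed above are site-relative. If full URLs are needed for follow-up requests, the standard library's `urllib.parse.urljoin` can resolve them against the page URL. A small sketch, using two of the paths from the output; `to_absolute` is just an illustrative helper name.

```python
from urllib.parse import urljoin

# The page the hrefs were scraped from, as given in the question.
BASE_URL = "https://www.rotowire.com/basketball/team.php?team=UTA"

def to_absolute(hrefs, base=BASE_URL):
    """Resolve each relative href against the page it was scraped from."""
    return [urljoin(base, href) for href in hrefs]

relative = [
    "/basketball/box-score.php?gid=2347768",
    "/basketball/box-score.php?gid=2347767",
]
for full_url in to_absolute(relative):
    print(full_url)
```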
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qianduan/325047.html
