
I want to scrape the text of the sports menu (the highlighted dropdown) on https://ekusports.com/. This is what I have so far:
from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent so the request looks like it comes from Firefox
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"}
url = "https://ekusports.com/"
reqs = requests.get(url, headers=headers)
soup = BeautifulSoup(reqs.text, 'html.parser')
# Collect every text node in the static HTML (string=True replaces the deprecated text=True)
website_text = soup.find_all(string=True)
Answer from a uj5u.com user:
Use the site's endpoint to get the menu data.
Here's how:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:97.0) Gecko/20100101 Firefox/97.0",
"X-Requested-With": "XMLHttpRequest",
}
r = requests.get("https://ekusports.com/services/sportnames.ashx", headers=headers).json()
print("\n".join([s["sportInfo"]["sport_title"] for s in r["sports"]]))
Output:
Baseball
Beach Volleyball
Bratzke Center
Cheerleading
Colonel Club
Cross Country
Dance Team
Development
EKUSports Builds
Football
General
Marketing/Promotions
Men's Basketball
Men's Cross Country
Men's Golf
Men's Tennis
Men's Track and Field
Name/Image/Likeness (NIL)
Soccer
Softball
Spirit Groups
Tickets
Track & Field
Volleyball
Women's Basketball
Women's Cross Country
Women's Golf
Women's Tennis
Women's Track and Field
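The one-liner in the answer above raises a KeyError if any entry lacks the expected keys, and it never checks the HTTP status. Below is a minimal sketch of a more defensive variant. It assumes the response keeps the shape used in that answer (a "sports" list whose items carry "sportInfo" and "sport_title"). The function names extract_sport_titles and fetch_sport_titles are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from typing import Any

# Endpoint taken from the answer above.
SPORTNAMES_URL = "https://ekusports.com/services/sportnames.ashx"


def extract_sport_titles(payload: dict[str, Any]) -> list[str]:
    """Return the sport titles from a payload shaped like the answer's JSON."""
    titles = []
    for sport in payload.get("sports", []):
        # Skip entries without the expected nested keys instead of raising KeyError.
        title = (sport.get("sportInfo") or {}).get("sport_title")
        if title:
            titles.append(title.strip())
    return titles


def fetch_sport_titles(url: str = SPORTNAMES_URL) -> list[str]:
    """Request the endpoint and return the titles (performs a network call)."""
    import requests  # imported lazily so extract_sport_titles has no dependencies

    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:97.0) Gecko/20100101 Firefox/97.0",
        "X-Requested-With": "XMLHttpRequest",
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_sport_titles(response.json())
```

Calling print("\n".join(fetch_sport_titles())) should print a list like the one above, provided the endpoint still responds with the same structure.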
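Answer from a uj5u.com user:
If you are using Beautiful Soup, it may not be able to get the dropdown menus, because they are rendered with JavaScript. You need a scraping tool such as Selenium, which automates a real browser, so JavaScript is executed and the page updates after every click, just as it would for a user. Selenium is also very easy to use.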
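A rough sketch of the Selenium approach this answer describes, assuming Selenium 4 with Firefox installed. The CSS selector in MENU_SELECTOR is a placeholder, not taken from the site: inspect the page and replace it with the selector that actually matches the menu links. The names normalize_texts and scrape_menu_texts are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

# ASSUMPTION: placeholder selector; replace it after inspecting the real page.
MENU_SELECTOR = "nav a"


def normalize_texts(raw_texts: Iterable[str | None]) -> list[str]:
    """Collapse whitespace, drop empty entries, and de-duplicate in order."""
    seen = set()
    cleaned = []
    for raw in raw_texts:
        text = " ".join((raw or "").split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def scrape_menu_texts(url: str, selector: str = MENU_SELECTOR, timeout: int = 10) -> list[str]:
    """Render the page in headless Firefox and return the matched elements' text."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has put at least one matching element into the DOM.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        # textContent is read instead of .text because .text is empty for
        # elements that are not visible, such as a collapsed dropdown.
        return normalize_texts(el.get_attribute("textContent") for el in elements)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

With a correct selector, scrape_menu_texts("https://ekusports.com/") would return the menu labels as a list of strings. Starting a browser is much slower than calling the JSON endpoint directly, so this is mainly worthwhile when no such endpoint is available.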
