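I am trying to load a web page and get the href/link for each row.
At the moment, the code prints nothing.
The expected output is the href/link of every row on the page.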
import requests
from bs4 import BeautifulSoup

url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'
baseurl = 'https://ash.confex.com/ash/2021/webprogram/'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
productlist = soup.find_all('div', class_='session-card')
for b in productlist:
    links = b["href"]
    print(links)
Answer:
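What happened?
First, take a close look at your soup: you will not find the information you are searching for, because your request is being blocked.
In addition, the elements you select with find_all('div', class_='session-card') have no href attribute of their own.
How to fix it?
Add some headers to your request:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
res = requests.get(url, headers=headers)
Then, inside your loop, additionally select the <a> element of each card and read its href: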
b.a["href"]
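Note that b.a is None for any card that contains no <a> element, in which case b.a["href"] raises a TypeError. A minimal sketch of a more defensive version follows. The SAMPLE_HTML markup and the extract_card_links helper are hypothetical, made up for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<div class="session-card"><a href="/session/1">Session one</a></div>
<div class="session-card"><span>No link here</span></div>
<div class="session-card"><a>Anchor without href</a></div>
<div class="session-card"><a href="/session/2">Session two</a></div>
"""


def extract_card_links(html):
    """Return the href of the first <a> that has one in each session-card div.

    Cards without such an <a> are skipped instead of raising an error.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for card in soup.find_all("div", class_="session-card"):
        # href=True only matches <a> tags that carry an href attribute;
        # find() returns None when the card has no such tag.
        anchor = card.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


print(extract_card_links(SAMPLE_HTML))
```

With the sample markup, only the first and last cards contribute a link; the two cards without a usable <a> are skipped.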
Example
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the request is not blocked
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'
baseurl = 'https://ash.confex.com/ash/2021/webprogram/'
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, 'html.parser')
for b in soup.find_all('div', class_='session-card'):
    # The href lives on the <a> inside each card, not on the div itself
    links = b.a["href"]
    print(links)
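If the printed href values turn out to be relative paths (an assumption, not something confirmed above), they can be resolved against the page URL with urljoin from the standard library. This is only a sketch: the to_absolute helper and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

# The page URL from the question, used as the base for resolving links.
page_url = (
    "https://meetings.asco.org/meetings/"
    "2022-gastrointestinal-cancers-symposium/286/program-guide/"
    "search?q=&pageNumber=1&size=20"
)


def to_absolute(base, hrefs):
    """Resolve each href against base; already-absolute URLs are returned unchanged."""
    return [urljoin(base, href) for href in hrefs]


# Hypothetical href values, for illustration only.
sample_hrefs = ["/hypothetical/session/1", "https://example.com/already-absolute"]

for link in to_absolute(page_url, sample_hrefs):
    print(link)
```

Resolving against the URL that was actually requested avoids hard-coding a separate base URL for a different site.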
