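I am trying to scrape the website https://findamortgagebroker.com/.
When I request a search URL such as "https://findamortgagebroker.com/?search=San Diego&page=2", the response does not contain the tags I see when I inspect the page with the browser's developer tools.
I want to scrape the "a" elements whose "class" equals "clickable-tile-contact".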
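import time
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

def get_soup(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    time.sleep(10)
    html_page = urlopen(req).read()
    time.sleep(10)
    soup = BeautifulSoup(html_page, 'html.parser')
    return soup

# The space in "San Diego" is percent-encoded; recent Python versions reject raw spaces in URLs
url = "https://findamortgagebroker.com/?search=San%20Diego&page=2"
soup = get_soup(url)
# This comes back empty: the tiles are not part of the initial HTML
links = soup.find_all('a', attrs={'class': 'clickable-tile-contact'})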
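Answer from a uj5u.com user:
The data you want is actually loaded by an AJAX request: the page sends a POST request to a separate API endpoint, which returns the results as a plain HTML fragment. To get the data, send your request to that API URL instead of the search page URL.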
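One way to confirm this kind of diagnosis is to count the target elements in the raw HTML the server returns and compare that with what the developer tools show. The sketch below uses only the standard library; `ClassCounter`, `count_tags`, and both sample HTML strings are hypothetical stand-ins for illustration, not actual responses from the site.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_tags(html, tag, css_class):
    """Return how many <tag> elements in html have css_class."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples: raw server HTML vs. the DOM after JavaScript has run.
static_html = '<div id="results"></div><script src="app.js"></script>'
rendered_html = (
    '<div id="results">'
    '<a class="clickable-tile-contact" href="/Profile/x">x</a>'
    '</div>'
)

print(count_tags(static_html, "a", "clickable-tile-contact"))    # 0
print(count_tags(rendered_html, "a", "clickable-tile-contact"))  # 1
```

A count of zero on the raw response, while the element is visible in the inspector, suggests the content is injected after page load, and the Network tab of the developer tools is then the place to look for the request that fetches it.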
A complete working example:
import requests
from bs4 import BeautifulSoup

api_url = 'https://findamortgagebroker.com/home/SearchContacts/'
headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}
# Form body copied from the browser's request; the CaptchaToken came from a live session and may need refreshing
body = "searchModel[SearchText]=San Diego&searchModel[PageNumber]=2&searchModel[Radius]=50&searchModel[ResultsPerPage]=20&searchModel[CaptchaToken]=03AEkXODDG8q9JqC--gCpxJK_Kevp506iB5o5Z7ilzY3Ge6GbYQaoX9jcOJqEyC6TG159L5KSvPoE43UlBxGMYW2jlNcnc0ING0sFeQO2RZIOui0YnNAaByRIVrjaluwaNi7WCE2FykjJNI0B5FNLB7nJjnr9N7YEeUkY13km0wRN3vfyqPh-bVdpahCir00GzE-pQyXU_o84bY1dCWRNQten7O_cnmdcA0ucEPxFeO3WIbMkUkUqqMC5vpAUiz_VttmYMyRETidTuaI6rHE2_AjGbUr6Z61vXFr-dXAC63alA15gGu8ypGRljtHS2wmfNSSySrtegnFxD3txZZ4d2KDk4ugBXLfh3jNUHM_KcKF6Rkp0WOHx7-D-4CEfMf-mC9zJ6FnVqJx3FTZiOrwcelQ0dW1OxdHuHlCVPPQlzIzcFMfsTJOsCLj3JNZTEgkQ6Eicl6dkVV-F-CRPd4fQZ2D_u3dDmrIaCIQJJ4LlQuSYXhLt-6QMcnFXceygadkKGqeiGQZcdUeagF6c8zz9OUg5g2ppXkCu-WsH08e-ei7sRHspA3Rdwh6sylcr8fqFlxDNmEXTI4CH1nRgLvJMuXr6KdcY3AWNhwA&searchModel[IsVendorRequest]=false&searchModel[VendorIdentifier]=0&searchModel[CaptchaV2]=false"
res = requests.post(api_url, data=body, headers=headers)
# print(res)
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(res.text, 'lxml')
data = []
# Collect the href of every element with the class "clickable-tile-contact"
for item in soup.select('.clickable-tile-contact'):
    data.append({
        'href': item.get('href'),
    })
print(data)
Output:
[{'href': 'https://findamortgagebroker.com/Profile\\AndresCamacho26826'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidStein65836'}, {'href': 'https://findamortgagebroker.com/Profile\\DanielRamirez28222'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidHolland56665'}, {'href': 'https://findamortgagebroker.com/Profile\\EvbeniiMalenko57387'}, {'href': 'https://findamortgagebroker.com/Profile\\AmirNurani66326'}, {'href': 'https://findamortgagebroker.com/Profile\\MarialuisaSarrizLira37868'}, {'href': 'https://findamortgagebroker.com/Profile\\DejaCorreia53368'}, {'href': 'https://findamortgagebroker.com/Profile\\JulioRugama72662'}, {'href': 'https://findamortgagebroker.com/Profile\\MarthaMunoz26537'}, {'href': 'https://findamortgagebroker.com/Profile\\CarlosMunoz55258'}, {'href': 'https://findamortgagebroker.com/Profile\\AndreaCutuk35775'}, {'href': 'https://findamortgagebroker.com/Profile\\LauraPardo64458'}, {'href': 'https://findamortgagebroker.com/Profile\\KatiePike37454'}, {'href': 'https://findamortgagebroker.com/Profile\\JustinGuthrie27854'}, {'href': 'https://findamortgagebroker.com/Profile\\GinoSalvaggio54863'}, {'href': 'https://findamortgagebroker.com/Profile\\AnnaValencia55287'}, {'href': 'https://findamortgagebroker.com/Profile\\ArtinMousakhan27554'}, {'href': 'https://findamortgagebroker.com/Profile\\GloriaPereira45832'}, {'href': 'https://findamortgagebroker.com/Profile\\NickKinnard38652'}]
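As a possible refinement, the hard-coded body string can be assembled from a dict with `urllib.parse.urlencode`, which percent-encodes the values and makes the page number easy to vary. This is a minimal sketch, not the answer author's method: the helper name `build_search_body` and the `PLACEHOLDER_TOKEN` value are hypothetical, the field names are taken from the body string in the answer, and a real token would still have to be obtained from a live browser session.

```python
from urllib.parse import urlencode


def build_search_body(search_text, page, captcha_token, radius=50, per_page=20):
    """Return a form-encoded body using the field names from the answer's request."""
    fields = {
        "searchModel[SearchText]": search_text,
        "searchModel[PageNumber]": page,
        "searchModel[Radius]": radius,
        "searchModel[ResultsPerPage]": per_page,
        "searchModel[CaptchaToken]": captcha_token,
        "searchModel[IsVendorRequest]": "false",
        "searchModel[VendorIdentifier]": 0,
        "searchModel[CaptchaV2]": "false",
    }
    # urlencode escapes the brackets and encodes the space in "San Diego" as "+",
    # which is the standard application/x-www-form-urlencoded representation.
    return urlencode(fields)


# PLACEHOLDER_TOKEN is a hypothetical value; it is not a working captcha token.
body = build_search_body("San Diego", 2, captcha_token="PLACEHOLDER_TOKEN")
print(body)
```

The resulting string can be passed as `data=body` to `requests.post`, as in the answer. Alternatively, `requests` accepts a dict for `data` and form-encodes it itself.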
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/530073.html
Tags: web-scraping, beautifulsoup
