I'm trying to scrape some hotel data from Tripadvisor, but I can't get the value of the "Hotel class" field. How can I do this?
Code:
import requests
from bs4 import BeautifulSoup
import re
import time
import datetime
url = 'https://www.tripadvisor.com/Hotel_Review-g60763-d93543-Reviews-The_Shelburne_Sonesta_New_York-New_York_City_New_York.html'
headers = {'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}
site = requests.get(url, headers=headers)
soup = BeautifulSoup(site.content, 'html.parser')
name = soup.find('h1',{'id':'HEADING'}).text
address = soup.find('div',{'class':'gZwVG S4 H3 f u ERCyA'}).text
hotel_class = soup.find('div',{'class':'euDRl _R MC S4 _a H'}).text
no_reviews = soup.find('span',{'class':'biGQs _P pZUbB biKBZ KxBGd'}).text if soup.find('span',{'class':'biGQs _P pZUbB biKBZ KxBGd'}) else ""
ct = datetime.datetime.now()
dt_string = ct.strftime("%d/%m/%Y %H:%M:%S")
print(name, address, hotel_class, no_reviews, dt_string)
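Answer from a uj5u.com user:
I'm not sure whether this is the output you expect, but the rating appears to be rendered as an <svg> graphic rather than as text, so .text on the div has nothing to return. You can read the value of the aria-label attribute of the <svg> element instead: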
hotel_class = soup.find('div',{'class':'euDRl _R MC S4 _a H'}).svg.get('aria-label')
Output:
4.0 of 5 bubbles
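If a numeric value is more useful than the label string, the label can be parsed after scraping. The following is only a minimal sketch: parse_bubble_label is a hypothetical helper name, not part of the original answer, and it assumes the label keeps the "<number> of <number> ..." shape shown in the output above.

```python
import re
from typing import Optional

# Assumption: the aria-label looks like "4.0 of 5 bubbles".
# The pattern captures the first number that is followed by "of <number>".
_LABEL_RE = re.compile(r"(\d+(?:\.\d+)?)\s+of\s+\d+(?:\.\d+)?")


def parse_bubble_label(label: Optional[str]) -> Optional[float]:
    """Return the leading number of a rating label as a float, or None.

    Hypothetical helper for illustration; None is returned when the label
    is missing or does not match the assumed format.
    """
    if not label:
        return None
    match = _LABEL_RE.search(label)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_bubble_label("4.0 of 5 bubbles"))
```

With the answer's code, this would be called as parse_bubble_label(hotel_class). Returning None instead of raising keeps the scraper running if the attribute is absent or the site changes the label wording.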
