I am trying to write a simple Python scraper that saves all the reviews for a specific place on TripAdvisor.
The link I am using as an example is the following:
https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html
This is the code I am using, which should print the corresponding HTML:
from bs4 import BeautifulSoup
import requests
url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
print(soup)
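If I run this code in the console, it hangs for a long time on requests.get(url) without producing any output. With another URL (for example url = "https://stackoverflow.com/") I immediately get the HTML, displayed correctly. Why does TripAdvisor not work? How can I get its HTML?
Answer:
Adding a User-Agent header should solve your problem as a first step, since some sites serve different content depending on it or use it for bot/automation detection. Open DevTools in your browser and copy the User-Agent from one of your own requests: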
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url,headers=headers)
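Because the original request hung with no output, it may also help to pass a timeout, so that a stalled connection raises an exception instead of blocking the console. The following is a minimal sketch, assuming the `requests` library; the helper names `make_session` and `fetch_html`, the 10-second timeout, and the full User-Agent string are illustrative assumptions, not values TripAdvisor is known to require:

```python
import requests

# Assumption: this User-Agent string is only an example; copy the one your
# own browser sends (visible in DevTools) if this one does not work.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def make_session(headers=HEADERS):
    """Return a requests.Session that sends the given headers on every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML as text (hypothetical helper)."""
    session = session or make_session()
    # Without a timeout, requests can wait indefinitely for a silent server;
    # with one, it raises requests.exceptions.Timeout instead.
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx responses (for example a 403 from bot detection) into exceptions.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the TripAdvisor link from the question would then either return the page text or fail quickly with an exception that points to the cause.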
Example
from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
# Send a browser-like User-Agent so the request is not left hanging
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')

data = []
# Collect one dict per review card inside the reviews tab
for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
    data.append({
        'rating': e.select_one('svg[aria-label]')['aria-label'],
        'profilUrl': e.select_one('a[tabindex="0"]').get('href'),
        'content': e.select_one('div:has(>a[tabindex="0"]) div div').text
    })
data
Output
[{'rating': '5.0 of 5 bubbles',
'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out ??Read more"},
{'rating': '5.0 of 5 bubbles',
'profilUrl': '/ShowUserReviews-g319796-d5988326-r618358203-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
'content': 'Beautiful site with great replica’s of the original cave, excellent exposition, poor film as an introduction however!The most urgent issue: long waiting because you need a slot to enter. This could be done 1000% better and in every decent museum it is done better! Staff probably civil servants with no great desire to make you enjoy the visit. Building urgently needs a revamp, no exposure at all!Read more'},...]
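The fields in the output above are raw strings: the rating is a label such as '5.0 of 5 bubbles', the URL is relative, and the content ends with the "Read more" link text. If you want cleaner records before saving them, a small post-processing step could look like the sketch below. It assumes the output shape shown above; the function names `parse_rating`, `clean_content`, and `normalize_review` are hypothetical, and only the standard library is used:

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.tripadvisor.com"


def parse_rating(label):
    """Extract the numeric score from a label such as '5.0 of 5 bubbles'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+of\s+\d+", label)
    return float(match.group(1)) if match else None


def clean_content(text, suffix="Read more"):
    """Strip surrounding whitespace and a trailing 'Read more' link text."""
    text = text.strip()
    if text.endswith(suffix):
        text = text[: -len(suffix)].rstrip()
    return text


def normalize_review(review):
    """Return a copy of one scraped review dict with cleaned-up fields."""
    return {
        "rating": parse_rating(review["rating"]),
        # Resolve the relative link against the site root.
        "profilUrl": urljoin(BASE_URL, review["profilUrl"]),
        "content": clean_content(review["content"]),
    }
```

Applied to the scraped list, this would be `cleaned = [normalize_review(r) for r in data]`.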
