Web scraping does not return the expected HTML

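I am trying to scrape https://www.houzz.com.au/professionals/home-builders/turrell-building-pty-ltd-pfvwau-pf~1099128087. In the browser inspector the page shows its HTML content, but when I scrape it with BeautifulSoup I get some HTML mixed with other material that I know little about. A small part of what I get is shown below.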

</div><style data-styled="true" data-styled-version="5.2.1">.fzynIk.fzynIk{box-sizing:border-box;margin:0;overflow:hidden;}/*!sc*/
.eiQuKK.eiQuKK{box-sizing:border-box;margin:0;margin-bottom:4px;}/*!sc*/
.chJVzi.chJVzi{box-sizing:border-box;margin:0;margin-left:8px;}/*!sc*/
.kCIqph.kCIqph{box-sizing:border-box;margin:0;padding-top:32px;padding-bottom:32px;border-top:1px solid;border-color:#E6E6E6;}/*!sc*/
.dIRCmF.dIRCmF{box-sizing:border-box;margin:0;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-box-pack:justify;-webkit-justify-content:space-between;-ms-flex-pack:justify;justify-content:space-between;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;margin-bottom:16px;}/*!sc*/
.kmAORk.kmAORk{box-sizing:border-box;margin:0;margin-bottom:24px;}/*!sc*/
.bPERLb.bPERLb{box-sizing:border-box;margin:0;margin-bottom:-8px;}/*!sc*/

What should I do? Is this not something BeautifulSoup can do?

Answer from a uj5u.com user:

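Developer tools work on the live browser DOM. What you see when inspecting the page is not the original HTML, but the HTML as modified after the browser has cleaned it up and executed the page's JavaScript.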
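To make the difference concrete, here is a minimal sketch using a made-up page (the `RAW_HTML` string and its `#app` element are assumptions for illustration, not Houzz's markup). The heading would only exist after a browser ran the script, so a parser working on the raw HTML does not find it.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <h3> is only created when a browser runs the script.
RAW_HTML = """
<html><body>
  <div id="app"></div>
  <script>
    var heading = document.createElement("h3");
    heading.textContent = "Loaded by JavaScript";
    document.getElementById("app").appendChild(heading);
  </script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# The parser never executes the script, so no <h3> exists in the parsed tree.
rendered_headings = soup.select("#app h3")

# The text is present only as JavaScript source inside the <script> element.
script_source = soup.find("script").string

print(rendered_headings)
print("Loaded by JavaScript" in script_source)
```

In a browser's inspector the same page would show the `<h3>` inside `#app`, which is why the two views can disagree.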

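Requests does not execute JavaScript, so the content it returns may differ somewhat from what the browser shows, but you can still scrape this page: just dig deeper into your soup.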
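One way to "dig deeper" is to ignore the generated CSS and go straight to the elements that hold the content. The sketch below runs on a small invented sample (`SAMPLE_HTML` and its project titles are placeholders, not the real response) and assumes the content sits under an element with id `projects`.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down response: generated CSS first, real content later.
SAMPLE_HTML = """
<div><style data-styled="true">.fzynIk.fzynIk{box-sizing:border-box;margin:0;}</style></div>
<section id="projects">
  <h3>First project</h3>
  <h3>Second project</h3>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Remove <style> and <script> elements so only content markup remains.
for tag in soup(["style", "script"]):
    tag.decompose()

# Collect the text of every <h3> below the element with id "projects".
titles = [h3.get_text(strip=True) for h3 in soup.select("#projects h3")]
print(titles)
```

Removing the style blocks is optional, since a CSS selector skips them anyway, but it makes the printed soup much easier to read while exploring.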

Example (project names)

from bs4 import BeautifulSoup
import requests

# Profile page to scrape.
url = "https://www.houzz.com.au/professionals/home-builders/turrell-building-pty-ltd-pfvwau-pf~1099128087"
# Send a browser-like User-Agent header with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

# Project titles are the <h3> elements inside the element with id "projects".
print([title.text for title in soup.select('#projects h3')])

Output

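['Overhaul & Main Wing', "'Italian Country' Private Residence", 'Country Classic', 'Residential Resort', 'Resort Style Extension, Stone and Timber', 'Old North Road Estate']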
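If the selector ever returns an empty list, it helps to separate downloading from parsing so each step can be checked on its own. The following is a rough sketch under that assumption; `fetch_html` and `select_texts` are hypothetical helper names, not part of Requests or Beautiful Soup.

```python
from bs4 import BeautifulSoup


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Download a page and raise on HTTP errors (hypothetical helper)."""
    import requests  # imported here so the parsing helper works without it

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
    response.raise_for_status()  # surfaces 403/404 instead of parsing an error page
    return response.text


def select_texts(html, css_selector):
    """Return the stripped text of every element matching css_selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css_selector)]


# Parsing can be tried offline on any saved or invented HTML string.
sample = '<div id="projects"><h3> Example title </h3></div>'
found = select_texts(sample, "#projects h3")
missing = select_texts(sample, "#reviews h3")
print(found, missing)
```

An empty result from `select_texts` on the downloaded HTML suggests the content is either under a different selector or added later by JavaScript.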


Tags: Python, web scraping, Beautiful Soup
