I'm writing a Python script to scrape a website, and I get empty output when I try to fetch a specific class.
The block is:
<div class="prdt Product"> == $0
::before
<!-- /cache: pl_class_46761{nULE0} -->
<div>
<h3 class="Title">...</div>
... etc, the rest of items
The .py is:
from bs4 import BeautifulSoup
import requests
baseurl = 'https://www.list_of_brands.php'
headers = {
'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}
r = requests.get('https://www.the_first_page_of_a_brand.html')
soup = BeautifulSoup(r.content, 'lxml')
productlist = soup.find_all('div', class_='prdt Product')
print(productlist)
All I get is []
I can't find where my mistake is... maybe it's related to == $0? It doesn't seem to be selecting the container correctly.
Thanks!
Answer:
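I believe your parser may be the problem. When I ran your code against https://www.maquillalia.com/apieu-m-406.html, I also got nothing until I changed the parser to html.parser, which gave me one tag in productlist along with a warning message; the warning goes away if I use html5lib: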
soup = BeautifulSoup(r.content, 'html5lib')
With the above change, it prints
[<div ><!-- /cache: pl_class_46761{BPCwo} --><div><h3 ><a href="https://www.maquillalia.com/apieu-mascarilla-icing-sweet-bar-sheet-mask-sandia-p-46761.html">A'pieu - Mascarilla Icing Sweet Bar sheet Mask - Sandía</a></h3><div ><figure><a href="https://www.maquillalia.com/apieu-mascarilla-icing-sweet-bar-sheet-mask-sandia-p-46761.html"><img alt="A'pieu - Mascarilla Icing Sweet Bar sheet Mask - Sandía" border="0" height="220" src="images/productos/thumbnails/a-pieu-mascarilla-icing-sweet-bar-sheet-mask-sandia-1-46761_thumb_220x220.jpg" title="A'pieu - Mascarilla Icing Sweet Bar sheet Mask - Sandía" width="220"/></a></figure></div><div >Mascarilla anatómica de algodón con vitaminas y oligoelementos que hidratan y recuperan la piel dañada.
Con extracto de Sandía que hidrataa y cuida la piel.
<a href="https://www.maquillalia.com/apieu-mascarilla-icing-sweet-bar-sheet-mask-sandia-p-46761.html">Ver </a></div><div ><div data-price="1.90"><strong>1,90€</strong></div><div ><span data-rating="5.00"><span title="5"></span><span title="4"></span><span title="3"></span><span title="2"></span><span title="1"></span><span style="width: 100%"></span></span><span >(3)</span></div><div ><!-- cache: pl_boton_46761{BPCwo} --><a data-atribute="" data-href="https://www.maquillalia.com/apieu-mascarilla-icing-sweet-bar-sheet-mask-sandia-p-46761.html" data-id="46761" data-qty="6" href="javascript:void(0);" rel="nofollow" title="Comprar A'pieu - Mascarilla Icing Sweet Bar sheet Mask - Sandía">Comprar<span style="position: absolute;top: 0;left: 0;width:100%;height: 100%;"></span></a><!-- /cache: pl_boton_46761{BPCwo} --><a data-cid="0" data-list="" data-login="0" data-pid="46761" href="javascript:void(0);"></a><div ><span>IVA Incl.</span><span>Precio por 100 Gr: 9,05€</span></div></div></div></div></div>]
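As for the `== $0` in the question: that suffix is how Chrome DevTools marks the element currently selected in the Elements panel, and `::before` is a pseudo-element that DevTools displays; neither is part of the HTML that requests downloads, so they play no role in the match. To check whether the parser is what changes the result, one option is to run the same search under every parser that happens to be installed. The following is only a rough sketch: `SAMPLE_HTML` and `count_matches_per_parser` are hypothetical names, and the inline markup stands in for `r.content`.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the downloaded page (r.content in the question).
SAMPLE_HTML = """
<html><body>
<div class="prdt Product"><div><h3 class="Title">Sample item</h3></div></div>
</body></html>
"""


def count_matches_per_parser(html, parsers=("lxml", "html.parser", "html5lib")):
    """Return {parser name: number of matching divs}, skipping parsers that are not installed."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser library is not available in the current environment.
            continue
        counts[name] = len(soup.find_all("div", class_="prdt Product"))
    return counts


parser_counts = count_matches_per_parser(SAMPLE_HTML)
print(parser_counts)
```

On small, well-formed markup like this sample, every available parser should report the same count. A difference between parsers, as seen on the real page, suggests that the page's markup is being repaired differently by each one.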
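By the way, to find any div that has both the prdt and Product classes, but not necessarily only those, you can use
soup.find_all(lambda tag: tag.name == 'div' and {'prdt', 'Product'} <= set(tag.get('class', [])))
(a dict such as {'class':'prdt', 'class':'Product'} will not do this: the repeated key keeps only its last value, so it matches on Product alone)
or, preferably,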
soup.select('div.prdt.Product')
This will then also include divs with classes such as prdt Product Agotado, Product prdt, and so on.
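To make the difference between these lookups concrete, here is a small sketch on made-up markup (the four divs and their one-letter contents are assumptions for illustration, not taken from the site). It compares the exact-string match used in the question, a dict with a repeated 'class' key, the CSS selector, and a function-based filter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering the class combinations discussed above.
SAMPLE_HTML = """
<section>
  <div class="prdt Product">a</div>
  <div class="prdt Product Agotado">b</div>
  <div class="Product prdt">c</div>
  <div class="Product">d</div>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def texts(tags):
    """Collect the text of each matched tag so the results are easy to compare."""
    return [tag.get_text() for tag in tags]


# A string with a space only matches that exact class attribute value.
exact = texts(soup.find_all("div", class_="prdt Product"))

# A dict literal keeps only the last value of a repeated key,
# so this is the same filter as {"class": "Product"}.
repeated_key = {"class": "prdt", "class": "Product"}
by_dict = texts(soup.find_all("div", repeated_key))

# The CSS selector requires both classes, in any order, extras allowed.
by_css = texts(soup.select("div.prdt.Product"))

# The same requirement written as a function filter for find_all.
by_func = texts(
    soup.find_all(
        lambda tag: tag.name == "div"
        and {"prdt", "Product"} <= set(tag.get("class", []))
    )
)

print(exact)    # ['a']
print(by_dict)  # ['a', 'b', 'c', 'd']
print(by_css)   # ['a', 'b', 'c']
print(by_func)  # ['a', 'b', 'c']
```

The selector form is usually the easiest to read; the function filter is an alternative if you want to stay with find_all.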
