<dl class="book__details-item">
<dt class="book__details-name">
Место издания:
</dt>
<dt class="book__details-value">
Москва
</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">
Издательство:
</dt>
<dt class="book__details-value">
<a href="/publishers/5558/" target="blank">Манн, Иванов и Фербер</a>
</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">
Год издания:
</dt>
<dt class="book__details-value">
2021
</dt>
</dl>
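Hello. I have a bookstore website here. I need to extract the publication year, but I can't manage it, because every element of the book description sits under a block with the same class name.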
import requests
from bs4 import BeautifulSoup

# HEADERS, final_linklist, final_publist and page are not shown in the question.

def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_content(html):  # Here's the part where it gets confusing
    years = []
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find("div", class_="book__details-left")
    # find() returns only the FIRST matching block, not the "Год издания:" one
    smalleritems = items.find("dl", class_="book__details-item")
    smalleritems = smalleritems.find("dt", class_="book__details-value")
    smalleritems = smalleritems.get_text()
    print(smalleritems)

def parse(URL):
    html = get_html(URL)
    if html.status_code == 200:
        midlinklist = get_content(html.text)
        return midlinklist
    else:
        print("Error")

for URL in final_linklist:
    print(str(URL))
    # Progress message: "Parsing <page> pages out of <remaining>"
    print("Парсинг", page, "страниц из", len(final_linklist) - page)
    page = page + 1
    midl = parse(str(URL))
    for pubs in midl:
        final_publist.append(pubs)
My code isn't finished yet, because I don't quite understand how to get at the text "2021" below:
<dt class="book__details-value">
2021
</dt>
Answer from a uj5u.com user:
You can get the last tag with the class book__details-value by using the [-1] index:
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all("dt", class_="book__details-value")[-1].get_text(strip=True))
Output:
2021
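To try the [-1] approach without hitting the site, here is a minimal sketch that runs it against markup copied from the question. SAMPLE_HTML and the helper name last_detail_value are made up for illustration, and the sketch assumes the year is always the last book__details-value on the page, which the question does not guarantee.

```python
from bs4 import BeautifulSoup

# Markup copied from the question (assumption: real pages have the same shape).
SAMPLE_HTML = """
<dl class="book__details-item">
<dt class="book__details-name">Место издания:</dt>
<dt class="book__details-value">Москва</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">Издательство:</dt>
<dt class="book__details-value">
<a href="/publishers/5558/" target="blank">Манн, Иванов и Фербер</a>
</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">Год издания:</dt>
<dt class="book__details-value">2021</dt>
</dl>
"""


def last_detail_value(html):
    """Return the text of the last dt.book__details-value, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    values = soup.find_all("dt", class_="book__details-value")
    if not values:
        # Guard against pages without any details block instead of raising IndexError.
        return None
    return values[-1].get_text(strip=True)


print(last_detail_value(SAMPLE_HTML))  # prints 2021
```

If a page lists another detail after the year, this returns that detail instead, so it is only safe where the block order is fixed.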
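Answer from a uj5u.com user:
I would anchor on the preceding dt, matching it by its class and by the text it contains, i.e. Год издания: (which translates to "Year of publication:"). Once it matches, use the adjacent sibling combinator (+) to move to the adjacent element with the class book__details-value: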
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('<individual book url>')
soup = bs(r.content, 'lxml')
print(int(soup.select_one('.book__details-name:-soup-contains("Год издания:") + .book__details-value').text.strip()))
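The label-anchored selector can be checked offline as well. The sketch below is an illustration under a few assumptions: SAMPLE_HTML, YEAR_SELECTOR and get_year are hypothetical names, the markup mirrors the question's snippet with the blocks deliberately reordered, and the installed soupsieve is recent enough to support :-soup-contains (2.1 or later). It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Same shape as the question's markup, but with the year block first,
# to show that the lookup does not depend on position.
SAMPLE_HTML = """
<dl class="book__details-item">
<dt class="book__details-name">Год издания:</dt>
<dt class="book__details-value">2021</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">Место издания:</dt>
<dt class="book__details-value">Москва</dt>
</dl>
"""

# Find the label by its text, then step to the adjacent sibling with "+".
YEAR_SELECTOR = (
    '.book__details-name:-soup-contains("Год издания:") + .book__details-value'
)


def get_year(html):
    """Return the publication year as an int, or None if missing or not numeric."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(YEAR_SELECTOR)
    if node is None:
        return None
    text = node.get_text(strip=True)
    return int(text) if text.isdigit() else None


print(get_year(SAMPLE_HTML))  # prints 2021
```

Returning None instead of raising lets a loop over many book URLs skip pages that have no year, rather than stopping at the first one.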
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/367138.html
