I have the following HTML code:
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31121 status-publish first instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31301 status-publish instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-28416 status-publish last instock product_cat-tyres has-post-thumbnail purchasable product-type-simple">
I need to use BeautifulSoup to extract the ID of each product from its class list (31121 / 31301 / 28416 are the IDs). How can I do that?
Answer:
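- Select every div that has a class name starting with post-.
- Iterate over that div's class names and keep only the one that starts with post-.
- Append the post ID to a list.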
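import re
from bs4 import BeautifulSoup
html_attr = '''
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31121 status-publish first instock product_cat-tyres has-post-thumbnail purchasable product-type-simple"></div>
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31301 status-publish instock product_cat-tyres has-post-thumbnail purchasable product-type-simple"></div>
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-28416 status-publish last instock product_cat-tyres has-post-thumbnail purchasable product-type-simple"></div>
'''
soup = BeautifulSoup(html_attr, 'html.parser')
# The regex is tested against each individual class name, so this finds divs with a class starting with "post-"
div_list = soup.find_all('div', {"class": re.compile("^post-")})
id_list = []
for div in div_list:
    # Take the first class of the form "post-XXXX" and keep the part after the hyphen
    post_id = [name.split('-')[1] for name in div['class'] if name.startswith('post-')][0]
    id_list.append(post_id)
print(id_list)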
Output
['31121', '31301', '28416']
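A variant of the steps above, sketched under the assumption that you only want numeric IDs: any class that starts with post- but is not followed by digits (the fourth div below, with class post-featured, is hypothetical) would otherwise be reported as an ID. Here a regex with fullmatch accepts only classes of the exact form post-<digits>, and divs without one are simply skipped. The helper name extract_post_ids is hypothetical, and the markup is a trimmed copy of the question's HTML.

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup plus a hypothetical div with a non-numeric post- class.
html = '''
<div class="col-12 product type-product post-31121 status-publish first instock"></div>
<div class="col-12 product type-product post-31301 status-publish instock"></div>
<div class="col-12 product type-product post-28416 status-publish last instock"></div>
<div class="col-12 post-featured"></div>
'''

# Only class names of the exact form "post-<digits>" count as IDs.
POST_ID = re.compile(r"post-(\d+)")

def extract_post_ids(markup):
    """Hypothetical helper: return numeric post IDs from div class names, in document order."""
    soup = BeautifulSoup(markup, "html.parser")
    ids = []
    # class_=True restricts the search to divs that have a class attribute at all.
    for div in soup.find_all("div", class_=True):
        for cls in div["class"]:
            m = POST_ID.fullmatch(cls)
            if m:
                ids.append(m.group(1))
                break  # keep at most one post ID per div
    return ids

print(extract_post_ids(html))  # ['31121', '31301', '28416']
```

Using fullmatch instead of startswith('post-') is the design choice here: "post-featured" is ignored instead of producing "featured" as an ID.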
Answer:
Iterate over your selection, read each element's class attribute, iterate over its classes, and keep the class that starts with post-:
[c.split('-')[-1] for e in soup.select('div.type-product') for c in e['class'] if c.startswith('post-')]
or
[c.split('-')[-1] for e in soup.select('div[class*="post-"]') for c in e['class'] if c.startswith('post-')]
Example
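from bs4 import BeautifulSoup
html = '''
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31121 status-publish first instock product_cat-tyres has-post-thumbnail purchasable product-type-simple"></div>
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-31301 status-publish instock product_cat-tyres has-post-thumbnail purchasable product-type-simple"></div>
<div class="col-12 col-sm-6 col-md-6 col-xl-4 product type-product post-28416 status-publish last instock product_cat-tyres has-post-thumbnail purchasable product-type-simple"></div>
'''
soup = BeautifulSoup(html, 'html.parser')
# For each product div, keep the text after the last hyphen of the "post-XXXX" class
print([c.split('-')[-1] for e in soup.select('div.type-product') for c in e['class'] if c.startswith('post-')])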
Output
['31121', '31301', '28416']
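The two selectors in this answer are not equivalent. div.type-product matches a whole class name, while div[class*="post-"] is a CSS substring match against the entire class attribute string, so it also selects any div carrying a class such as has-post-thumbnail (which every div in the question has). That is harmless here because the startswith('post-') filter runs afterwards. A minimal sketch, using trimmed markup and a hypothetical extra "gallery" div with no post ID:

```python
from bs4 import BeautifulSoup

# The question's divs (trimmed) plus a hypothetical div that has no post ID.
html = '''
<div class="product type-product post-31121 has-post-thumbnail"></div>
<div class="product type-product post-31301 has-post-thumbnail"></div>
<div class="product type-product post-28416 has-post-thumbnail"></div>
<div class="gallery has-post-thumbnail"></div>
'''
soup = BeautifulSoup(html, "html.parser")

# Substring match on the class attribute: all four divs contain "post-".
selected = soup.select('div[class*="post-"]')
print(len(selected))  # 4

# Class-name match: only the three product divs.
print(len(soup.select('div.type-product')))  # 3

# The startswith filter drops the gallery div, leaving only the real IDs.
ids = [c.split('-')[-1] for e in selected for c in e['class'] if c.startswith('post-')]
print(ids)  # ['31121', '31301', '28416']
```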
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/448286.html
