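- Python version: 3.8
- bs4 library
I have the following HTML, which shows 2 of the 20-plus reviews I scraped. For brevity I have left out the rest, but you can imagine these blocks repeating.
I need to retrieve the "sml-rank-stars sml-str40 star" class value from each review (it appears on the second line of the HTML below).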
<div class="review-rank">
    <span class="sml-rank-stars sml-str40 star"></span>
    <span class="score">
        <span class="item">
            口味:3.5
        </span>
        <span class="item">
            環境:4.0
        </span>
        <span class="item">
            服務:3.5
        </span>
        <span class="item">人均:200元</span>
    </span>
</div>
<div class="review-rank">
    <span class="sml-rank-stars sml-str35 star"></span>
    <span class="score">
        <span class="item">
            口味:3.0
        </span>
        <span class="item">
            環境:4.5
        </span>
        <span class="item">
            服務:3.0
        </span>
    </span>
</div>
Here is what I have tried so far:
for review in review_items.find_all('div', class_='main-review'):
    review_rank = review.find('div', class_='review-rank')
    star_rank = []
    for review in review_rank.find_all('span')[:1]:
        star_rank.append(review.get('class'))
print(star_rank)
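The output I get:
[['sml-rank-stars', 'sml-str5', 'star']]
I can then use this code to get just the number:
star_rank[0][1][7:]
Output:
'5'
The problem is that I only get a single review, whereas I need this value for every review, stored in my list.
What I want is output like this, or something I can iterate over to get each review's star rating: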
[['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str35', 'star'],
['sml-rank-stars', 'sml-str50', 'star'],
['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str50', 'star'],
['sml-rank-stars', 'sml-str50', 'star'],
['sml-rank-stars', 'sml-str45', 'star'],
['sml-rank-stars', 'sml-str10', 'star'],
['sml-rank-stars', 'sml-str35', 'star'],
['sml-rank-stars', 'sml-str45', 'star'],
['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str45', 'star'],
['sml-rank-stars', 'sml-str10', 'star'],
['sml-rank-stars', 'sml-str5', 'star']]
I have worked out how to print the results like this with the following code, but I need to save them in a list or something else I can iterate over.
for review in review_items.find_all('div', class_='main-review'):
    review_rank = review.find('div', class_='review-rank')
    for review in review_rank.find_all('span')[:1]:
        print(review.get('class'))
Output:
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str35', 'star']
['sml-rank-stars', 'sml-str50', 'star']
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str50', 'star']
['sml-rank-stars', 'sml-str50', 'star']
['sml-rank-stars', 'sml-str45', 'star']
['sml-rank-stars', 'sml-str10', 'star']
['sml-rank-stars', 'sml-str35', 'star']
['sml-rank-stars', 'sml-str45', 'star']
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str45', 'star']
['sml-rank-stars', 'sml-str10', 'star']
['sml-rank-stars', 'sml-str5', 'star']
Reply from a uj5u.com user:
Iterate over all the .review-rank elements to select every review. To keep only the rating, use a list comprehension:
star_rank = []
for r in soup.select('.review-rank'):
    star_rank.append([s.replace('sml-str','') for s in r.span['class'] if 'sml-str' in s][0])
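To try this first approach end to end, here is a minimal self-contained sketch. It assumes bs4 is installed, uses Python's built-in html.parser, and parses a trimmed copy of the two sample blocks from the question (the html string is illustrative, not the full page). Note that the list is created once, before the loop; in the question's attempt, star_rank = [] appears to sit inside the outer loop, which would explain why only the last review survived.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two sample review blocks from the question.
html = """
<div class="review-rank">
  <span class="sml-rank-stars sml-str40 star"></span>
  <span class="score"><span class="item">口味:3.5</span></span>
</div>
<div class="review-rank">
  <span class="sml-rank-stars sml-str35 star"></span>
  <span class="score"><span class="item">口味:3.0</span></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Create the list once, outside the loop, so every review is kept.
star_rank = []
for r in soup.select(".review-rank"):
    # r.span is the first <span> in the block; its "class" attribute is a list.
    codes = [c.replace("sml-str", "") for c in r.span["class"] if "sml-str" in c]
    star_rank.append(codes[0])

print(star_rank)  # ['40', '35']
```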
Or, following your example, since I do not know the overall structure of review_items above or whether each review contains one or several .review-rank blocks:
star_rank = []
for review in review_items.find_all('div', class_='main-review'):
    for rank in review.find_all('div', class_='review-rank'):
        star_rank.append([s.replace('sml-str','') for s in rank.span['class'] if 'sml-str' in s][0])
Output:
['40', '35']
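If numeric ratings are more convenient than strings such as '40', the class list can be converted directly. The sketch below is pure Python and assumes, which the thread does not confirm, that the number after sml-str is the star rating multiplied by ten (so sml-str40 would mean 4.0 stars and sml-str5 would mean 0.5). The helper name stars_from_classes is hypothetical.

```python
import re
from typing import List, Optional

# Matches a class token such as "sml-str40" and captures the digits.
STAR_CLASS = re.compile(r"^sml-str(\d+)$")

def stars_from_classes(classes: List[str]) -> Optional[float]:
    """Return the star rating encoded in a class list, or None if absent.

    Assumption: the digits are the rating times ten (40 -> 4.0).
    """
    for token in classes:
        match = STAR_CLASS.match(token)
        if match:
            return int(match.group(1)) / 10
    return None

rows = [
    ["sml-rank-stars", "sml-str40", "star"],
    ["sml-rank-stars", "sml-str35", "star"],
    ["sml-rank-stars", "sml-str5", "star"],
]
ratings = [stars_from_classes(row) for row in rows]
print(ratings)  # [4.0, 3.5, 0.5]
```

Unlike the slice star_rank[0][1][7:], this does not depend on the rating class being the second entry in the list, and it returns None instead of raising an error when a review has no rating class.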
