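I am trying to extract data from a website. Using the developer tools, I can see that the data I'm interested in is held in several elements that all share the same class name (flyers_flyer-col__ZN-6Z).

I want to loop over each of these elements and extract information, specifically the aria-label and the target href. When I try, I only seem to get the first element, and I'm not sure how to loop over all of them.
Here is the code I tried: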
for flyer in soup.find_all("div", class_="flyers_flyer-col__ZN-6Z"):
    # collect every anchor inside this flyer block that carries an href
    links = flyer.find_all("a", href=True)
    for link in links:
        print(link['href'])
However, this only returns the result for the first occurrence of the flyers_flyer-col__ZN-6Z class. How can I get the rest?
Answer:
The page is built dynamically by JavaScript from data embedded in a script tag. Here is one way to get that data with requests:
import requests
from bs4 import BeautifulSoup as bs
import json
import pandas as pd

# show every column and the full cell contents when printing the frame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

url = 'https://www.reebee.com/flyers?categoryID=2'
r = requests.get(url)
soup = bs(r.text, 'html.parser')
# the flyer data is embedded as JSON in the __NEXT_DATA__ script tag
json_obj = json.loads(soup.select_one('script[id="__NEXT_DATA__"]').text)
# flatten the list of flyer records into one row per flyer
df = pd.json_normalize(json_obj['props']['pageProps']['flyerList'])
print(df)
Terminal output:
flyerID numberOfPages dateValid dateExpired priority resetVersion statusID flyerTypeID flyerVersion cycleID cycleDescriptionEn cycleDescriptionFr languageID category asset store.storeName store.storeID store.asset
0 1485954 17 2022-10-20 2022-10-26 422 0 2 1 10 113242 Weekly Flyer Circulaire hebdomadaire 0 [{'categoryID': 1}, {'categoryID': 4}, {'categoryID': 5}] [{'type': 'flyerAsset', 'assetTypeID': 4, 'version': 2, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-flyer-assets/2diio0ujinwgsscoggk8oscgk/8d498e2ac387dda58d7577041341ead1_t<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 81, 'height': 108}, {'width': 121, 'height': 161}, {'width': 145, 'height': 193}, {'width': 189, 'height': 252}, {'width': 209, 'height': 279}, {'width': 284, 'height': 379}, {'width': 291, 'height': 388}, {'width': 314, 'height': 419}, {'width': 388, 'height': 517}]}]}] The Home Depot 10028 [{'type': 'storeLogoAsset', 'assetTypeID': 7, 'version': 3, 'url': 'https://reebee-assets.azureedge.net/reebee-store-assets/asset/b2c1c3505fe0f9dd8dc2ea5431158bfa', 'contentType': [{'extension': '.webp', 'type': 'image/webp'}]}]
1 1487388 4 2022-10-25 2022-11-21 1805 0 2 1 7 113698 Transform Any Recipe NaN 0 [{'categoryID': 1}, {'categoryID': 2}] [{'type': 'flyerAsset', 'assetTypeID': 4, 'version': 1, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-flyer-assets/cytg9ubx108c80owo84sogs8c/055f92a74153ee5d259b4ca8c5764f03_t<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 81, 'height': 108}, {'width': 121, 'height': 161}, {'width': 145, 'height': 193}, {'width': 189, 'height': 252}, {'width': 209, 'height': 279}, {'width': 284, 'height': 379}, {'width': 291, 'height': 388}, {'width': 314, 'height': 419}, {'width': 388, 'height': 517}]}]}] VH 13578 [{'type': 'storeLogoAsset', 'assetTypeID': 7, 'version': 1, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-store-assets/338cb8d957621a8e05f7a28307c646a4_sl<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 102, 'height': 102}, {'width': 120, 'height': 120}, {'width': 150, 'height': 150}, {'width': 200, 'height': 200}]}]}]
2 1486725 15 2022-10-21 2022-10-27 1807 0 2 1 6 113458 Weekly Flyer Circulaire hebdomadaire 0 [{'categoryID': 1}, {'categoryID': 3}] [{'type': 'flyerAsset', 'assetTypeID': 4, 'version': 1, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-flyer-assets/5cx8mbrucvc480oo00kccgkow/27d7c7fcb22f2500cf95ab4585b8be29_t<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 81, 'height': 83}, {'width': 121, 'height': 124}, {'width': 145, 'height': 148}, {'width': 189, 'height': 193}, {'width': 209, 'height': 214}, {'width': 284, 'height': 291}, {'width': 291, 'height': 298}, {'width': 314, 'height': 321}, {'width': 388, 'height': 397}]}]}] 2001 Audio Video 10219 [{'type': 'storeLogoAsset', 'assetTypeID': 7, 'version': 1, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-store-assets/4a39eed200ea70f5640b8fadd16955d3_sl<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 102, 'height': 102}]}]}]
3 1483100 4 2022-10-03 2022-10-30 1808 0 2 1 9 112307 October Savings économies d'octobre 0 [{'categoryID': 1}, {'categoryID': 10}] [{'type': 'flyerAsset', 'assetTypeID': 4, 'version': 1, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-flyer-assets/3o3awyoez3uowkgkkccsg0cw8/d232f306202435180ee80ccc50cc7cc4_t<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 81, 'height': 106}, {'width': 121, 'height': 159}, {'width': 145, 'height': 190}, {'width': 189, 'height': 248}, {'width': 209, 'height': 274}, {'width': 284, 'height': 372}, {'width': 291, 'height': 381}, {'width': 314, 'height': 411}, {'width': 388, 'height': 508}]}]}] PetSmart 13189 [{'type': 'storeLogoAsset', 'assetTypeID': 7, 'version': 1, 'url': 'https://reebee-assets.azureedge.net/reebee-store-assets/asset/1e1465c11faff57018aefe7a610d12a2', 'contentType': [{'extension': '.webp', 'type': 'image/webp'}]}]
4 1486315 83 2022-10-25 2022-11-06 1817 0 2 1 9 113361 Two-Week Flyer Circulaire de deux semaines 0 [{'categoryID': 1}, {'categoryID': 3}, {'categoryID': 4}, {'categoryID': 5}] [{'type': 'flyerAsset', 'assetTypeID': 4, 'version': 1, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-flyer-assets/eeqpblbrj3cogo8cooc048w48/71091c68f1258d46c676b9305ea48ee9_t<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 81, 'height': 108}, {'width': 121, 'height': 161}, {'width': 145, 'height': 193}, {'width': 189, 'height': 251}, {'width': 209, 'height': 278}, {'width': 284, 'height': 377}, {'width': 291, 'height': 386}, {'width': 314, 'height': 417}, {'width': 388, 'height': 515}]}]}] Princess Auto 10056 [{'type': 'storeLogoAsset', 'assetTypeID': 7, 'version': 2, 'url': 'https://d3179alu5b1vk5.cloudfront.net/reebee-store-assets/f8078d1e1744abbfd1bd61b4da4fb2c0_sl<width>x<height>', 'contentType': [{'extension': '.webp', 'type': 'image/webp', 'metadata': [{'width': 102, 'height': 102}]}]}]
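The full frame is hard to read in a terminal. As a minimal sketch, assuming the records have the shape seen in the output above, the snippet below normalizes a small hand-written sample (two records abbreviated from that output, not live data), keeps a few columns, and visits every flyer. The name `sample_flyers` and the choice of columns are illustrative assumptions.

```python
import pandas as pd

# Hand-written sample abbreviated from the output above (not live data).
sample_flyers = [
    {"flyerID": 1485954, "numberOfPages": 17,
     "dateValid": "2022-10-20", "dateExpired": "2022-10-26",
     "store": {"storeName": "The Home Depot", "storeID": 10028}},
    {"flyerID": 1487388, "numberOfPages": 4,
     "dateValid": "2022-10-25", "dateExpired": "2022-11-21",
     "store": {"storeName": "VH", "storeID": 13578}},
]

# json_normalize flattens the nested "store" dict into dotted column names.
df = pd.json_normalize(sample_flyers)

# Keep a readable subset of columns.
summary = df[["flyerID", "store.storeName", "dateValid", "dateExpired"]]

# Visit every flyer record, not just the first one.
for record in summary.to_dict("records"):
    print(record["flyerID"], record["store.storeName"],
          record["dateValid"], record["dateExpired"])
```

With the real data, the same column selection could be applied to the frame built from `flyerList`, provided those keys are present in every record.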
You can drill further into that JSON object. See the pandas documentation here: https://pandas.pydata.org/docs/
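One possible way to drill further is the `record_path` and `meta` parameters of `pd.json_normalize`, which expand a nested list into one row per list item while carrying along fields from the parent record. The sketch below assumes each flyer has an `asset` list and a `store` dict, as the output above suggests. The `sample_flyers` data is hand-written and abbreviated (URLs and content types omitted), not taken from the live site.

```python
import pandas as pd

# Hand-written, abbreviated sample (not live data); asset URLs are omitted.
sample_flyers = [
    {"flyerID": 1485954,
     "store": {"storeName": "The Home Depot"},
     "asset": [{"type": "flyerAsset", "assetTypeID": 4, "version": 2}]},
    {"flyerID": 1487388,
     "store": {"storeName": "VH"},
     "asset": [{"type": "flyerAsset", "assetTypeID": 4, "version": 1}]},
]

# One row per entry in each flyer's "asset" list, with the parent flyer's
# ID and store name repeated on every row.
assets = pd.json_normalize(
    sample_flyers,
    record_path="asset",
    meta=["flyerID", ["store", "storeName"]],
)
print(assets)
```

The nested `meta` path `["store", "storeName"]` becomes a column named `store.storeName`, matching the dotted names that `json_normalize` produces elsewhere.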
