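I am trying to get the product image from the page below using Python and BeautifulSoup. The image is loaded by JavaScript rather than being present as a plain <img> tag. I am using the lxml parser. I have created a simplified version of my code that focuses only on the image.
The image URL I want is https://lapa.co.za/pub/media/catalog/product/cache/image/700x700/e9c3970ab036de70892d86c6d221abfe/l/e/learn_to_read_l3_b05_tippie_fish_cover.jpg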
import json
from bs4 import BeautifulSoup
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'
}
testlink = 'https://lapa.co.za/kinder-en-tienerboeke/leer-my-lees-vlak-1-grootboek-9-tippie-en-die-vis'
r = requests.get(testlink, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
title = soup.find('h1', class_='page-title').text.strip()
images = soup.find('div', class_='product-img-column')
# html_data = requests.get(testlink).text
# data = json.loads(re.search(r'window.INITIAL_REDUX_STATE=(\{.*?\});', html_data))
print(images)
Answer:
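The image data is stored as JSON inside a <script type="text/x-magento-init"> tag within the product-img-column div. You just need to pull that JSON out and parse it.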
import json
from bs4 import BeautifulSoup
import requests
import re
headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'
}
testlink = 'https://lapa.co.za/kinder-en-tienerboeke/leer-my-lees-vlak-1-grootboek-9-tippie-en-die-vis'
r = requests.get(testlink, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
title = soup.find('h1', class_='page-title').text.strip()
# The gallery data sits in a <script type="text/x-magento-init"> tag inside this div
images = soup.find('div', class_='product-img-column')
script = images.find('script', {'type': 'text/x-magento-init'})
# .string returns the raw text of the script tag, which is the JSON itself,
# so there is no need to re-serialize the tag and match it with a regex
jsonStr = script.string
data = json.loads(jsonStr)
# Take the first entry in the gallery data
image_data = data['[data-gallery-role=gallery-placeholder]']['mage/gallery/gallery']['data'][0]
image_url = image_data['full']
# OR
# image_url = image_data['img']
print(image_url)
Output:
https://lapa.co.za/pub/media/catalog/product/cache/image/e9c3970ab036de70892d86c6d221abfe/9/7/9780799377347_1.jpg
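If you want every gallery image rather than just the first, or want to try the idea without hitting the site, the same extraction can be sketched against an HTML string. This is only a rough sketch under some assumptions: it swaps BeautifulSoup for the standard library's html.parser so it has no dependencies, the names MagentoInitParser and gallery_image_urls are made up for illustration, and sample_html uses placeholder example.com URLs shaped like the JSON keys used in the answer above, not real data from the page.

```python
import json
from html.parser import HTMLParser


class MagentoInitParser(HTMLParser):
    """Collect parsed JSON from every <script type="text/x-magento-init"> tag.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self):
        super().__init__()
        self._in_init_script = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/x-magento-init":
            self._in_init_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script text can arrive in several chunks, so buffer it
        if self._in_init_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_init_script:
            self._in_init_script = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip script blocks that are not valid JSON


def gallery_image_urls(html, size="full"):
    """Return the URL stored under `size` for every gallery entry found in html."""
    parser = MagentoInitParser()
    parser.feed(html)
    parser.close()
    urls = []
    for block in parser.blocks:
        if not isinstance(block, dict):
            continue
        # Same key path as in the answer above; assumed to hold for this page layout
        gallery = block.get("[data-gallery-role=gallery-placeholder]", {})
        entries = gallery.get("mage/gallery/gallery", {}).get("data", [])
        urls.extend(entry[size] for entry in entries if size in entry)
    return urls


# Placeholder markup with made-up URLs, only to show the expected shape
sample_html = """
<div class="product-img-column">
  <script type="text/x-magento-init">
  {"[data-gallery-role=gallery-placeholder]": {"mage/gallery/gallery": {"data": [
    {"img": "https://example.com/i/1.jpg", "full": "https://example.com/f/1.jpg"},
    {"img": "https://example.com/i/2.jpg", "full": "https://example.com/f/2.jpg"}
  ]}}}
  </script>
</div>
"""

print(gallery_image_urls(sample_html))
```

With the real page you would pass r.text from the requests call instead of sample_html. An empty list means no matching script tag or gallery data was found, which is easier to handle than the AttributeError or KeyError the direct lookup would raise.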
