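I am trying to scrape the href values of the products on the following page, but only for products that are shown as in stock: https://www.waitrosecellar.com/whisky-shop/view-all-whiskies/whisky-by-brand/macallan
With the code below I can scrape the hrefs, but the out_of_stock flag does not seem to work, and the printed list still contains out-of-stock items. My code: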
import ssl
import requests
import sys
import time
import smtplib
from email.message import EmailMessage
import hashlib
from urllib.request import urlopen
from datetime import datetime
import json
import random
import requests
from itertools import cycle
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from urllib3.exceptions import InsecureRequestWarning
from requests_html import HTMLSession
session = HTMLSession()
user_agent_list = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
]
for i in range(1,4):
    #Pick a random user agent
    user_agent = random.choice(user_agent_list)
    #Set the headers
    headers = {'User-Agent': user_agent}

url = 'https://www.waitrosecellar.com/whisky-shop/view-all-whiskies/whisky-by-brand/macallan'
response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text,features="html.parser")
test = []
for product in soup.find_all('div', class_="productName"):
    out_of_stock=False
    for span in product.parent.find_all('span', ):
        if "Out of stock" in span.text:
            out_of_stock = True
            break
    if not out_of_stock:
        test.append(product.a['href'])
print(test)
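Please advise how to make the out_of_stock flag work properly so that only in-stock products are printed. Thank you!
Answer from a uj5u.com user:
Here is one way to distinguish out-of-stock products from available ones: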
import requests
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://www.waitrosecellar.com/whisky-shop/view-all-whiskies/whisky-by-brand/macallan'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
# The attribute selectors inside [] are missing from the source post.
cards = soup.select('div[]')
for c in cards:
    product = c.select_one('div[ ] a').text.strip()
    product_url = c.select_one('div[ ] a').get('href')
    # A hidden out-of-stock element (style "display:none;") means the product is available.
    availability = 'Product Available' if c.select_one('div[]').get('style') == 'display:none;' else 'Out of Stock'
    if availability == 'Product Available':
        print(product, product_url, availability)
Terminal output:
Macallan 12 Year Old Sherry Oak https://www.waitrosecellar.com/macallan-12-year-old-sherry-oak-717201 Product Available
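The check above compares the inline style against the exact string 'display:none;', which would miss variants such as 'display: none'. Below is a minimal sketch of a more tolerant version of the same idea, run against an inline HTML sample instead of the live page. The class names (card, name, oos), the example.com URLs and the helper names is_hidden and in_stock_links are all hypothetical placeholders, not the site's real markup, and treating a card with no out-of-stock element as available is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names and URLs are placeholders, not the site's real ones.
SAMPLE_HTML = """
<div class="card">
  <div class="name"><a href="https://example.com/whisky-a">Whisky A</a></div>
  <div class="oos" style="display:none;">Out of stock</div>
</div>
<div class="card">
  <div class="name"><a href="https://example.com/whisky-b">Whisky B</a></div>
  <div class="oos" style="display: block;">Out of stock</div>
</div>
<div class="card">
  <div class="name"><a href="https://example.com/whisky-c">Whisky C</a></div>
  <div class="oos" style="DISPLAY : none">Out of stock</div>
</div>
"""


def is_hidden(tag):
    """Return True if the tag's inline style sets display to none."""
    style = (tag.get("style") or "").lower()
    for declaration in style.split(";"):
        prop, _, value = declaration.partition(":")
        if prop.strip() == "display" and value.strip() == "none":
            return True
    return False


def in_stock_links(html):
    """Collect (name, href) for cards whose out-of-stock marker is hidden."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select("div.card"):
        link = card.select_one("div.name a")
        marker = card.select_one("div.oos")
        if link is None:
            continue  # skip cards without a product link
        # Assumption: a card with no marker at all counts as available.
        if marker is None or is_hidden(marker):
            results.append((link.get_text(strip=True), link.get("href")))
    return results


links = in_stock_links(SAMPLE_HTML)
print(links)
```

To use this on the real page, the three selectors would have to be replaced with the ones found by inspecting the page source.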
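Of course, you can also get other data points about the products. See the BeautifulSoup documentation here: https://beautiful-soup-4.readthedocs.io/en/latest/
Also, Requests-HTML appears to be unmaintained; the last release was almost 4 years ago? Released: Feb 17, 2019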
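As a rough illustration of collecting several data points per product, the sketch below builds one dictionary per product card and writes the result as CSV. It again runs on an inline sample, and the class names, the price field and the helper names text_or_none and extract_records are hypothetical; the real page may expose different fields.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: class names and the price field are placeholders.
SAMPLE_HTML = """
<div class="card">
  <div class="name"><a href="https://example.com/whisky-a">Whisky A</a></div>
  <span class="price">50.00</span>
</div>
<div class="card">
  <div class="name"><a href="https://example.com/whisky-b">Whisky B</a></div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag else None


def extract_records(html):
    """Build one dict per product card; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.select("div.card"):
        link = card.select_one("div.name a")
        records.append({
            "name": link.get_text(strip=True) if link else None,
            "url": link.get("href") if link else None,
            "price": text_or_none(card, "span.price"),
        })
    return records


records = extract_records(SAMPLE_HTML)

# Write the records as CSV into an in-memory buffer; None is written as an empty cell.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "url", "price"])
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```

Returning None for a missing field, rather than raising, keeps one incomplete card from aborting the whole scrape.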
