When I visit https://www.dickssportinggoods.com/f/tents-accessories?pageNumber=2, I need to wait a while before the page has actually finished loading. Is that possible? My code:
from requests_html import HTMLSession
from bs4 import BeautifulSoup
from lxml import etree
s = HTMLSession()
response = s.get('https://www.dickssportinggoods.com/f/tents-accessories?pageNumber=2')
response.html.render()
soup = BeautifulSoup(response.content, "html.parser")
dom = etree.HTML(str(soup))
item = dom.xpath('//a[@class="rs_product_description d-block"]/text()')[0]
print(item)
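Reply from a uj5u.com user:
It looks like the data you are looking for can be fetched with a plain HTTP GET request.
The call returns JSON, which you can use directly, with no scraping code at all.
Copy/paste the URL into a browser --> look at the data.
You can specify the page number in the URL: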
searchVO={"selectedCategory":"12301_1809051","selectedStore":"0","selectedSort":1,"selectedFilters":{},"storeId":15108,"pageNumber":2,"pageSize":48,"totalCount":112,"searchTypes":["PINNING"],"isFamilyPage":true,"appliedSeoFilters":false,"snbAudience":"","zipcode":""}
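Working code below:
import json
import pprint
import requests
page_num = 2
# Keep searchVO as a dict so page_num can be substituted without breaking the JSON.
search_vo = {"selectedCategory": "12301_1809051", "selectedStore": "0", "selectedSort": 1, "selectedFilters": {}, "storeId": 15108, "pageNumber": page_num, "pageSize": 48, "totalCount": 112, "searchTypes": ["PINNING"], "isFamilyPage": True, "appliedSeoFilters": False, "snbAudience": "", "zipcode": ""}
url = 'https://prod-catalog-product-api.dickssportinggoods.com/v2/search'
# requests URL-encodes the JSON string passed through params.
r = requests.get(url, params={'searchVO': json.dumps(search_vo, separators=(',', ':'))})
if r.status_code == 200:
    pprint.pprint(r.json())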
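Because the page number is just one field inside the searchVO JSON, the URL for every page can be generated up front. Here is a minimal sketch using only the standard library. The helper names `build_search_url` and `page_count` are hypothetical, and two things are assumptions rather than documented API behaviour: that pages are numbered from 0, and that `totalCount` (112) and `pageSize` (48) stay as shown in the answer.

```python
import json
import math
from urllib.parse import urlencode

BASE_URL = "https://prod-catalog-product-api.dickssportinggoods.com/v2/search"


def build_search_url(page_num, page_size=48, total_count=112):
    """Return the search URL for one page, with searchVO URL-encoded."""
    search_vo = {
        "selectedCategory": "12301_1809051",
        "selectedStore": "0",
        "selectedSort": 1,
        "selectedFilters": {},
        "storeId": 15108,
        "pageNumber": page_num,
        "pageSize": page_size,
        "totalCount": total_count,
        "searchTypes": ["PINNING"],
        "isFamilyPage": True,
        "appliedSeoFilters": False,
        "snbAudience": "",
        "zipcode": "",
    }
    # Compact separators reproduce the JSON layout shown in the answer.
    payload = json.dumps(search_vo, separators=(",", ":"))
    return BASE_URL + "?" + urlencode({"searchVO": payload})


def page_count(total_count=112, page_size=48):
    """Number of pages needed to cover total_count items."""
    return math.ceil(total_count / page_size)


# Assumption: pages are numbered from 0; change first_page if the API counts from 1.
first_page = 0
urls = [build_search_url(n) for n in range(first_page, first_page + page_count())]
for url in urls:
    print(url)
```

Each URL can then be passed to `requests.get` exactly as in the answer's code. Building the JSON with `json.dumps` avoids hand-editing an encoded string when the page number changes.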
Reply from a uj5u.com user:
You can also use Selenium in headless mode.
Selenium can locate web elements using explicit waits.
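from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver_path = "/path/to/chromedriver"  # placeholder: point this at your local chromedriver
options = webdriver.ChromeOptions()
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")
# Selenium 4 takes the driver path through Service; Selenium 3 used executable_path=driver_path.
driver = webdriver.Chrome(service=Service(executable_path=driver_path), options=options)
driver.get("URL here")
# Wait up to 20 seconds for the product link to become visible.
wait = WebDriverWait(driver, 20)
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[@class='rs_product_description d-block']")))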
PS: You have to download chromedriver from here
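For intuition about the explicit wait used in this answer: conceptually, `WebDriverWait(...).until(...)` re-checks a condition at a fixed interval until it returns a truthy value or the timeout runs out. The sketch below illustrates that polling idea with the standard library only. It is not Selenium's actual implementation, and `wait_until`, `element_is_visible` and `ready_at` are hypothetical names standing in for a real page and locator.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose content appears shortly after the initial load.
ready_at = time.monotonic() + 0.3


def element_is_visible():
    """Return a fake element once the simulated page is ready, else None."""
    return "element" if time.monotonic() >= ready_at else None


found = wait_until(element_is_visible, timeout=2.0, poll_interval=0.1)
print(found)
```

This is why an explicit wait usually beats a fixed `time.sleep`: it returns as soon as the element is there and only fails once the full timeout has passed.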
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/320263.html
