This is my first question here, so I hope I'm asking it in the right place and that it's appropriate.

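I'm using Python and Selenium to scrape data from this website: https://www.sqdc.ca. I'm able to scrape the home page and collect the list of main product categories. I can also go into each category's page and collect information on each product (for example: https://www.sqdc.ca/en-CA/dried-cannabis?fn1=InStock&fv1=in store|online&origin=dropdown&c1=products&c2=dried-cannabis&clickedon=dried-cannabis). I've also managed to get the URLs of all the products, so that I can try to collect more details on each one.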

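I've been stuck on this last step for a while. When I go into each product's page to get more details (for example: https://www.sqdc.ca/en-CA/p-apples-cream/671148904118-P/671148904118), I can't find the section that lists the stores with availability and stock, even though it loads right away in my browser.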

When I view the page source in my browser, this is the section I'm after:

<div id="storesList" class="store-inventory">
<div data-templateid="StoreInventoryList">
<p class="lead text-center">Unavailable</p>
</div>

I don't know why it says "Unavailable". Ideally, I'd like to get that list and then click "View more stores" until all of them are loaded.

I tried waiting, but that didn't work. In any case, the list seems to be already loaded by the time I land on the page.

Any ideas? I know the list is generated by JavaScript, because when I inspect the page in my browser I see a class named row-js-equalize.

Code:

from selenium import webdriver
from bs4 import BeautifulSoup

# Set up the driver and options

options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('headless')
options.add_argument('no-sandbox')
options.add_argument("window-size=1200x600")
driver = webdriver.Chrome("/home/amr/Downloads/chromedriver", options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))

# Get the page and parse it

driver.get('https://www.sqdc.ca/en-CA/p-cbd-decarb/628634303078-P/628634303078')
content = driver.page_source
soup = BeautifulSoup(content, "lxml")

If you go to the URL, the stores and inventory section at the bottom is what I'm after. I can't find it in the parsed output.

Answer:

You don't need Selenium to get the inventory. In your browser you can find the backend API call to the inventory endpoint: https://www.sqdc.ca/api/olivestoreinventory/getstoresinventory

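To find it, open your browser's Developer Tools, go to the Network tab, select Fetch/XHR, and refresh the page. All the details you want are loaded from various backend API calls. We can recreate them like this: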

import requests

headers = {
    'accept-language': 'en-CA',  # important to keep this header for some reason
    'x-requested-with': 'XMLHttpRequest',  # important to keep this header
}

url = 'https://www.sqdc.ca/api/olivestoreinventory/getstoresinventory'
payload = {"Sku":"671148904118","Page":1,"Pagesize":1000} #Pagesize is basically number of stores, get all stores with 1000, SKU comes from url

resp = requests.post(url,headers=headers,json=payload).json()
print(len(resp['Stores']))

inventory = {x['Name']:x['InventoryStatus']['Quantity'] for x in resp['Stores']} #store_name : inventory count
print(inventory)

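I've parsed the JSON response and built an inventory dictionary holding the store name and stock level for the 82 stores returned for this SKU. You can recreate this for any product, as long as you send its SKU number in the payload.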
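Since the SKU comes from the product URL, the per-product step can be sketched with a few small helpers. This is only a sketch under assumptions: the function names are made up for illustration, it assumes the SKU is always the last path segment of the product URL (as in the two example URLs in the question), and it assumes the response keeps the `Stores` / `Name` / `InventoryStatus` / `Quantity` shape used in the code above.

```python
from urllib.parse import urlparse


def sku_from_product_url(url):
    """Return the last path segment of a product URL.

    Assumption: the SKU is always the final segment, e.g.
    .../p-apples-cream/671148904118-P/671148904118
    """
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def build_payload(sku, page_size=1000):
    """Build the JSON body in the same shape as the payload used above."""
    return {"Sku": sku, "Page": 1, "Pagesize": page_size}


def inventory_by_store(resp):
    """Map store name to quantity, skipping entries that lack either field."""
    inventory = {}
    for store in resp.get("Stores", []):
        name = store.get("Name")
        status = store.get("InventoryStatus") or {}
        if name is not None and "Quantity" in status:
            inventory[name] = status["Quantity"]
    return inventory
```

For each product URL you already collected, you would pass `build_payload(sku_from_product_url(product_url))` as the `json=` argument of the same `requests.post` call, then feed the decoded response to `inventory_by_store`. Skipping incomplete entries avoids a `KeyError` if a store comes back without inventory data.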
