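I have a set of brand IDs for web page URLs. I turned the page URL into an f-string and insert the brand ID where it belongs. Each page has a unique ID that is used to load the next page. I am trying to fetch the next pages while keeping each ID matched to the brand ID it belongs to.
Here is some sample code: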
import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup
brands = [989,1344,474,1237,886,1,328,2188]
testid = {}
for b in brands:
    url = f'https://webapi.depop.com/api/v2/search/products/?brands={b}&itemsPerPage=24&country=gb&currency=GBP&sort=relevance'
    payload={}
    headers = {}
    response = requests.request("GET", url, headers=headers, data=payload)
    test= pd.read_json(StringIO(response.text), lines=True)
    for m in test['meta'].items():
        if m[1]['hasMore'] == True:
            # store the first cursor for this brand as a one-element list
            testid[str(b)]= [m[1]['cursor']]
        else:
            continue
for br in testid.keys():
    # no break: as written, this loop never exits
    while True:
        html = f'https://webapi.depop.com/api/v2/search/products/?brands={br}&cursor={testid[str(br)][-1]}&itemsPerPage=24&country=gb&currency=GBP&sort=relevance'
        r = requests.request("GET",html, headers=headers, data=payload)
        read_id = pd.read_json(StringIO(r.text), lines=True)
        for m in read_id['meta'].items():
            try:
                testid[str(br)].append(m[1]['cursor'])
            except:
                continue
This is the output it produces:
{'989': ['MnwyNHwxNjQwMDMwODcw']}
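However, it replaces the value originally stored under the brand ID and keeps only the last one collected. It should keep a list and produce something like this: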
{'989': ['MnwyNHwxNjQwMDI4Mzk1', ...],
'1344': ['MnwyNHwxNjQwMDI4Mzk2', ...],
'474': ['MnwyNHwxNjQwMDI4Mzk3', ...],
'1237': ['MnwyNHwxNjQwMDI4Mzk3', ...],
'886': ['MnwyNHwxNjQwMDI4Mzk4', ...],
'1': ['MnwyNHwxNjQwMDI4Mzk4', ...],
'328': ['MnwyNHwxNjQwMDI4Mzk5', ...],
where the three dots ... stand for the additional ID values collected from the pages for that brand ID. How can I get output like this?
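Answer from a uj5u.com community member:
Once testid is set to a collections.defaultdict(list), the rest falls out in a fairly straightforward way.
Note: I am only going to fetch the first 3 cursors for each brand, but you can remove that limit and fetch all of them if you like.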
import collections
import requests
brands = [989,1344,474,1237,886,1,328,2188]
testid = collections.defaultdict(list)
for b in brands:
    headers = {}
    payload={}
    url = f"https://webapi.depop.com/api/v2/search/products/?brands={b}&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
    response = requests.request("GET", url, headers=headers, data=payload)
    data = response.json()
    i = 0  # short circuit: stop after 3 cursors per brand
    while data.get("meta", {}).get("hasMore") and i < 3:
        cursor = data.get("meta", {}).get("cursor")
        testid[str(b)].append(cursor)
        # request the next page by adding the cursor to the same base URL
        response = requests.request("GET", f"{url}&cursor={cursor}", headers=headers, data=payload)
        data = response.json()
        i += 1
for key, value in testid.items():
    print(key, value)
This gives us:
989 ['MnwyNHwxNjQwMDMzMjM0']
1344 ['MnwyNHwxNjQwMDMzMjM1', 'M3w0OHwxNjQwMDMzMjM1', 'NHw3MnwxNjQwMDMzMjM1']
474 ['MnwyNHwxNjQwMDMzMjM3', 'M3w0OHwxNjQwMDMzMjM3', 'NHw3MnwxNjQwMDMzMjM3']
1237 ['MnwyNHwxNjQwMDMzMjM5', 'M3w0OHwxNjQwMDMzMjM5', 'NHw3MnwxNjQwMDMzMjM5']
886 ['MnwyNHwxNjQwMDMzMjQz', 'M3w0OHwxNjQwMDMzMjQz', 'NHw3MnwxNjQwMDMzMjQz']
1 ['MnwyNHwxNjQwMDMzMjQ4', 'M3w0OHwxNjQwMDMzMjQ4', 'NHw3MnwxNjQwMDMzMjQ4']
328 ['MnwyNHwxNjQwMDMzMjUz', 'M3w0OHwxNjQwMDMzMjUz', 'NHw3MnwxNjQwMDMzMjUz']
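The loop above can also be tried without touching the network. The following is only a minimal sketch: fetch_page, FAKE_PAGES, collect_cursors and max_pages are hypothetical names invented for illustration, and the canned dictionaries merely imitate the "meta" part of a response; they are not real data from the API.

```python
import collections

# Hypothetical canned responses standing in for the API (assumption):
# each (brand, cursor) pair maps to the JSON body a request might return.
FAKE_PAGES = {
    ("989", None): {"meta": {"hasMore": True, "cursor": "c1"}},
    ("989", "c1"): {"meta": {"hasMore": True, "cursor": "c2"}},
    ("989", "c2"): {"meta": {"hasMore": False}},
    ("474", None): {"meta": {"hasMore": False}},
}


def fetch_page(brand, cursor=None):
    """Stand-in for a real HTTP request; returns {} for unknown pages."""
    return FAKE_PAGES.get((brand, cursor), {})


def collect_cursors(brands, max_pages=3):
    """Follow cursors per brand, appending each one to that brand's list."""
    cursors = collections.defaultdict(list)
    for brand in brands:
        data = fetch_page(brand)
        pages = 0
        while data.get("meta", {}).get("hasMore") and pages < max_pages:
            cursor = data["meta"].get("cursor")
            # append() extends the brand's list; assigning a new list
            # with "=" would discard what was collected so far
            cursors[brand].append(cursor)
            data = fetch_page(brand, cursor)
            pages += 1
    return dict(cursors)


print(collect_cursors(["989", "474"]))  # {'989': ['c1', 'c2']}
```

In this sketch a brand whose first page reports no further pages never gets a key at all, because the defaultdict only creates an entry when it is first accessed.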
Wait a second... what is going on with:
data.get("meta", {}).get("hasMore")
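Good question, I should have explained that earlier.
There is a chance that data.meta is undefined, that is, the response has no "meta" key. If so, the following would fail: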
data["meta"].get("hasMore")
as would
data.get("meta").get("hasMore")
So what we did with:
data.get("meta", {}).get("hasMore")
is use the second argument of get() to supply a default value. Here it is just an empty dict, but that is enough to let us safely chain the following .get("hasMore").
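The three variants can be compared side by side on plain dictionaries. This is a small sketch; data_missing and data_present are made-up sample values, not real API responses.

```python
# Made-up sample values (assumption): one response without "meta", one with.
data_missing = {}
data_present = {"meta": {"hasMore": True, "cursor": "abc"}}

# Indexing raises KeyError when the key is absent.
try:
    data_missing["meta"].get("hasMore")
except KeyError as exc:
    print("KeyError:", exc)

# .get("meta") returns None here, and None has no .get method.
try:
    data_missing.get("meta").get("hasMore")
except AttributeError as exc:
    print("AttributeError:", exc)

# With {} as the default, the chain is safe and simply yields None (falsy).
print(data_missing.get("meta", {}).get("hasMore"))  # None
print(data_present.get("meta", {}).get("hasMore"))  # True
```

One caveat: the default only applies when the key is absent. If a response carried "meta" with a null value, data.get("meta", {}) would return None and the chain would still fail; (data.get("meta") or {}).get("hasMore") covers that case as well.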
