Python requests: web scraping via an API

2022-04-02 23:17:09 .NET Development

I have some experience with web scraping and APIs, but I can't find a suitable API on this site to do the job:

https://www.giga.com.vc/Bebida (note: /Bebida is just a category, similar to "/Drinks")

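The problem is that I have found several APIs, but they either work for a single product or only for some products. I can't work out the right rules to paginate them by category or page and iterate over a category's products to get prices, EANs, and so on.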

import requests
import pandas as pd
from bs4 import BeautifulSoup

For example, this works, but the output is badly formatted:

print(requests.get('https://www.giga.com.vc/padaria?initialMap=c&initialQuery=padaria&map=category-1&page=1').content)

Or:

urlx = 'https://www.giga.com.vc/_v/segment/graphql/v1?workspace=master&maxAge=short&appsEtag=remove&domain=store&locale=pt-BR&operationName=Products&variables={}&extensions={"persistedQuery":{"version":1,"sha256Hash":"49a77e3e2082563773aff56ad9c0432d59302e86fd1baaad9ca0f4bca2630d46","sender":"[email protected]","provider":"[email protected]"},"variables":"eyJoaWRlVW5hdmFpbGFibGVJdGVtcyI6ZmFsc2UsInNrdXNGaWx0ZXIiOiJBTExfQVZBSUxBQkxFIiwiaW5zdGFsbG1lbnRDcml0ZXJpYSI6Ik1BWF9XSVRIT1VUX0lOVEVSRVNUIiwiY2F0ZWdvcnkiOiIiLCJjb2xsZWN0aW9uIjoiMTYvIiwic3BlY2lmaWNhdGlvbkZpbHRlcnMiOltdLCJvcmRlckJ5IjoiIiwiZnJvbSI6MCwidG8iOjExfQ=="}'
r = requests.get(urlx)
for x in r.json()['data']['products']:
    print(x)

This also works:

url2 = 'https://www.giga.com.vc/_v/segment/graphql/v1?workspace=master&maxAge=short&appsEtag=remove&domain=store&locale=pt-BR&__bindingId=3f6e91e6-44f2-4fb0-a2d9-e238b53082e0&operationName=ProductRecommendations&variables={}&extensions={"persistedQuery":{"version":1,"sha256Hash":"e5782bd9e8bc64d337a7d7f96b9c280b462cdb0754d15b415192dac2755ad280","sender":"[email protected]","provider":"[email protected]"},"variables":"eyJpZGVudGlmaWVyIjp7ImZpZWxkIjoiaWQiLCJ2YWx1ZSI6IjE0NzUyMyJ9LCJ0eXBlIjoidmlldyJ9"}'

requests.get(url2).json()['data']['productRecommendations']

The expected output is something like this:

r = requests.get(urlx)
for items in r.json()['data']['products']:
    prd_dict = {
        'product_id': items['productId'],
        'price': items['priceRange']['sellingPrice']['highPrice'],
        'product_name': items['productName'],
        'category_id': items['categoryId'],
        'ean': items['items'][0]['ean'],
        'box_qty': items['specificationGroups'][0]['specifications'][0]['values']
        }
    print(prd_dict)

Raw output:

{'product_id': '141917', 'price': 20.54, 'product_name': 'Banana Nanica Kg', 'category_id': '433', 'ean': '4511', 'box_qty': ['0']}
{'product_id': '148077', 'price': 1.45, 'product_name': 'Água de Coco Tradicional Quadrado 200Ml', 'category_id': '148', 'ean': '0751320333650', 'box_qty': ['27']}
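
The loop above indexes straight into nested lists, so a product with no SKUs or no specification groups would raise an error partway through a category. A minimal sketch of a more defensive extraction follows, assuming the response keeps the shape used in the question. The helper names first and extract_row are hypothetical, the sample item is made up from the raw output above, and box_qty is reduced to the first value rather than kept as a list.

```python
def first(seq, default=None):
    """Return the first element of a list, or default if it is empty or missing."""
    return seq[0] if seq else default


def extract_row(item):
    """Flatten one product dict into a row; missing fields become None."""
    sku = first(item.get('items'), {})
    group = first(item.get('specificationGroups'), {})
    spec = first(group.get('specifications'), {})
    selling = (item.get('priceRange') or {}).get('sellingPrice') or {}
    return {
        'product_id': item.get('productId'),
        'price': selling.get('highPrice'),
        'product_name': item.get('productName'),
        'category_id': item.get('categoryId'),
        'ean': sku.get('ean'),
        'box_qty': first(spec.get('values')),
    }


# Sample data shaped like the question's output (illustrative only)
sample = {
    'productId': '148077',
    'productName': 'Água de Coco Tradicional Quadrado 200Ml',
    'categoryId': '148',
    'priceRange': {'sellingPrice': {'highPrice': 1.45}},
    'items': [{'ean': '0751320333650'}],
    'specificationGroups': [{'specifications': [{'values': ['27']}]}],
}

rows = [extract_row(sample), extract_row({'productId': '1'})]
for row in rows:
    print(row)
```

Since the question already imports pandas, a list of rows like this can be passed to pd.DataFrame(rows) to get a table.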

Answer from a uj5u.com user:

The request sends variables encoded as base64. After decoding, it looks like this:

'{"hideUnavailableItems":false,"skusFilter":"ALL","simulationBehavior":"default","installmentCriteria":"MAX_WITHOUT_INTEREST","productOriginVtex":false,"map":"c","query":"bebida","orderBy":"OrderByScoreDESC","from":40,"to":59,"selectedFacets":[{"key":"c","value":"bebida"}],"operator":"and","fuzzy":"0","searchState":null,"facetsBehavior":"Static","categoryTreeBehavior":"default","withFacets":false}'
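
To make the encoding concrete, here is a small sketch that decodes and re-encodes such a value. It uses the short variables string from the ProductRecommendations URL in the question. The helper names decode_variables and encode_variables are hypothetical, and the compact JSON separators are an assumption about how the site serializes the payload.

```python
import base64
import json


def decode_variables(encoded):
    """Turn the base64 'variables' string into a Python dict."""
    return json.loads(base64.b64decode(encoded))


def encode_variables(variables):
    """Serialize a dict as compact JSON and base64-encode it."""
    compact = json.dumps(variables, separators=(',', ':'))
    return base64.b64encode(compact.encode()).decode()


# 'variables' value taken from the ProductRecommendations URL in the question
encoded = 'eyJpZGVudGlmaWVyIjp7ImZpZWxkIjoiaWQiLCJ2YWx1ZSI6IjE0NzUyMyJ9LCJ0eXBlIjoidmlldyJ9'
decoded = decode_variables(encoded)
print(decoded)

# Encoding the decoded dict again gives a value that decodes to the same dict
roundtrip = decode_variables(encode_variables(decoded))
```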

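If I parse the URL, convert everything to dictionaries, and replace the values with 'from': 0 and 'to': 99, I get 100 products.

But if you use a value greater than 99, it doesn't work. It may need some other changes to the URL.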
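
One possible reading of that limit is that a single request may cover at most 100 items, in which case a category could be walked in consecutive windows. The post does not confirm this, so the sketch below is only an untested assumption. The generator page_windows and the total of 250 are hypothetical, and it only computes the 'from'/'to' pairs without making any request.

```python
def page_windows(total, size=100):
    """Yield inclusive (from, to) index pairs covering `total` items."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, total, size):
        # 'to' is inclusive, so a full window of 100 items ends at start + 99
        yield start, min(start + size, total) - 1


windows = list(page_windows(250))
print(windows)  # [(0, 99), (100, 199), (200, 249)]
```

Each pair would replace 'from' and 'to' in the decoded variables before re-encoding. If the site rejects offsets above 99 altogether, this approach will not help and a different filter (for example a narrower category) would be needed.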

import base64
import urllib.parse
import urllib.request
import json

url1 = 'https://www.giga.com.vc/_v/segment/graphql/v1?workspace=master&maxAge=short&appsEtag=remove&domain=store&locale=pt-BR&__bindingId=3f6e91e6-44f2-4fb0-a2d9-e238b53082e0&operationName=productSearchV3&variables={}&extensions={"persistedQuery":{"version":1,"sha256Hash":"6869499be99f20964918e2fe0d1166fdf6c006b1766085db9e5a6bc7c4b957e5","sender":"[email protected]","provider":"[email protected]"},"variables":"eyJoaWRlVW5hdmFpbGFibGVJdGVtcyI6ZmFsc2UsInNrdXNGaWx0ZXIiOiJBTEwiLCJzaW11bGF0aW9uQmVoYXZpb3IiOiJkZWZhdWx0IiwiaW5zdGFsbG1lbnRDcml0ZXJpYSI6Ik1BWF9XSVRIT1VUX0lOVEVSRVNUIiwicHJvZHVjdE9yaWdpblZ0ZXgiOmZhbHNlLCJtYXAiOiJjIiwicXVlcnkiOiJiZWJpZGEiLCJvcmRlckJ5IjoiT3JkZXJCeVNjb3JlREVTQyIsImZyb20iOjIwLCJ0byI6MzksInNlbGVjdGVkRmFjZXRzIjpbeyJrZXkiOiJjIiwidmFsdWUiOiJiZWJpZGEifV0sIm9wZXJhdG9yIjoiYW5kIiwiZnV6enkiOiIwIiwic2VhcmNoU3RhdGUiOm51bGwsImZhY2V0c0JlaGF2aW9yIjoiU3RhdGljIiwiY2F0ZWdvcnlUcmVlQmVoYXZpb3IiOiJkZWZhdWx0Iiwid2l0aEZhY2V0cyI6ZmFsc2V9"}'
url2 = 'https://www.giga.com.vc/_v/segment/graphql/v1?workspace=master&maxAge=short&appsEtag=remove&domain=store&locale=pt-BR&__bindingId=3f6e91e6-44f2-4fb0-a2d9-e238b53082e0&operationName=productSearchV3&variables={}&extensions={"persistedQuery":{"version":1,"sha256Hash":"6869499be99f20964918e2fe0d1166fdf6c006b1766085db9e5a6bc7c4b957e5","sender":"[email protected]","provider":"[email protected]"},"variables":"eyJoaWRlVW5hdmFpbGFibGVJdGVtcyI6ZmFsc2UsInNrdXNGaWx0ZXIiOiJBTEwiLCJzaW11bGF0aW9uQmVoYXZpb3IiOiJkZWZhdWx0IiwiaW5zdGFsbG1lbnRDcml0ZXJpYSI6Ik1BWF9XSVRIT1VUX0lOVEVSRVNUIiwicHJvZHVjdE9yaWdpblZ0ZXgiOmZhbHNlLCJtYXAiOiJjIiwicXVlcnkiOiJiZWJpZGEiLCJvcmRlckJ5IjoiT3JkZXJCeVNjb3JlREVTQyIsImZyb20iOjQwLCJ0byI6NTksInNlbGVjdGVkRmFjZXRzIjpbeyJrZXkiOiJjIiwidmFsdWUiOiJiZWJpZGEifV0sIm9wZXJhdG9yIjoiYW5kIiwiZnV6enkiOiIwIiwic2VhcmNoU3RhdGUiOm51bGwsImZhY2V0c0JlaGF2aW9yIjoiU3RhdGljIiwiY2F0ZWdvcnlUcmVlQmVoYXZpb3IiOiJkZWZhdWx0Iiwid2l0aEZhY2V0cyI6ZmFsc2V9"}'

# Split the URL and parse its query string into a dict of lists
parts = urllib.parse.urlparse(url1)
query = urllib.parse.parse_qs(parts.query)

# 'extensions' holds JSON; its 'variables' field is base64-encoded JSON
data = json.loads(query['extensions'][0])
variables = data['variables']

q = base64.b64decode(variables.encode()).decode()
q = json.loads(q)

print('--- replace values ---')

print(q)

# Ask for the first 100 products instead of the window encoded in url1
q['from'] = 0
q['to'] = 99

print(q)

print('---')

# Re-encode the variables and put them back into 'extensions'
q = json.dumps(q)
variables = base64.b64encode(q.encode()).decode()

data['variables'] = variables
query['extensions'][0] = json.dumps(data)

# Rebuild the URL with the modified query string
parts = parts._replace(query=urllib.parse.urlencode(query, doseq=True))
url = urllib.parse.urlunparse(parts)


req = urllib.request.urlopen(url)
data = json.loads(req.read())
for number, item in enumerate(data['data']['productSearch']['products'], 1):
    print(number, '|', item['productName'])

Result:

1 | Água de Coco Tradicional Quadrado 200Ml
2 | Leite Longa Vida Integral com Tampa Italac 1L
3 | Leite Longa Vida Quatá  Integral 1L
4 | Água Mineral sem Gás Minalba 1,5L
5 | Água Mineral Sem Gás Minalba 510Ml
6 | Refrigerante Coca-Cola200Ml
7 | Leite Longa Vida Integral Shefa Garrafa 1L
8 | Refrigerante Coca-Cola sem Açúcar 1L
9 | Cerveja Heineken Lata 350Ml
10 | Leite Integral Longa Vida com Tampa Ninho 1L
11 | Leite Longa Vida Semidesnatado Com Tampa Italac 1L
12 | Cerveja Heineken Long Neck 330Ml
13 | Cerveja Amstel Lata 269Ml
14 | Água Mineral Minalba Com Gás 510Ml
15 | Água Mineral Pureza Vital Sem Gás Nestlé Pet 510Ml
16 | Água Mineral Sem Gás Bonafont 500Ml
17 | Refrigerante Coca-Cola sem Açúcar 200ml
18 | Água de Coco Kero Coco 1L
19 | Refrigerante Coca-Cola 2,5L
20 | Leite Longa Vida Desnatado com Tampa Italac 1L
21 | Refrigerante Coca-Cola Lata 350Ml
22 | Leite Longa Vida Integral Tirol 1L
23 | Água Mineral com Gás Pureza Vital 510ML
24 | Suco de Uva Integral Tinto Aurora 1,5L
25 | Achocolatado Toddynho 200Ml
26 | Refrigerante Coca-Cola sem Açúcar Lata 220ml
27 | Energético Red Bull Energy Drink 250Ml
28 | Água Mineral Sem Gás Pureza Vital Nestlé 1,5 L
29 | Suco Natural One Laranja 2L
30 | Refrigerante Coca-Cola 2L
31 | Cerveja Skol Lata 350Ml
32 | Refrigerante Coca-Cola 1L
33 | Suco De Maçã Yakult 200Ml
34 | Cerveja Império Puro Malte Lata 269Ml
35 | Refrigerante Guaraná Antarctica 2L
36 | Refrigerante Coca-Cola Lata 220Ml
37 | Leite Em Pó Integral Italac Sachê 400G
38 | Água Coco Puro Coco 200Ml
39 | Água Mineral Com Gás Crystal Pet 1,5 L
40 | Achocolatado sabor Chocolate Italakinho  200Ml
41 | Cerveja Duplo Malte Brahma Lata 350Ml
42 | Água Mineral Indaiá sem Gás 500Ml
43 | Refresco em Pó Sabor Laranja Tang 25G
44 | Água De Coco Puro Coco 1L
45 | Cerveja Budweiser Lata 269Ml
46 | Cerveja Skol Lata 269Ml
47 | Refresco em Pó Sabor Uva Tang 25G
48 | Refrigerante Guaraná Antarctica Lata 350Ml
49 | Cerveja Eisenbahn Pilsen Lata 350Ml
50 | Cerveja Itaipava Lata 350Ml
51 | Cerveja Itaipava Lata 269Ml
52 | Refrigerante Coca-Cola 600Ml
53 | Refrigerante Guaraná Antarctica 1,5L
54 | Cerveja Stella Artois Lata 269Ml
55 | Whisky Escocês Johnnie Walker Red Label 1L
56 | Refrigerante Coca-Cola sem Açúcar 2L
57 | Água Mineral sem Gás Crystal Pet 1,5 L
58 | Cerveja Amstel Lata 350Ml
59 | Cerveja Corona Extra Long Neck 330Ml
60 | Cerveja Stella Artois Long Neck 330Ml
61 | Água Mineral Sem Gás Bonafont 1,5 L
62 | Cerveja Puro Malte Petra Lata 350Ml
63 | Água de Coco Kero Coco 200Ml
64 | Cerveja Heineken Garrafa 600Ml
65 | Refrigerante de Laranja Sukita 2L
66 | Chopp De Vinho Draft 600Ml
67 | Refrigerante De Limão H2Oh! 500Ml
68 | Suco Natural de Uva e Maca One Ambiente 2L
69 | Refrigerante Dolly Guaraná 2L
70 | Energético Energy Monster Lata 473Ml
71 | Refresco em Pó Sabor Limão Tang 25g
72 | Suco De Laranja Integral Prat's 4 Ls
73 | Energético Red Bull Tropical Energy Drink  250Ml
74 | Refrigerante Limoneto H2Oh! Pet 500Ml
75 | Água Tônica Antarctica Zero 350Ml
76 | Água Mineral Sem Gás Minalba 10 Ls
77 | Vodka Red Smirnoff 998Ml
78 | Suco De Laranja Natural Xandô Garrafa 900Ml
79 | Energético Red Bull Melancia Energy Drink 250Ml
80 | Bebida Láctea de Proteína Zero Lactose sabor Chocolate YoPro 15G
81 | Água Verão Sense Lindoya 510ml
82 | Vodka Nacional Smirnoff Ice Red 269ml
83 | Whisky Escocês White Horse 8 Anos 1L
84 | Refrigerante Coca-Cola sem Açúcar 2,5L
85 | Refresco em Pó Sabor Maracujá Tang 25g
86 | Cerveja Império Puro Malte Lata 350Ml
87 | Vodka Ice Smirnoff 275Ml
88 | Cerveja Eisenbahn Pilsen Long Neck 355Ml
89 | Guaraná Com Açaí Natural Guaraviton Pet 500Ml
90 | Cerveja Budweiser Long Neck 330Ml
91 | Água Mineral Com Gás Pet Crystal 500Ml
92 | Água Tônica Antarctica Lata 350Ml
93 | Refrigerante Sabor Guaraná Mini Dolly Pet 350Ml
94 | Água Tônica Schweppes lata 350ml
95 | Cachaça 51 965Ml
96 | Cerveja Skol Lata 550Ml
97 | Refresco em Pó Sabor Abacaxi Tang 25g
98 | Cerveja Puro Malte Petra Lata 269Ml
99 | Cachaça Velho Barreiro 910Ml
100 | Refrigerante Fanta Laranja 2L
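
The URL-rewriting steps of the script in this answer can also be wrapped in a function, so the same URL can be reused for different ranges. This is only a sketch under assumptions: set_range and read_variables are hypothetical names, and the demo URL on example.com is made up so the code runs without contacting the shop. With a real URL such as url1, the returned string could be passed to urllib.request.urlopen as in the script.

```python
import base64
import json
import urllib.parse


def read_variables(url):
    """Return the decoded 'variables' dict stored in the 'extensions' parameter."""
    query = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    extensions = json.loads(query['extensions'][0])
    return json.loads(base64.b64decode(extensions['variables']))


def set_range(url, start, end):
    """Return a copy of url whose encoded variables use from=start, to=end."""
    parts = urllib.parse.urlparse(url)
    query = urllib.parse.parse_qs(parts.query)
    extensions = json.loads(query['extensions'][0])

    variables = json.loads(base64.b64decode(extensions['variables']))
    variables['from'] = start
    variables['to'] = end

    extensions['variables'] = base64.b64encode(
        json.dumps(variables).encode()).decode()
    query['extensions'] = [json.dumps(extensions)]
    new_query = urllib.parse.urlencode(query, doseq=True)
    return parts._replace(query=new_query).geturl()


# Made-up URL with the same layout as the real ones (illustrative only)
demo_vars = base64.b64encode(
    json.dumps({'query': 'bebida', 'from': 20, 'to': 39}).encode()).decode()
demo_ext = json.dumps({'persistedQuery': {'version': 1}, 'variables': demo_vars})
demo_url = ('https://example.com/graphql?operationName=productSearchV3'
            '&extensions=' + urllib.parse.quote(demo_ext))

new_url = set_range(demo_url, 0, 99)
print(read_variables(new_url))
```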
