Getting the values inside keys with a Scrapy item loader

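I'm trying to extract some values from the keys in a web API response. Unfortunately, when I do this it only returns the keys, and I can't seem to get the values. Each key is part of a long, numbered list, so I can't figure out how to get the values for all of the keys.

For example, here is my working code: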

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader
from scrapy.item import Field
from itemloaders.processors import TakeFirst

class DepopItem(scrapy.Item):
    brands = Field(output_processor=TakeFirst())

class DepopSpider(scrapy.Spider):
    name = 'depop'
    allowed_domains = ["depop.com"]
    start_urls = ['https://webapi.depop.com/api/v2/search/filters/aggregates/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']

    
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    }
    
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url, 
                callback=self.parse,
             )

    def parse(self, response):
        resp= response.json()['brands']
        for item in resp:
            loader = ItemLoader(DepopItem(), selector=item)
            loader.add_value('brands', item)
 
            yield loader.load_item()

This returns a list of keys:

{"brands": "1"}
{"brands": "2"}
{"brands": "3"}
{"brands": "4"}
{"brands": "5"}
{"brands": "7"}
{"brands": "9"}

Instead, I want the values that correspond to those keys:

{"brands": 946}
{"brands": 2376}
{"brands": 1286}
{"brands": 2774}
{"brands": 489}
{"brands": 11572}
{"brands": 1212}

Reply from a uj5u.com user:

Iterate over resp.values(), or look up each value with resp[item].

Example:

import scrapy
from scrapy.loader import ItemLoader
from scrapy.item import Field
from itemloaders.processors import TakeFirst


class DepopItem(scrapy.Item):
    brands = Field(output_processor=TakeFirst())


class DepopSpider(scrapy.Spider):
    name = 'depop'
    allowed_domains = ["depop.com"]
    start_urls = ['https://webapi.depop.com/api/v2/search/filters/aggregates/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']

    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    }

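    def parse(self, response):
        # 'brands' maps each brand ID to an object; iterate over the objects, not the keys
        resp = response.json()['brands']
        for item in resp.values():
            # add_value() takes the value directly, so no selector is needed here
            loader = ItemLoader(DepopItem())
            loader.add_value('brands', item['count'])
            yield loader.load_item()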

Output:

{'brands': 888}
{'brands': 1}
{'brands': 52}
{'brands': 138}
{'brands': 148}
...
...
...
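
The original spider printed keys because iterating over a Python dict yields its keys, not its values. Here is a minimal sketch of the difference. The sample data is made up for illustration: it assumes each brand ID maps to an object with a count field (as the item['count'] lookup above implies) and reuses the counts the asker expected.

```python
# Hypothetical sample shaped like response.json()['brands'] (an assumption,
# not a real API response): brand ID -> object with a "count" field.
sample_brands = {
    "1": {"count": 946},
    "2": {"count": 2376},
    "3": {"count": 1286},
}

# Iterating over the dict itself yields only the keys (what the question saw).
keys_only = [item for item in sample_brands]

# Option 1: iterate over the values directly.
counts_via_values = [entry["count"] for entry in sample_brands.values()]

# Option 2: iterate over the keys and look each value up.
counts_via_lookup = [sample_brands[key]["count"] for key in sample_brands]
```

Both options give the same counts in the same order; values() just avoids the extra lookup.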

Reply from a uj5u.com user:

I'm not sure how this works in Scrapy, but you can simply do:

import requests
import json
from itertools import starmap
from requests.models import Response
from typing import Dict, List


url = "https://webapi.depop.com/api/v2/search/filters/aggregates/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
resp: Response = requests.get(url)
data: Dict = json.loads(resp.text).get("brands")
values: List[Dict] = list(starmap(lambda k,v: {"brands": v["count"]}, data.items()))

Output:

[{'brands': 989},
 {'brands': 1838},
 {'brands': 2415},
 {'brands': 1344},
 ...]
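
If you want to check the extraction step without hitting the network, the same idea can be wrapped in a small function. This is only a sketch: brand_counts is a hypothetical helper name, the sample payload is made up, and skipping entries without a count field is an assumption about how you would want to handle incomplete data.

```python
from typing import Any, Dict, List


def brand_counts(payload: Dict[str, Any]) -> List[Dict[str, int]]:
    """Return one {"brands": count} dict per brand entry in the payload."""
    # Fall back to an empty dict if the "brands" key is missing or null.
    brands = payload.get("brands") or {}
    return [
        {"brands": entry["count"]}
        for entry in brands.values()
        # Assumption: silently skip entries that carry no "count" field.
        if isinstance(entry, dict) and "count" in entry
    ]


# Made-up sample data, not a real API response.
sample_payload = {"brands": {"1": {"count": 946}, "2": {"count": 2376}, "9": {}}}
result = brand_counts(sample_payload)
```

With a live response you would pass in the decoded JSON body instead of sample_payload.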
