What I want to do is scrape multiple pages and yield the results in a single array.
So far I have tried:
import scrapy


class RealtorSpider(scrapy.Spider):
    name = "realtor"
    allowed_domains = ["realtor.com"]
    start_urls = ["http://realtor.com/"]
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Sec-GPC": "1",
        "Connection": "keep-alive",
        "If-None-Match": '"d9b9d-uhdwucnqmaT5gbxbobPzbm+uEgs"',
        "Cache-Control": "max-age=0",
        "TE": "trailers",
    }

    def start_requests(self):
        url = "https://www.realtor.com/realestateandhomes-search/Seattle_WA/show-newest-listings"
        # Request pages 1 to 3 of the search results
        for page in range(1, 4):
            next_page = url + "/pg-" + str(page)
            yield scrapy.Request(
                url=next_page, headers=self.headers, callback=self.parse, priority=1
            )

    def parse(self, response):
        # extract data: one item per property list, holding all prices on the page
        for card in response.css("ul.property-list"):
            item = {"price": card.css("span[data-label=pc-price]::text").getall()}
            yield item
This gives me three separate lists of prices:
['$740,000', '$998,000', '$620,000', ......, '$719,000', '$2,975,000', '$1,099,000']
['$500,000', '$474,000', '$725,000', ......, '$895,000', '$619,500', '$1,199,000']
['$1,095,000', '$475,000', '$700,000', ........, '$950,000', '$995,000', '$639,950']
What I'm looking for is a single list like this:
$740,000 - 1
$998,000 - 2
$620,000 - 3
$719,000 - 4
.
.
.
$995,000 - 143
$639,950 - 144
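Once the per-page lists are in memory, flattening them and numbering each price gives the format above. A minimal sketch, assuming the three lists are already available; number_prices and pages are hypothetical names, and the sample values are just the first three prices from each list shown earlier:

```python
from itertools import chain


def number_prices(pages):
    """Flatten a list of per-page price lists and label each price with a 1-based index."""
    flat = chain.from_iterable(pages)
    return [f"{price} - {i}" for i, price in enumerate(flat, start=1)]


# Hypothetical sample: the first three prices from each page's list above.
pages = [
    ["$740,000", "$998,000", "$620,000"],
    ["$500,000", "$474,000", "$725,000"],
    ["$1,095,000", "$475,000", "$700,000"],
]

for line in number_prices(pages):
    print(line)
```

Because enumerate runs over the already-flattened sequence, the numbering continues across page boundaries instead of restarting at 1 for each page.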
Answer:
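I'm not sure exactly what produced the example lists, but I assume you called one of RealtorSpider's methods and that this is what gave you three lists. Since these methods use yield to return values, you may need to call list on their output to get a list rather than a generator. Note also that parse is called once for each downloaded page, so the three requests made in start_requests produce three separate items, each with its own list of prices.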
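To illustrate the point about generators: calling a function that uses yield returns a generator object, and list() drains it. The sketch below is plain Python with no Scrapy involved; fake_parse and the page data are hypothetical stand-ins for parse and the scraped pages. It also shows that one item per page still leaves one list per page until the lists are flattened:

```python
from types import GeneratorType


def fake_parse(page_prices):
    """Hypothetical stand-in for parse(): yields one item per page, like the spider."""
    yield {"price": page_prices}


pages = [["$740,000", "$998,000"], ["$500,000"], ["$1,095,000", "$475,000"]]

gen = fake_parse(pages[0])
print(isinstance(gen, GeneratorType))  # True: the body has not run yet
print(list(gen))  # [{'price': ['$740,000', '$998,000']}]

# Collect the items from every page: still one nested list per page.
items = [item for page in pages for item in fake_parse(page)]
print(len(items))  # 3

# Flatten the per-page lists into a single list.
prices = [p for item in items for p in item["price"]]
print(prices)  # ['$740,000', '$998,000', '$500,000', '$1,095,000', '$475,000']
```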
I suggest editing your realtor.py file as follows:
import scrapy
import json


class RealtorSpider(scrapy.Spider):
    name = "realtor"
    allowed_domains = ["realtor.com"]
    start_urls = ["http://realtor.com/"]
    prices = []  # collects one list of prices per page
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Sec-GPC": "1",
        "Connection": "keep-alive",
        "If-None-Match": '"d9b9d-uhdwucnqmaT5gbxbobPzbm+uEgs"',
        "Cache-Control": "max-age=0",
        "TE": "trailers",
    }

    def start_requests(self):
        url = "https://www.realtor.com/realestateandhomes-search/Seattle_WA/show-newest-listings"
        for page in range(1, 4):
            next_page = url + "/pg-" + str(page)
            yield scrapy.Request(
                url=next_page, headers=self.headers, callback=self.parse, priority=1
            )

    def parse(self, response):
        # extract data
        for card in response.css("ul.property-list"):
            item = {"price": card.css("span[data-label=pc-price]::text").getall()}
            self.prices.append(item["price"])
            yield item
        # Flatten all per-page lists collected so far and (re)write data.json
        data = [x for y in self.prices for x in y]
        with open("data.json", "w") as f:
            f.write(json.dumps(data))
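In the code above, data.json is rewritten after every page. Because self.prices keeps growing, the last write contains all pages, but the file is written once per response. Scrapy spiders can define a closed(self, reason) method that runs once when the crawl finishes, which is a natural place for a single write. The sketch below shows the same accumulate-then-write-once pattern in plain Python so it runs without Scrapy; PriceCollector, add_page and close are hypothetical names, with close playing the role of closed:

```python
import json
import os
import tempfile


class PriceCollector:
    """Hypothetical stand-in for the spider: accumulate per page, write once at the end."""

    def __init__(self, path):
        self.path = path
        self.prices = []  # instance attribute, so it is not shared between instances

    def add_page(self, page_prices):
        # Plays the role of parse(): extend the flat list instead of appending a nested one
        self.prices.extend(page_prices)

    def close(self):
        # Plays the role of Spider.closed(reason): called once, so the file is written once
        with open(self.path, "w") as f:
            json.dump(self.prices, f)


path = os.path.join(tempfile.mkdtemp(), "data.json")
collector = PriceCollector(path)
collector.add_page(["$740,000", "$998,000"])
collector.add_page(["$500,000", "$474,000"])
collector.close()

with open(path) as f:
    print(json.load(f))  # ['$740,000', '$998,000', '$500,000', '$474,000']
```

In the spider itself, the same idea would mean moving the lines that build data and write data.json out of parse into a def closed(self, reason): method. Using extend instead of append also makes the flattening comprehension unnecessary.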
With the file edited like this, running scrapy crawl realtor in the shell produces a file named data.json that contains all the prices in a single flat list. You can then read it:
import json

with open("data.json") as f:
    data = json.load(f)
data
Output:
['$575,000',
'$399,950',
'$620,000',
'$1,150,000',
'$1,100,000',
'$880,000',
'$735,000',
'$337,000',
'$759,800',
'$330,000',
'$575,000',
'$740,000',
'$639,950',
'$950,000',
'$575,000',
'$895,000',
'$950,000',
'$675,000',
'$629,000',
'$2,000,000',
'$1,325,000',
'$714,900',
'$699,950',
'$998,000',
'$1,150,000',
'$849,999',
'$999,000',
'$1,050,000',
'$750,000',
'$2,975,000',
'$1,300,000',
'$1,350,000',
'$400,000',
'$1,349,000',
'$1,175,000',
'$1,049,000',
'$3,500,000',
'$849,000',
'$719,000',
'$734,950',
'$1,099,000',
'$769,000',
'$489,000',
'$1,095,000',
'$700,000',
'$475,000',
'$450,000',
'$625,000',
'$330,000',
'$425,000',
'$685,000',
'$385,000',
'$649,950',
'$815,000',
'$699,000',
'$525,000',
'$1,495,000',
'$325,000',
'$835,000',
'$599,950',
'$1,150,000',
'$895,000',
'$998,900',
'$775,000',
'$565,000',
'$750,000',
'$879,000',
'$325,000',
'$1,000,000',
'$785,000',
'$725,000',
'$899,000',
'$1,095,000',
'$1,175,000',
'$815,000',
'$2,300,000',
'$950,000',
'$929,000',
'$1,249,900',
'$1,650,000',
'$1,500,000',
'$639,950',
'$995,000',
'$750,000',
'$630,000',
'$999,000',
'$474,000',
'$390,000',
'$485,000',
'$725,000',
'$500,000',
'$340,000',
'$689,000',
'$525,000',
'$650,000',
'$589,950',
'$665,000',
'$725,000',
'$460,000',
'$749,450',
'$1,088,000',
'$525,000',
'$495,000',
'$830,000',
'$475,000',
'$999,000',
'$849,950',
'$848,000',
'$480,000',
'$538,000',
'$4,585,000',
'$1,150,000',
'$1,045,000',
'$730,000',
'$630,000',
'$1,950,000',
'$899,000',
'$1,975,000',
'$1,179,500',
'$2,100,000',
'$829,000',
'$2,750,000',
'$895,000',
'$849,950',
'$619,500',
'$1,199,000']
