Object of type Selector is not JSON serializable

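I am trying to scrape a dynamic website, and I need Selenium for it.

The links I want to scrape only open when I click that particular element. They are opened by jQuery, so my only option is to click them, because there is no href attribute or anything else that would give me a URL.

My approach is this: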

# -*- coding: utf-8 -*-
import scrapy

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy.selector import Selector
from scrapy_selenium import SeleniumRequest

class AnofmSpider(scrapy.Spider):
    name = 'anofm'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.anofm.ro/lmvw.html?agentie=Covasna&categ=3&subcateg=1',
            callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']
        try:
            element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "tableRepeat2"))
            )
        finally:
            html = driver.page_source
            response_obj = Selector(text=html)

            links = response_obj.xpath("//tbody[@id='tableRepeat2']")
            for link in links:
                driver.execute_script("arguments[0].click();", link)

                yield {
                    'Ocupatia': response_obj.xpath("//div[@id='print']/p/text()[1]")
                }

But it doesn't work.

On the line where I try to click the element, I get this error:

TypeError: Object of type Selector is not JSON serializable

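I sort of understand the error, but I don't know how to fix it. I somehow need to turn that object from a Selector into a clickable button.

I searched online and in the documentation for a solution, but I didn't find anything useful.

Can anyone help me understand this error better and tell me how I should fix it?

Thanks.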

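Reply from a uj5u.com user:

Actually, the data is also generated from the JSON response of an API call, so you can easily scrape it straight from the API. Here is a working solution with pagination. Each page contains 8 items, and there are 32 items in total.

Code: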

import scrapy
import json

class AnofmSpider(scrapy.Spider):

    name = 'anofm'

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.anofm.ro/dmxConnect/api/oferte_bos/oferte_bos_query2L_Test.php?offset=8&cauta=&select=Covasna&limit=8&localitate=',
            method='GET',
            callback=self.parse,
            meta={
                'limit': 8}
        )


    def parse(self, response):
        # response.body is bytes, so parse it with json.loads (not json.load)
        resp = json.loads(response.body)
        hits = resp.get('lmv').get('data')
        for h in hits:
            yield {
                'Ocupatia': h.get('OCCUPATION')
            }


        # request the next page while the limit stays within the total
        total_limit = resp.get('lmv').get('total')
        next_limit = response.meta['limit'] + 8
        if next_limit <= total_limit:
            yield scrapy.Request(
                url=f'https://www.anofm.ro/dmxConnect/api/oferte_bos/oferte_bos_query2L_Test.php?offset=8&cauta=&select=Covasna&limit={next_limit}&localitate=',
                method='GET',
                callback=self.parse,
                meta={
                    'limit': next_limit}
            )
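
The pagination described above (8 items per page, 32 in total) can also be written as a list of page URLs built up front. This is only a sketch: it assumes that offset means "number of items to skip" and limit means "page size", which the answer does not confirm, and page_urls is a hypothetical helper name. The base URL and parameter names are taken from the code above.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above.
BASE = 'https://www.anofm.ro/dmxConnect/api/oferte_bos/oferte_bos_query2L_Test.php'


def page_urls(total, page_size=8, county='Covasna'):
    """Yield one URL per page (hypothetical helper).

    Assumes `offset` skips items and `limit` is the page size.
    """
    for offset in range(0, total, page_size):
        query = urlencode({
            'offset': offset,
            'cauta': '',
            'select': county,
            'limit': page_size,
            'localitate': '',
        })
        yield f'{BASE}?{query}'


if __name__ == '__main__':
    for url in page_urls(32):
        print(url)
```

In a spider, each of these URLs would be passed to scrapy.Request with the same parse callback. The total would first have to be read from the 'total' field of one response.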

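Reply from a uj5u.com user:

You are mixing Scrapy objects with Selenium functions, and that causes the problem. I don't know how to convert the objects, but I would simply use only Selenium for this part: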

        finally:

            links = driver.find_elements_by_xpath("//tbody[@id='tableRepeat2']/tr")
            print('len(links):', len(links))

            for link in links:
                # doesn't work for me - even with scrollIntoView
                #driver.execute_script("arguments[0].scrollIntoView();", link)
                #link.click()

                # open information
                driver.execute_script("arguments[0].click();", link)

                # JavaScript may need some time to display it
                time.sleep(1)

                # get data
                ocupatia = driver.find_element_by_xpath(".//div[@id='print']/p").text
                ocupatia = ocupatia.split('\n', 1)[0]         # first line
                ocupatia = ocupatia.split(':', 1)[1].strip()  # text after first `:`
                print('Ocupatia --> ', ocupatia)

                # close information
                driver.find_element_by_xpath('//button[text()="Inchide"]').click()

                yield {
                    'Ocupatia': ocupatia
                }
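
As for why the original call raised the TypeError: Selenium sends the arguments of execute_script to the browser driver as JSON. It can turn its own WebElement objects into a small JSON reference, but a Scrapy Selector is an unrelated Python object, so the JSON encoder rejects it. The sketch below imitates that step with the standard library only. FakeSelector, FakeWebElement, wrap and build_payload are hypothetical stand-ins for illustration, not Selenium's real internals.

```python
import json

# Key the W3C WebDriver protocol uses to reference an element in JSON.
ELEMENT_KEY = 'element-6066-11e4-a52e-4f735466cecf'


class FakeSelector:
    """Stand-in for scrapy.selector.Selector: parsed HTML, no browser handle."""


class FakeWebElement:
    """Stand-in for a Selenium WebElement, which carries a browser-side id."""

    def __init__(self, element_id):
        self.id = element_id


def wrap(value):
    # Elements the driver knows about become a JSON-friendly reference;
    # anything else is passed through unchanged.
    if isinstance(value, FakeWebElement):
        return {ELEMENT_KEY: value.id}
    return value


def build_payload(script, *args):
    # json.dumps raises TypeError for objects it cannot encode.
    return json.dumps({'script': script, 'args': [wrap(a) for a in args]})


if __name__ == '__main__':
    print(build_payload('arguments[0].click();', FakeWebElement('abc123')))
    try:
        build_payload('arguments[0].click();', FakeSelector())
    except TypeError as exc:
        print('TypeError:', exc)
```

This is why the elements have to come from the driver itself (for example via find_elements_by_xpath, as in the snippet above) rather than from a Selector built on page_source.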

Full working code.

Anyone can put it in a single file and run python script.py, without creating a Scrapy project.

You have to change SELENIUM_DRIVER_EXECUTABLE_PATH to the correct path.

import scrapy

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy.selector import Selector
from scrapy_selenium import SeleniumRequest
import time

class AnofmSpider(scrapy.Spider):
    name = 'anofm'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.anofm.ro/lmvw.html?agentie=Covasna&categ=3&subcateg=1',
            #callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']
        try:
            print("try")
            element = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH, "//tbody[@id='tableRepeat2']/tr/td"))
            )
        finally:
            print("finally")

            # Selenium 3 API; in Selenium 4 use driver.find_elements(By.XPATH, ...)
            links = driver.find_elements_by_xpath("//tbody[@id='tableRepeat2']/tr")
            print('len(links):', len(links))

            for link in links:
                #driver.execute_script("arguments[0].scrollIntoView();", link)
                #link.click()

                # open information
                driver.execute_script("arguments[0].click();", link)

                # JavaScript may need some time to display it
                time.sleep(1)

                # get data
                ocupatia = driver.find_element_by_xpath(".//div[@id='print']/p").text
                ocupatia = ocupatia.split('\n', 1)[0]         # first line
                ocupatia = ocupatia.split(':', 1)[1].strip()  # text after first `:`
                print('Ocupatia --> ', ocupatia)

                # close information
                driver.find_element_by_xpath('//button[text()="Inchide"]').click()

                yield {
                    'Ocupatia': ocupatia
                }

# --- run without a project and save in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # save in file CSV, JSON or XML
    'FEEDS': {'output.csv': {'format': 'csv'}},  # new in 2.1

    'DOWNLOADER_MIDDLEWARES': {'scrapy_selenium.SeleniumMiddleware': 800},

    'SELENIUM_DRIVER_NAME': 'firefox',
    'SELENIUM_DRIVER_EXECUTABLE_PATH': '/home/furas/bin/geckodriver',
    'SELENIUM_DRIVER_ARGUMENTS': [],  # ['-headless']
})
c.crawl(AnofmSpider)
c.start()
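
The fixed time.sleep(1) in the loop waits a full second even when the popup appears sooner, and it may be too short on a slow connection. One possible alternative is to poll until the popup is there. The helper below is a minimal sketch of that idea in plain Python; wait_until is a hypothetical name, and Selenium's own WebDriverWait does the same kind of polling.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.2):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)


# Possible use inside the loop, instead of time.sleep(1) (untested sketch):
#
#     found = wait_until(
#         lambda: driver.find_elements_by_xpath("//div[@id='print']/p"))
#     ocupatia = found[0].text
#
# find_elements_by_xpath returns an empty (falsy) list while nothing matches.
```

Note that this only checks that the element exists. If the page reuses the same popup for every row, the condition would also have to check that its text has changed.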

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/327103.html
