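I am trying to scrape the filtered results from this site: https://compranet.hacienda.gob.mx/esop/guest/go/public/opportunity/current?locale=es_MX
First I applied the filter "Código, descripción o referencia del Expediente". A new container then appeared, where I chose the "Contiene" option, and finally I searched for a specific word (in this case "anestesia"). However, I don't know how to scrape the results table to get the links that appear in the "Description" column for all of the filtered results. I am new to Selenium. I would like to get the filtered links, or to know whether there is another way to get the information I need.
Here is my code: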
import random
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests
from lxml import html
s=Service('./chromedriver.exe')
driver = webdriver.Chrome(service=s)
driver.get('https://compranet.hacienda.gob.mx/esop/guest/go/public/opportunity/current?locale=es_MX')
sleep(5)
driver.find_element(By.XPATH ,"//*[@id='widget_filterPickerSelect']/div[1]/input").click()
sleep(5)
driver.find_element(By.XPATH,"//*[@id='filterPickerSelect_popup1']").click()
sleep(5)
driver.find_element(By.XPATH,"//*[@id='projectInfo_FILTER_OPERATOR_ID']/option[2]").click()
sleep(5)
busqueda = driver.find_element(By.XPATH,"//*[@id='projectInfo_FILTER']")
busqueda.send_keys("anestesia")
busqueda.send_keys(Keys.ENTER)
Specifically, I want to scrape this element:
<a href="#fh" class="detailLink" onclick="javascript:goToDetail('2110224', '01000');stopEventPropagation(event);" title="Ver detalle: PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO DEL 1o">PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO DEL 1o</a>
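I need to get the link.
Answer from a uj5u.com user:
You need to use explicit waits.
To get the links on the final page, use find_elements or visibility_of_all_elements_located, because there are multiple web elements. If you only want the links, keep just the line print(link.get_attribute('href')); you can comment out the other two print calls.
Code: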
s = Service('./chromedriver.exe')
driver = webdriver.Chrome(service=s)
driver.maximize_window()
# One shared explicit wait with a 20-second timeout
wait = WebDriverWait(driver, 20)
driver.get('https://compranet.hacienda.gob.mx/esop/guest/go/public/opportunity/current?locale=es_MX')
# Open the filter picker and choose the first entry in its popup
wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@value='▼ ']"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@id='filterPickerSelect_popup1']"))).click()
# Pick the "Contiene" operator by its option value
select = Select(wait.until(EC.presence_of_element_located((By.ID, "projectInfo_FILTER_OPERATOR_ID"))))
select.select_by_value('CONTAINS')
# Type the search term and submit it
busqueda = wait.until(EC.visibility_of_element_located((By.ID, "projectInfo_FILTER")))
busqueda.send_keys("anestesia")
time.sleep(2)
busqueda.send_keys(Keys.ENTER)
# Collect every visible result link and print its text, href, and title
links = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@class='detailLink'][@href]")))
for link in links:
    print(link.get_attribute('innerText'))
    print(link.get_attribute('href'))
    print(link.get_attribute('title'))
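Imports:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC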
Output:
PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO DEL 1o
https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/current/list.si?reset=true&resetstored=true&userAct=changeLangIndex&language=es_MX&_ncp=1649225706261.4394-1#fh
Ver detalle: PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO DEL 1o
SERVICIO DE MANTENIMIENTO PREVENTIVO Y CORRECTIVO DE EQUIPO MéDICO
https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/current/list.si?reset=true&resetstored=true&userAct=changeLangIndex&language=es_MX&_ncp=1649225706261.4394-1#fh
Ver detalle: SERVICIO DE MANTENIMIENTO PREVENTIVO Y CORRECTIVO DE EQUIPO MéDICO
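Note that every href in the output above is the same list URL ending in #fh: the anchor shown in the question has href="#fh" and navigates through its onclick handler, goToDetail('2110224', '01000'). If the goal is to tell the rows apart, one possible approach is to read the onclick attribute instead. The following is only a minimal sketch under that assumption. The names GO_TO_DETAIL, DetailLinkParser, and extract_detail_links are hypothetical helpers, not part of Selenium or of the site, and what the two goToDetail arguments mean, or how the site builds a detail URL from them, is not known from this thread.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper pattern: captures the two quoted arguments of
# goToDetail('...', '...') as they appear in the question's onclick attribute.
GO_TO_DETAIL = re.compile(r"goToDetail\('([^']*)',\s*'([^']*)'\)")


class DetailLinkParser(HTMLParser):
    """Collects the goToDetail arguments and title of each <a class="detailLink">."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "a" or "detailLink" not in classes:
            return
        match = GO_TO_DETAIL.search(attrs.get("onclick") or "")
        if match:
            self.results.append({
                "args": (match.group(1), match.group(2)),
                "title": attrs.get("title") or "",
            })


def extract_detail_links(page_html):
    """Return one dict per detailLink anchor found in the given HTML string."""
    parser = DetailLinkParser()
    parser.feed(page_html)
    return parser.results


# Sample markup shortened from the anchor quoted in the question.
sample = (
    '<a href="#fh" class="detailLink" '
    'onclick="javascript:goToDetail(\'2110224\', \'01000\');stopEventPropagation(event);" '
    'title="Ver detalle: PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA">'
    'PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA</a>'
)
print(extract_detail_links(sample))
```

With the Selenium session above, the same function could presumably be fed driver.page_source once the results have loaded; alternatively, GO_TO_DETAIL.search(link.get_attribute('onclick')) could be applied to each element inside the existing loop.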
