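I am trying to scrape this site:
click that button

and scrape these numbers.

I do know how to scrape the numbers and how to click or select a button, but I don't know how to iteratively select each option from that odd dropdown menu...
I tried clicking the button to open the dropdown, as some suggestions online recommend, but could not get it to work: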

button1 = driver.find_element_by_xpath('/html/body/form/div[3]/div[1]/div/div/div[1]/select')
But I get the error: Message: no such element: Unable to locate element
Any help for a newcomer to web scraping would be appreciated :)
Answer:
The data you need is loaded by JavaScript, so you can use Selenium to get the list of cities. Here is one possible solution:
import csv
import requests
from typing import Union, Any
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def get_data(url: str, city_name: str) -> Union[dict[str, Any], str]:
    # POST the city name to the endpoint and map the four returned values to named fields.
    payload = {
        'city': city_name,
        'mainCategory': '??? ????',
        'secondCategory': '??? ?? ????'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }
    try:
        r = requests.post(url, data=payload, headers=headers).json()
        return {
            "City Name": city_name,
            "Ventures": r[0],
            "Realizable Investments": r[1],
            "Realized Investments": r[2],
            "Amount Invested Since 1989": r[3]
        }
    except ValueError:
        # .json() raises ValueError when the response body is not valid JSON.
        return f'No data for {city_name}'


def save_to_csv(data: list) -> None:
    # Append one row to pais.csv.
    with open(file='pais.csv', mode='a', encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow([*data])


options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
service = Service(executable_path="path/to/your/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 15)

main_url = 'https://www.pais.co.il/info/Thank-to.aspx'
post_call_url = 'https://www.pais.co.il/grants/grantsRequestNumbers.ashx'

driver.get(main_url)
# The dropdown is inside an iframe, so switch into it before locating the options.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
cities = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#FacilitiesStats_ddlcity>option')))
# Skip the first option of the dropdown.
city_names = [city.text for city in cities[1:]]

for name in city_names:
    data = get_data(post_call_url, name)
    if isinstance(data, dict):
        save_to_csv(data.values())
    else:
        print(data)

# quit must be called; a bare `driver.quit` does nothing and leaves the browser running.
driver.quit()
For some cities there is no data, for example "?????? ??-???", so we just print to the console: No data for ?????? ??-???
Output CSV file pais.csv:
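The "no data" case above can be separated from the network call so it is easier to check. The following is only a sketch: `parse_counts` and `FIELDS` are hypothetical names that are not part of the answer, and the assumed response shape (a JSON list of at least four values) is inferred from the way the answer's code indexes `r[0]` to `r[3]`.

```python
import json
from typing import Any, Dict, Optional

# Column names taken from the answer's get_data(); the order is assumed to match r[0]..r[3].
FIELDS = (
    "Ventures",
    "Realizable Investments",
    "Realized Investments",
    "Amount Invested Since 1989",
)


def parse_counts(city_name: str, body: str) -> Optional[Dict[str, Any]]:
    """Turn a raw response body into a row dict, or None when there is no usable data."""
    try:
        values = json.loads(body)
    except json.JSONDecodeError:
        # Body is empty or not JSON: treat it as "no data for this city".
        return None
    if not isinstance(values, list) or len(values) < len(FIELDS):
        # Valid JSON, but not the list of four numbers we assume.
        return None
    row: Dict[str, Any] = {"City Name": city_name}
    row.update(zip(FIELDS, values))
    return row
```

With this helper, `get_data` could pass `requests.post(...).text` to `parse_counts` and print the "No data for ..." message when it returns `None`. A short list would otherwise raise `IndexError`, which the answer's `except ValueError` does not catch.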
??? ???,19,6117232,14813422,20930654
??? ????,29,6517560,16225629,22743189
??? ?????,28,3945008,13107701,17052709
??? ??-???,76,56738614,200980004,257718618
??????,109,21988456,130339851,152328307
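Tested with Python 3.9.10, Selenium 4.5.0 and requests 2.28.1.
Of course, we could get the required data with Selenium alone, without the requests library. But after testing, this solution seems faster to me: the POST request returns the values immediately, whereas to read them from the tag (div.counter) with Selenium we would have to wait for the counter animation to finish.
You can also use, for example, ThreadPoolExecutor, which makes fetching and saving the data faster. Here is one possible solution: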
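The CSV shown above has no header row, so the meaning of each column has to be looked up in the code. One possible variation, given here as a sketch rather than as part of the answer, writes the header with `csv.DictWriter`. The function name `write_rows` and the constant `FIELDNAMES` are hypothetical; the column names are copied from the dict returned by `get_data`.

```python
import csv
import io
from typing import Any, Dict, Iterable, TextIO

# Column order copied from the dict built in get_data().
FIELDNAMES = [
    "City Name",
    "Ventures",
    "Realizable Investments",
    "Realized Investments",
    "Amount Invested Since 1989",
]


def write_rows(rows: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write a header line followed by one line per row; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Demo with an in-memory buffer; the city name is a placeholder.
    buffer = io.StringIO()
    write_rows(
        [{
            "City Name": "Example City",
            "Ventures": 19,
            "Realizable Investments": 6117232,
            "Realized Investments": 14813422,
            "Amount Invested Since 1989": 20930654,
        }],
        buffer,
    )
    print(buffer.getvalue())
```

To write a real file, open it once with `open('pais.csv', 'w', encoding='utf-8', newline='')` and pass the file object as `out`. Opening in `'w'` mode instead of `'a'` also avoids duplicated rows when the script is run more than once.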
import csv
import requests
from itertools import repeat
from typing import Union, Any
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from concurrent.futures import ThreadPoolExecutor


def get_data(url: str, city_name: str) -> Union[dict[str, Any], str]:
    # POST the city name to the endpoint and map the four returned values to named fields.
    payload = {
        'city': city_name,
        'mainCategory': '??? ????',
        'secondCategory': '??? ?? ????'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }
    try:
        r = requests.post(url, data=payload, headers=headers).json()
        return {
            "City Name": city_name,
            "Ventures": r[0],
            "Realizable Investments": r[1],
            "Realized Investments": r[2],
            "Amount Invested Since 1989": r[3]
        }
    except ValueError:
        # .json() raises ValueError when the response body is not valid JSON.
        return f'No data for {city_name}'


def save_to_csv(data: Union[dict, str]) -> None:
    # Append a row for a dict result; print the "No data" message otherwise.
    if isinstance(data, dict):
        with open(file='pais.csv', mode='a', encoding="utf-8") as f:
            writer = csv.writer(f, lineterminator='\n')
            writer.writerow([*data.values()])
    else:
        print(data)


options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
service = Service(executable_path="path/to/your/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 15)

main_url = 'https://www.pais.co.il/info/Thank-to.aspx'
post_call_url = 'https://www.pais.co.il/grants/grantsRequestNumbers.ashx'

driver.get(main_url)
# The dropdown is inside an iframe, so switch into it before locating the options.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
cities = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#FacilitiesStats_ddlcity>option')))
# Skip the first option of the dropdown.
city_names = [city.text for city in cities[1:]]

with ThreadPoolExecutor() as executor:
    # Fetch all cities concurrently, then hand each result to save_to_csv.
    data = executor.map(get_data, repeat(post_call_url), city_names)
    executor.map(save_to_csv, data)

# quit must be called; a bare `driver.quit` does nothing and leaves the browser running.
driver.quit()
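In the threaded version above, `save_to_csv` also runs in worker threads, so the order in which rows land in the file is not guaranteed. A possible alternative, sketched here under assumptions, keeps only the fetching in the pool and collects results in the main thread. `fetch_all` and `split_results` are hypothetical helper names, and `fetch` stands for any one-argument callable, for example a wrapper around the answer's `get_data`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Tuple, Union

Result = Union[Dict[str, Any], str]


def fetch_all(fetch: Callable[[str], Result], names: Iterable[str],
              max_workers: int = 8) -> List[Result]:
    """Call fetch(name) for every name in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs,
        # regardless of which request finishes first.
        return list(executor.map(fetch, names))


def split_results(results: Iterable[Result]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Separate row dicts from 'No data for ...' messages."""
    rows: List[Dict[str, Any]] = []
    messages: List[str] = []
    for item in results:
        if isinstance(item, dict):
            rows.append(item)
        else:
            messages.append(item)
    return rows, messages


if __name__ == "__main__":
    # Stand-in for get_data so the sketch runs without network access.
    def fake_fetch(name: str) -> Result:
        return {"City Name": name} if name != "missing" else f"No data for {name}"

    rows, messages = split_results(fetch_all(fake_fetch, ["a", "missing", "b"]))
    print(rows)
    print(messages)
```

With the real function, `fetch` could be `lambda name: get_data(post_call_url, name)`. The rows can then be written to the CSV file from the main thread in a single pass, and the messages printed afterwards.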
