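I need to do web scraping on the site https://portalbnmp.cnj.jus.br/#/pesquisa-peca.
- My goal is to select "Rio de Janeiro" in the "Estado" field
- Send the keys "" (an empty string) to the "Nome" field
- Search
- In the table that appears, click each row in turn.
- On the next page, click "Emitir"
- Go back to the previous page and repeat the procedure for the next row of the table, and so on.
When I run the code below line by line it works without errors, but inside the loop I get all kinds of errors: stale element, element not clickable, element not interactable, etc. Any ideas why this happens?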
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

for i in range(1, 11):
    element = driver.find_element(By.TAG_NAME, 'p-dropdown')
    element.find_element(By.XPATH, "//*[contains(text(), 'Estado')]").click()
    element.find_element(By.XPATH, "//*[contains(text(), 'Rio de Janeiro')]").click()
    search = driver.find_element(By.NAME, "nomePessoa")
    search.send_keys("")
    search.send_keys(Keys.RETURN)
    # row click
    table = driver.find_element(By.XPATH, "//div[@class='ui-datatable-tablewrapper ng-star-inserted']/table/tbody")
    rows = table.find_element(By.TAG_NAME, 'tr')
    rows.find_element(By.XPATH, "//tr[" + str(i) + "]/td[1]").click()
    # click 'Emitir'
    buttons = driver.find_element(By.TAG_NAME, "button")
    buttons.find_element(By.XPATH, "//*[contains(text(), 'Emitir')]").click()
    # return to the previous page
    driver.back()
Answer:
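If you copy the cookie from your browser and paste it into the code below, you can avoid Selenium altogether and speed this process up considerably. The code searches for Rio de Janeiro (idEstado = 19) and returns 100 results (you can change this), then loops through the results and saves the required PDF files.
Note that the site you are scraping is unstable and often returns 500 responses, so I retry each request after waiting a few seconds: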
import requests
import json
import re
import time

# NB: get the cookie header from Developer Tools > Network > Fetch/XHR > Request Headers once you've passed the captcha test
cookie_value = 'portalbnmp=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJndWVzdF9wb3J0YWxibm1wIiwiYXV0aCI6IlJPTEVfQU5PTllNT1VTIiwiZXhwIjoxNjQzMzY1MjgzfQ.niaw12WlnO3okuY33medP7d3u6j1Y-xGPJ6mShgClfZPrs8br7HQm8XZ5k2k5Wz8J59epbUyE5KAGtSFPpEmrA'
headers = {
    'accept': 'application/json, text/plain, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-ZA,en;q=0.9',
    'origin': 'https://portalbnmp.cnj.jus.br',
    'referer': 'https://portalbnmp.cnj.jus.br/',
    'content-type': 'application/json;charset=UTF-8',
    'cookie': cookie_value,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
results = 100
url = f'https://portalbnmp.cnj.jus.br/bnmpportal/api/pesquisa-pecas/filter?page=0&size={results}&sort='  # edited to get 100 results; you can change this size variable
payload = {"buscaOrgaoRecursivo": False, "orgaoExpeditor": {}, "idEstado": 19}  # 19 = Rio de Janeiro

retries = 1
success = False
while not success:
    try:
        resp = requests.post(url, headers=headers, data=json.dumps(payload))
        print(resp)
        # requests does not raise on a 500 by itself; raise here so it hits the wait-and-retry branch
        resp.raise_for_status()
        data = resp.json()
        success = True
    except Exception as e:
        print(url)
        wait = retries
        print(f'Error ({e})! Waiting {wait} secs and re-trying...')
        time.sleep(wait)
        retries += 1

print(len(data['content']))
ids = {str(x['id']): f"{x['nomeMae']}-{x['nomeOrgao']}" for x in data['content']}  # get all filenames and IDs

for id_, name in ids.items():
    url = f'https://portalbnmp.cnj.jus.br/bnmpportal/api/certidaos/relatorio/{id_}/10'
    retries = 1
    success = False
    while not success:
        try:
            pdf_data = requests.post(url, headers=headers)
            pdf_data.raise_for_status()  # retry on 500s as well as on connection errors
            success = True
        except Exception as e:
            wait = retries
            print(f'Error ({e})! Waiting {wait} secs and re-trying...')
            time.sleep(wait)
            retries += 1
    filename = re.sub(r'[^\w\-_ ]', '_', name) + '.pdf'  # replace characters that are not safe in a filename
    print(f'Saving {name}')
    with open(filename, 'wb') as file:
        file.write(pdf_data.content)
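Both retry loops above keep retrying forever. A minimal sketch of factoring them into one bounded helper, assuming you would rather give up after a few attempts: `post_with_retries` is a hypothetical name, it treats any non-200 status as a failure (so 500 responses are retried too) and waits a little longer after each failure (linear backoff).

```python
import time
from typing import Any, Callable, Optional


def post_with_retries(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Hypothetical helper: call send() until it returns status_code 200.

    After the n-th failed attempt it waits base_wait * n seconds (linear
    backoff); after max_attempts failures it raises RuntimeError.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            resp = send()
            if resp.status_code == 200:
                return resp
            # A 500 is not an exception in requests, so record it as a failure
            last_error = RuntimeError(f"HTTP {resp.status_code}")
        except Exception as exc:  # e.g. a dropped connection
            last_error = exc
        if attempt < max_attempts:
            wait = base_wait * attempt
            print(f"Attempt {attempt} failed ({last_error}); waiting {wait} s")
            sleep(wait)
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

Hypothetical usage in the first loop: `resp = post_with_retries(lambda: requests.post(url, headers=headers, data=json.dumps(payload)))`. The `sleep` parameter exists only so the helper can be exercised without actually waiting.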
Answer:
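When using Selenium, try adding checks to make sure the element you are interacting with has loaded. In some cases you can add an explicit wait. (Try not to use methods like sleep(); the documentation strongly discourages them.)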
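An explicit wait beats a fixed sleep() because it is essentially a polling loop: it re-checks a condition every so often and returns as soon as the condition holds, failing only after a timeout. The stdlib-only sketch below illustrates that idea; `wait_until` and its parameters are hypothetical names, not Selenium's API. In real code use `WebDriverWait`, whose constructor takes the driver and a timeout plus optional `poll_frequency` and `ignored_exceptions`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical illustration of an explicit wait (not Selenium's code).

    Calls condition() every `poll` seconds and returns its first truthy
    result; raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # e.g. the element is not in the DOM yet, or went stale mid-check
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        sleep(poll)
```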
# import webdriver
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# get element after explicitly waiting up to 10 seconds
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.TAG_NAME, "p-dropdown"))
)  # I would consider looking it up by ID or class instead
element.find_element(By.XPATH, "//*[contains(text(), 'Estado')]").click()
# ... etc.
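This ensures you never click an element before it has loaded. Another thing to remember with Selenium is that an element must be visible before you can interact with it. You can make sure it is visible by scrolling to it, for example: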
# example that scrolls to bottom of page
driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
# example that scrolls to a specific element
from selenium.webdriver.common.action_chains import ActionChains
actions = ActionChains(driver)
element = driver.find_element(By.TAG_NAME, 'p-dropdown')  # just an example
actions.move_to_element(element).perform()  # an ActionChains does nothing until perform() is called
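In the question's loop every element is already looked up again on each iteration, but the page is an Angular single-page app (note the `ng-star-inserted` class): after `driver.back()` or a new search, the old nodes are likely still present for a moment before being replaced, so an element found a fraction of a second too early can go stale before `click()` runs. Besides waiting, a common remedy is to retry the whole "find, then act" step when a stale error occurs. A minimal sketch, where `retry_stale` is a hypothetical helper and the exception types are passed in so the logic does not depend on Selenium:

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_stale(
    action: Callable[[], T],
    stale_errors: Tuple[Type[BaseException], ...],
    attempts: int = 3,
) -> T:
    """Hypothetical helper: run action(), starting over if it raises a stale error.

    action must look its element up again on every call; a WebElement captured
    before the page re-rendered would just fail the same way again.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except stale_errors:
            if attempt == attempts:
                raise  # out of attempts: surface the original exception
    raise AssertionError("unreachable")
```

Hypothetical usage inside the question's loop, with `StaleElementReferenceException` imported from `selenium.common.exceptions`: `retry_stale(lambda: driver.find_element(By.XPATH, f"//tr[{i}]/td[1]").click(), (StaleElementReferenceException,))`. The lambda is the important part: each retry performs a fresh `find_element` instead of reusing a reference from the previous attempt.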
