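I am trying to scrape this site, but when I send a POST request to get the search results, I get back an error page: "The requested URL was rejected. Please consult with your administrator."
Site: https://prod.ceidg.gov.pl/CEIDG/CEIDG.Public.UI/Search.aspx
I have already collected the viewstate, viewstategenerator, and so on to submit with the form, but it does not work. What am I missing?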
#import requests
from bs4 import BeautifulSoup
import lxml
import urllib
from requests_html import HTMLSession
from requests_html import AsyncHTMLSession
import time
#s = HTMLSession(browser_args=["--no-sandbox", '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'])
s= HTMLSession()
header_simple = {
    'User_Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'HTTP_ACCEPT': 'text/html,application/xhtml xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Content-Type': 'application/x-www-form-urlencoded',
}
r = s.request('get', 'http://prod.ceidg.gov.pl/CEIDG/CEIDG.Public.UI/Search.aspx')
soup_dummy = BeautifulSoup(r.content, "lxml")
# parse and retrieve two vital form values
viewstate = soup_dummy.select("#__VIEWSTATE")[0]['value']
viewstategen = soup_dummy.select("#__VIEWSTATEGENERATOR")[0]['value']
eventvalidation = soup_dummy.select("#__EVENTVALIDATION")[0]['value']
english = soup_dummy.select("#hfEnglishWebsiteUrl")[0]['value']
data = {
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategen,
    '__EVENTVALIDATION': eventvalidation,
    'ctl00$MainContent$txtName': 'bank',
    'ctl00$MainContent$cbIncludeCeased': 'on',
    'ctl00$MainContent$btnSearch': 'Find',
    'ctl00$hfAuthRequired': 'False',
    'ctl00$hfEnglishWebsiteUrl': english,
    'ctl00$stWarningLength': '30',
    'ctl00$stIdleAfter': '1200',
    'ctl00$stPollingInterval': '60',
    'ctl00$stMultiTabTimeoutSyncInterval': '20'
}
time.sleep(3)
p = s.request('post', 'https://prod.ceidg.gov.pl/CEIDG/CEIDG.Public.UI/Search.aspx', params=data, headers=header_simple)
print(p.content)
Answer:
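Here is one way to get the results from that page using the requests module. When sending the POST request for the desired content, make sure the data argument includes every key and value the form expects.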
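The question passes the form with params=, whereas this answer passes it with data=. In requests, params= appends the fields to the URL as a query string, while data= sends them in the request body. The sketch below illustrates that difference offline with the standard library only; the 5000-character `__VIEWSTATE` is a made-up placeholder, and the idea that an overly long URL is what triggers the "URL was rejected" page is an assumption, not something confirmed for this server.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://prod.ceidg.gov.pl/CEIDG/CEIDG.Public.UI/Search.aspx"

# Placeholder values (assumption): a real __VIEWSTATE is usually very long.
form = {
    "__VIEWSTATE": "x" * 5000,
    "ctl00$MainContent$txtName": "bank",
}
encoded = urlencode(form)

# Roughly what params= does: the whole form becomes part of the URL.
as_query = Request(URL + "?" + encoded, method="POST")

# Roughly what data= does: the URL stays short, the form goes in the body.
as_body = Request(URL, data=encoded.encode("ascii"), method="POST")

# Nothing is sent here; the Request objects are only inspected.
print("URL length with query string:", len(as_query.full_url))
print("URL length with form body:   ", len(as_body.full_url))
print("Body length:                 ", len(as_body.data))
```

Whatever the server's exact rule is, a form post to an ASP.NET page is normally expected in the body, which is what data= produces.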
Working script:
import lxml
import requests
from pprint import pprint
from bs4 import BeautifulSoup

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    r = s.get('http://prod.ceidg.gov.pl/CEIDG/CEIDG.Public.UI/Search.aspx')
    soup = BeautifulSoup(r.text,"lxml")
    data = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    data['ctl00$MainContent$txtName'] = 'bank'
    data['ctl00$MainContent$cbIncludeCeased'] = 'on'
    data['ctl00$MainContent$btnSearch'] = 'Find'
    data.pop('ctl00$MainContent$btnClear')
    data.pop('ctl00$versionDetails$btnClose')
    # pprint(data) #print it to see the keys and values that have been included within data
    p = s.post('https://prod.ceidg.gov.pl/CEIDG/CEIDG.Public.UI/Search.aspx', data=data)
    soup = BeautifulSoup(p.text,"lxml")
    print(soup.select_one("table#MainContent_DataListEntities"))
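The step that collects every named input and then drops the unwanted buttons can be tried without touching the network. This is a minimal sketch using only the standard library in place of BeautifulSoup; `FormFieldCollector` and `SAMPLE_HTML` are hypothetical names, and the markup is a made-up fragment, not the real page.

```python
from html.parser import HTMLParser

# Made-up fragment for illustration; the real page has many more fields.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="text" name="ctl00$MainContent$txtName" />
  <input type="submit" name="ctl00$MainContent$btnSearch" value="Search" />
  <input type="submit" name="ctl00$MainContent$btnClear" value="Clear" />
  <input type="button" id="noName" value="ignored" />
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # A missing or valueless "value" attribute becomes an empty string.
            self.fields[name] = attributes.get("value") or ""


collector = FormFieldCollector()
collector.feed(SAMPLE_HTML)
data = collector.fields

# Fill in the search term and keep only the button that is "clicked".
data["ctl00$MainContent$txtName"] = "bank"
data.pop("ctl00$MainContent$btnClear", None)

print(sorted(data))
```

A browser submits only the button that was actually pressed, which is why the other submit buttons are removed from the dictionary before posting. Passing a default to `pop` avoids a `KeyError` if the page ever stops rendering one of those buttons.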
