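Here is the situation: I am trying to scrape a chemical information lookup site by simulating a browser and submitting a chemical's CAS number (think of it as an ID). However, even though the CAS number I submit is correct, the site responds with "the number you entered does not match any data" (as if I had mistyped it).
So I tried opening the page directly with urllib.request (this is the page the browser lands on after the CAS number is entered normally):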
from bs4 import BeautifulSoup
import urllib.request

# URL the browser ends up on after the CAS number is entered normally
url = "http://gestis-en.itrust.de/nxt/gateway.dll?qeingabe=&f=xhitlist&xhitlist_x=Advanced&xhitlist_s=field%3Asortiername&xhitlist_q=%5BField+schnellsuche%3A*7732z018z05*%5D&xhitlist_d=&xhitlist_hc=&xhitlist_mh=2000&xhitlist_vps=500&xhitlist_xsl=xhitlist.xsl&xhitlist_vpc=first&xhitlist_sel=title%3Bpath%3Brelevance-weight%3Bcontent-type%3Bhome-title%3Bitem-bookmark%3Bfield%3Astoffname%3Bfield%3Asortiername%3Bfield%3Azvgnr%3Bfield%3Acasnr%3Bfield%3Aegnr%3Bfield%3Aindexnr%3Bfield%3Aunnr%3B&searchform_list=%23NoSelection"
# Request headers copied from the browser
headers = {"Host": "gestis-en.itrust.de",
           "Referer": "http://gestis-en.itrust.de/nxt/gateway.dll?f=userinfo&userinfo_xsl=banner.xsl&userinfo_cat=saved-search&isclient=",
           "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"}
req = urllib.request.Request(url, headers=headers)
# The URL is plain http, so no SSL context is needed for this request
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, "html.parser")
print(soup)
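The hard-coded URL above is difficult to read and to reuse for other CAS numbers. As a rough sketch, the same query string can be assembled from named parameters with `urllib.parse.urlencode`. The names `build_search_url` and `encode_cas` are hypothetical, and the rule "replace `-` with `z0`" is only an assumption inferred from the single example in the URL (7732-18-5 appearing as `7732z018z05`); the site may encode other inputs differently.

```python
from urllib.parse import urlencode

BASE_URL = "http://gestis-en.itrust.de/nxt/gateway.dll"

# Value of xhitlist_sel, copied unchanged from the URL in the question.
SELECT_FIELDS = (
    "title;path;relevance-weight;content-type;home-title;item-bookmark;"
    "field:stoffname;field:sortiername;field:zvgnr;field:casnr;"
    "field:egnr;field:indexnr;field:unnr;"
)


def encode_cas(cas_number):
    # ASSUMPTION: guessed from one example only (7732-18-5 -> 7732z018z05).
    return cas_number.replace("-", "z0")


def build_search_url(cas_number):
    # Parameters in the same order as in the original URL.
    params = [
        ("qeingabe", ""),
        ("f", "xhitlist"),
        ("xhitlist_x", "Advanced"),
        ("xhitlist_s", "field:sortiername"),
        ("xhitlist_q", "[Field schnellsuche:*%s*]" % encode_cas(cas_number)),
        ("xhitlist_d", ""),
        ("xhitlist_hc", ""),
        ("xhitlist_mh", "2000"),
        ("xhitlist_vps", "500"),
        ("xhitlist_xsl", "xhitlist.xsl"),
        ("xhitlist_vpc", "first"),
        ("xhitlist_sel", SELECT_FIELDS),
        ("searchform_list", "#NoSelection"),
    ]
    # safe="*" keeps the wildcard asterisks literal, as in the original URL.
    return BASE_URL + "?" + urlencode(params, safe="*")
```

Calling `build_search_url("7732-18-5")` should yield a URL with the same query parameters as the one hard-coded above, which makes it easier to compare against what the browser actually sends.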
This produces the following result:
<html xmlns:js="urn:javascript-functions">
<head>
<meta content="text/html" http-equiv="Content-Type"/>
<title>Search Results</title>
<link href="/nxt/gateway.dll?f=stylesheets$fn=main.css$3.0" rel="stylesheet" type="text/css"/><script src="/nxt/gateway.dll?f=templates$fn=tri-state-check.js$3.0" type="text/javascript"></script><script src="/nxt/gateway.dll?f=templates$fn=escape.js$3.0" type="text/javascript"></script><script src="/nxt/gateway.dll?f=templates$fn=domain.js$3.0" type="text/javascript"></script><script src="/nxt/gateway.dll?f=templates$fn=common.js$3.0" type="text/javascript"></script><script src="/nxt/gateway.dll?f=templates$fn=xhitlist.js$3.0" type="text/javascript"></script><script type="text/javascript">
var xh;
function initPage()
{
var query = nxt.misc.browser.getInputValue('js_params', 'query');
var translatedQuery = nxt.misc.browser.getInputValue('js_params', 'translatedQuery');
var select = nxt.misc.browser.getInputValue('js_params', 'select');
var hitCount = parseInt(nxt.misc.browser.getInputValue('js_params', 'hitCount'));
xh = new nxt.comp.xhitlist.XHitList();
xh.initPage(query, translatedQuery, select, hitCount, true, "112702960");
}
function closeMessage(link)
{
xh.closeMessage(link);
}
</script></head>
<body onload="initPage()">
<form name="js_params"><input name="query" type="hidden" value="[Field schnellsuche:*7732z018z05*]"/><input name="translatedQuery" type="hidden" value="[field,schnellsuche:*7732z018z05*] "/><input name="select" type="hidden" value="title;path;relevance-weight;content-type;home-title;item-bookmark;field:stoffname;field:sortiername;field:zvgnr;field:casnr;field:egnr;field:indexnr;field:unnr;"/><input name="hitCount" type="hidden" value="0"/></form>
<h1>No Documents Found</h1>
<p>Your search did not match any available documents.</p>
</body>
</html>
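The response above reports the number of hits in a hidden `hitCount` input, so a script can detect the "No Documents Found" case without reading the whole page. The following is a minimal sketch using only the standard library's `html.parser` (BeautifulSoup would work just as well); `HitCountParser` and `parse_hit_count` are hypothetical names, and the sample string is a shortened stand-in for the real response.

```python
from html.parser import HTMLParser


class HitCountParser(HTMLParser):
    """Remembers the value of <input name="hitCount"> if one is seen."""

    def __init__(self):
        super().__init__()
        self.hit_count = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input .../> also arrive here by default.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "hitCount":
            self.hit_count = attributes.get("value")


def parse_hit_count(html_text):
    """Return the hit count as an int, or None if it is missing or not numeric."""
    parser = HitCountParser()
    parser.feed(html_text)
    value = parser.hit_count
    if value is not None and value.isdigit():
        return int(value)
    return None


# Shortened stand-in for the page returned by the server.
sample = '<form name="js_params"><input name="hitCount" type="hidden" value="0"/></form>'
print(parse_hit_count(sample))
```

`parse_hit_count` expects text, so the bytes returned by `urlopen(...).read()` would need to be decoded first (the correct encoding for this site is not stated in the post).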
However, this page opens normally in a browser. Feel free to try it yourselves.
What is going on here? Is there a way to get the correct result?
Many thanks!
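One possibility worth ruling out, and it is only a guess since nothing in the post confirms it, is that the site ties search results to a session cookie that the browser picks up on an earlier page, while a bare `urlopen` call sends none. The sketch below keeps cookies across requests with `http.cookiejar`; `make_session` and `fetch_with_session` are hypothetical helper names, and which entry page (if any) sets the relevant cookie is an assumption that has to be checked in the browser's developer tools.

```python
import http.cookiejar
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
)


def make_session(user_agent=USER_AGENT):
    """Build an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_session(opener, entry_url, search_url, timeout=30):
    """Visit entry_url first so any cookies get set, then request search_url."""
    # ASSUMPTION: entry_url is a page that hands out the session cookie.
    with opener.open(entry_url, timeout=timeout) as response:
        response.read()
    with opener.open(search_url, timeout=timeout) as response:
        return response.read()
```

A call would look like `opener, jar = make_session()` followed by `fetch_with_session(opener, entry_url, url)`, where `entry_url` is whatever page the browser loads before the search. If `jar` is still empty afterwards, cookies are probably not the explanation, and the next step is to compare the request the browser sends with the one the script sends.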
