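I've just started learning web scraping. I found an example online that scrapes the Maoyan Top 100 and followed along, but the result is '[]'. I looked at the page that came back: it isn't the Top 100 source, and it mentions verification. How can I solve this? The source code is as follows: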
import re

import requests
from requests.exceptions import RequestException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}


def get_one_page(url):
    # Return the page HTML, or None on a non-200 status or a request error.
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    # Capture rank, poster URL, title, cast, release time and the two score parts.
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
        r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
        r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
    items = re.findall(pattern, html)
    print(items)


def main():
    url = "https://maoyan.com/board/4?"
    html = get_one_page(url)
    if html is None:
        # get_one_page returns None on failure; re.findall would raise on None.
        print("Request failed, nothing to parse")
        return
    parse_one_page(html)


if __name__ == '__main__':
    main()
Reply from a helpful uj5u.com user:
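Maoyan has anti-scraping measures in place that require Meituan verification, and the data is only shown once the verification has been passed. (The verification prompt sometimes pops up on the page and sometimes doesn't.) I've run into the same situation as you, and it's a headache.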
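To tell the two cases apart before parsing, a small check on the returned HTML may help. This is only a sketch under assumptions: the function name `classify_page` and the `VERIFY_MARKERS` strings are hypothetical (the thread only says the page "mentions verification"), while `board-index` is taken from the regex in the question.

```python
from typing import Optional

# Assumed markers: strings guessed to appear on the verification page.
# Adjust them after inspecting the HTML that is actually returned.
VERIFY_MARKERS = ("验证", "verify")
# Taken from the question's regex; expected on the real Top 100 board page.
BOARD_MARKER = "board-index"


def classify_page(html: Optional[str]) -> str:
    """Return 'board', 'verification', or 'unknown' for a fetched page."""
    if not html:
        return "unknown"
    if BOARD_MARKER in html:
        return "board"
    lowered = html.lower()
    if any(marker in lowered for marker in VERIFY_MARKERS):
        return "verification"
    return "unknown"
```

Calling `classify_page(get_one_page(url))` before `parse_one_page` would make it clear whether an empty list comes from the regex or from the verification page. If it reports 'verification', the regex is not the problem. One thing that might be worth trying is sending the Cookie header from a browser session that has already passed the check, though this thread does not confirm that it works.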
