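I'm trying to scrape Goodreads, more specifically Goodreads editions, by providing some ISBNs as input. However, every time I run the code I get an error, and not even at the same step each time: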
Traceback (most recent call last):
  File "C:xxx.py", line 47, in <module>
    ed_details = get_editions_details(isbn)
  File "C:xxx.py", line 30, in get_editions_details
    ed_item = soup.find("div", class_="otherEditionsLink").find("a")
AttributeError: 'NoneType' object has no attribute 'find'
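Everything should be correct: the div class is right, and all the books appear to exist. I checked in every browser and the page looks the same to me. I don't know whether this is caused by a now-deprecated library or by something else.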
import requests
from bs4 import BeautifulSoup as bs


def get_isbn():
    isbns = ['9780544176560', '9781796898279', '9788845278518', '9780374165277', '9781408839973', '9788838919916', '9780349121994', '9781933372006', '9781501167638', '9781427299062', '9788842050285', '9788807018985', '9780340491263', '9789463008594', '9780739349083', '9780156011594', '9780374106140', '9788845251436', '9781609455910']
    return isbns


def get_page(base_url, data):
    try:
        r = requests.get(base_url, params=data)
    except Exception as e:
        r = None
        print(f"Server responded: {e}")
    return r


def get_editions_details(isbn):
    # Create the search URL with the ISBN of the book
    data = {'q': isbn}
    book_url = get_page("https://www.goodreads.com/search", data)
    # Parse the markup with Beautiful Soup
    soup = bs(book_url.text, 'lxml')
    # Retrieve from the book's page the link for other editions
    # and the total number of editions
    ed_item = soup.find("div", class_="otherEditionsLink").find("a")
    ed_link = f"https://www.goodreads.com{ed_item['href']}"
    ed_num = ed_item.text.strip().split(' ')[-1].strip('()')
    # Return a tuple with all the information
    return ((ed_link, int(ed_num), isbn))


if __name__ == "__main__":
    # Get the ISBNs from the user
    isbns = get_isbn()
    # Check all the ISBNs
    for isbn in isbns:
        ed_details = get_editions_details(isbn)
A uj5u.com user replied:
You should always check return values.
book_url = get_page("https://www.goodreads.com/search", data)
soup = bs(book_url.text, 'lxml')
ed_item = soup.find("div", class_="otherEditionsLink").find("a")
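In these statements, if any return value is None, you get an error as soon as you try to call a method on it. For example, if soup is None, you are effectively doing None.find(....), which is clearly wrong. In the traceback above, it is soup.find("div", class_="otherEditionsLink") that returned None, so the chained .find("a") raises the AttributeError.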
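The same check applies to the first of these statements, because get_page in the question returns None when the request raises. A minimal sketch of that check follows. The helper name html_or_none is hypothetical (it is not part of the original code), and treating any status other than 200 as a failure is an assumption made for illustration:

```python
from typing import Optional


def html_or_none(response) -> Optional[str]:
    """Return the response body, or None if there is no usable page."""
    if response is None:
        # get_page() in the question returns None when requests.get raises.
        return None
    if response.status_code != 200:
        # Assumption: anything other than 200 is treated as a failed fetch.
        return None
    return response.text
```

With a helper like this, get_editions_details could stop early when html_or_none(book_url) is None instead of reading book_url.text unconditionally.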
For example, the last of the three statements above can be fixed by splitting it into two parts:
if ed_item := soup.find("div", class_="otherEditionsLink"):
    if ed_item := ed_item.find("a"):
        ....other code here....
As long as soup is valid, this code never tries to call a method on a None value.
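Put together, the parsing step might look like the sketch below. This is one possible arrangement, not the asker's final code. The function name extract_editions is hypothetical, it takes an already parsed soup so that the None checks stand out, and it assumes (as the question's own parsing does) that the link text ends with a count in parentheses:

```python
from typing import Optional, Tuple


def extract_editions(soup, isbn: str) -> Optional[Tuple[str, int, str]]:
    """Return (editions_url, editions_count, isbn), or None if anything is missing.

    `soup` is expected to be a parsed BeautifulSoup document.
    """
    container = soup.find("div", class_="otherEditionsLink")
    if container is None:
        # The div is not present in the HTML that was actually returned.
        return None
    link = container.find("a")
    if link is None:
        return None
    # Assumption: the link text ends with a count in parentheses, e.g. "... (12)".
    ed_num = link.text.strip().split(" ")[-1].strip("()")
    if not ed_num.isdigit():
        return None
    return (f"https://www.goodreads.com{link['href']}", int(ed_num), isbn)
```

The caller can then skip or log any ISBN for which the function returns None, instead of letting the whole loop stop on the first page that lacks the div.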
