Strange behavior when scraping Goodreads (Python)

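I am trying to scrape Goodreads, more specifically Goodreads editions, by supplying a list of ISBNs as input. However, every run fails with the error below, and not even at the same step each time: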

Traceback (most recent call last):
  File "C:xxx.py", line 47, in <module>
    ed_details = get_editions_details(isbn)
  File "C:xxx.py", line 30, in get_editions_details
    ed_item = soup.find("div", class_="otherEditionsLink").find("a")
AttributeError: 'NoneType' object has no attribute 'find'

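Everything should be correct: the div class is right, and all the books appear to exist. I checked in every browser, and the page looks the same to me in each of them. I don't know whether the cause is a library that is now deprecated or something else.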

import requests
from bs4 import BeautifulSoup as bs


def get_isbn():
    isbns = ['9780544176560', '9781796898279', '9788845278518', '9780374165277', '9781408839973', '9788838919916', '9780349121994', '9781933372006', '9781501167638', '9781427299062', '9788842050285', '9788807018985', '9780340491263', '9789463008594', '9780739349083', '9780156011594', '9780374106140', '9788845251436', '9781609455910']
    return isbns


def get_page(base_url, data):
    try:
        r = requests.get(base_url, params=data)
    except Exception as e:
        r = None
        print(f"Server responded: {e}")
    return r


def get_editions_details(isbn):
    # Create the search URL with the ISBN of the book
    data = {'q': isbn}
    book_url = get_page("https://www.goodreads.com/search", data)
    # Parse the markup with Beautiful Soup
    soup = bs(book_url.text, 'lxml')

    # Retrieve from the book's page the link for other editions
    # and the total number of editions

    ed_item = soup.find("div", class_="otherEditionsLink").find("a")

    ed_link = f"https://www.goodreads.com{ed_item['href']}"
    ed_num = ed_item.text.strip().split(' ')[-1].strip('()')

    # Return a tuple with all the information
    return (ed_link, int(ed_num), isbn)


if __name__ == "__main__":
    # Get the ISBNs from the user
    isbns = get_isbn()

    # Check all the ISBNs
    for isbn in isbns:
        ed_details = get_editions_details(isbn)

Answer from a uj5u.com user:

You should always check return values.

book_url = get_page("https://www.goodreads.com/search", data)
soup = bs(book_url.text, 'lxml')
ed_item = soup.find("div", class_="otherEditionsLink").find("a")

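In these statements, if any return value is None, calling a method on it raises an error. For example, when soup.find("div", class_="otherEditionsLink") finds no matching element it returns None, so the chained call becomes None.find("a"), which is exactly the AttributeError shown in the traceback. The same applies to book_url: get_page returns None when the request fails, and book_url.text would then fail in the same way.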
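The failure can be reproduced without touching the network. The following is a minimal sketch; the HTML string is a made-up stand-in for a page that lacks the expected div, and "html.parser" is used instead of 'lxml' only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a page that does NOT contain the expected div.
html = "<html><body><p>No results.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches.
outer = soup.find("div", class_="otherEditionsLink")
print(outer)  # None

try:
    # Same chained call as in the question: a method call on None.
    outer.find("a")
except AttributeError as exc:
    message = str(exc)
    print(message)  # 'NoneType' object has no attribute 'find'
```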

For example, you can fix the last line by splitting it into two steps:

if ed_item := soup.find("div", class_="otherEditionsLink"):
    if ed_item := ed_item.find("a"):
        ....other code here....

As long as soup is valid, this code never tries to call a method on a None value.
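Putting the checks together, the parsing step might look like the sketch below. The function name parse_editions_details and the sample markup are hypothetical, and the sketch assumes, as the question's code does, that the link text ends with the edition count in parentheses. It takes the HTML as a string so it can be tried without a request.

```python
import re

from bs4 import BeautifulSoup


def parse_editions_details(html, isbn):
    """Return (link, count, isbn), or None if the editions link is missing.

    Hypothetical helper: it performs the same lookups as the question's
    get_editions_details, but checks every return value first.
    """
    soup = BeautifulSoup(html, "html.parser")

    div = soup.find("div", class_="otherEditionsLink")
    if div is None:
        return None

    link = div.find("a")
    if link is None or not link.get("href"):
        return None

    # Assumption: the link text ends with the count, e.g. "All Editions (12)".
    match = re.search(r"\((\d+)\)\s*$", link.get_text())
    if match is None:
        return None

    return (f"https://www.goodreads.com{link['href']}", int(match.group(1)), isbn)


# Made-up markup for illustration; the real page may differ.
sample = (
    '<div class="otherEditionsLink">'
    '<a href="/work/editions/1">All Editions (12)</a></div>'
)
print(parse_editions_details(sample, "9780544176560"))
print(parse_editions_details("<p>nothing here</p>", "9780544176560"))  # None
```

The caller can then skip any ISBN for which the result is None, and should check that get_page did not return None before reading its .text attribute.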

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qianduan/421238.html
