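Hello, this is my first project in Python. My goal is to scrape the full descriptions of books on Goodreads. The end goal of the script is to enter the book IDs you want and get back a file with the book_id in one column and that book's description in the other. Right now I can enter the index of the book I want in the list and get its description:
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]. How can I loop this process and get the description of every book? Here is my code; thanks in advance.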
import bs4 as bs
import urllib.request
import csv
import requests
import re
from urllib.request import urlopen
from urllib.error import HTTPError

book_id = ['17227298', '18386', '1852', '17245', '60533063']  # Here I enter my book ids
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]  # I concatenate book_id with the URL
source = urlopen(my_urls).read()
soup = bs.BeautifulSoup(source, 'lxml')
short_description = soup.find('div', class_='readable stacked').span  # finds the first span inside the description div
full_description = short_description.find_next_siblings('span')  # goes to the sibling span that has the full description

def get_description(soup):
    # Use the soup passed in rather than the global short_description
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_siblings('span')
    return full_description
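To see what find and find_next_siblings actually return, here is a minimal offline sketch run against a hand-written HTML snippet. The snippet only imitates the "readable stacked" markup the code above assumes; the live Goodreads page may be structured differently. find_next_siblings returns a list of Tag objects, so printing it prints raw HTML, while get_text() on each tag gives plain text. The sketch uses the built-in "html.parser" instead of "lxml" only so it needs no extra parser install.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the description markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="readable stacked">
  <span>A short teaser...</span>
  <span>The full description of the book.</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# First <span> inside the div: the truncated description.
short_description = soup.find("div", class_="readable stacked").span

# find_next_siblings returns a list-like ResultSet of Tag objects, not strings.
siblings = short_description.find_next_siblings("span")

# get_text(strip=True) turns each Tag into plain text without surrounding whitespace.
full_text = [tag.get_text(strip=True) for tag in siblings]
print(full_text)
```

Joining full_text (for example with " ".join(full_text)) gives a single string that is easier to store than a list of tags.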
Answer from a uj5u.com user:
Define a function that does the work for a single book ID:
# Uses the bs and urlopen imports from the question
def get_description(book_id):
    my_urls = 'https://www.goodreads.com/book/show/' + book_id
    source = urlopen(my_urls).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_siblings('span')
    return full_description
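Here urlopen is called with only a URL. A slightly more defensive variant, sketched below, builds the URL with an f-string, sends an explicit User-Agent header, and sets a timeout. The helper names make_request and fetch_html, the header value, and the 10-second timeout are illustrative assumptions; the original post does not establish whether Goodreads requires a browser-like User-Agent, though some sites do reject urllib's default one.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://www.goodreads.com/book/show/"

def make_request(book_id: str) -> Request:
    """Build a request for one book page (hypothetical helper)."""
    # f-string formatting; BASE_URL + book_id would work equally well.
    url = f"{BASE_URL}{book_id}"
    # Illustrative User-Agent value, not one Goodreads is known to require.
    return Request(url, headers={"User-Agent": "Mozilla/5.0 (book-description-script)"})

def fetch_html(book_id: str, timeout: float = 10.0) -> bytes:
    """Download the raw HTML; raises HTTPError/URLError on failure."""
    # The with-block closes the connection even if read() raises.
    with urlopen(make_request(book_id), timeout=timeout) as response:
        return response.read()
```

get_description could then call fetch_html(book_id) in place of urlopen(my_urls).read().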
Then call it on each item in the list:
book_ids = ['17227298', '18386', '1852', '17245', '60533063']
for book_id in book_ids:
    print(get_description(book_id))
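To reach the goal stated in the question, a file with book_id in one column and its description in the other, the loop can collect results instead of printing them and then write them out with the standard csv module. This is a sketch under several assumptions: collect_descriptions, write_csv, the one-second pause, and the file name descriptions.csv are all hypothetical choices. The function that fetches one description is passed in as a parameter (for example, get_description adapted to return text rather than a list of tags), and a failing book is recorded as an empty description so it does not stop the loop.

```python
import csv
import time
from typing import Callable, Dict, Iterable
from urllib.error import URLError

def collect_descriptions(
    book_ids: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = 1.0,
) -> Dict[str, str]:
    """Call fetch(book_id) for each ID; record an empty string on failure."""
    results: Dict[str, str] = {}
    for book_id in book_ids:
        try:
            results[book_id] = fetch(book_id)
        except (URLError, AttributeError) as exc:
            # URLError also covers HTTPError; AttributeError covers
            # soup.find(...) returning None when the description div is missing.
            print(f"Skipping {book_id}: {exc}")
            results[book_id] = ""
        time.sleep(delay)  # pause between requests to go easy on the server
    return results

def write_csv(results: Dict[str, str], path: str) -> None:
    """Write a header row, then one row per book: book_id, description."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["book_id", "description"])
        for book_id, description in results.items():
            writer.writerow([book_id, description])

# Example (performs network requests):
# write_csv(collect_descriptions(book_ids, get_description), "descriptions.csv")
```

Passing fetch in as a parameter keeps the loop and the file writing testable without touching the network.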
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/457777.html
