My function returns only the first element of the list when called. I am using BeautifulSoup to extract data

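Python beginner here. I am using BeautifulSoup to scrape the details (title, number in stock) of all the books on the first page of books.toscrape.com. To do that, I first need to collect the links to all the individual books, so I wrote the function page1_url for that purpose. The problem is that when it returns the list of extracted links, only the first element is returned. Please help identify the mistake or provide alternative code that uses only BeautifulSoup. Thanks in advance!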

import requests
from bs4 import BeautifulSoup


def page1_url(page1):
    response= requests.get(page1)
    data= BeautifulSoup(response.text,'html.parser')
   
    
    b1= data.find_all('h3')
    
    for i in b1:
        l=i.find_all('a')
        for j in l:
            l1=j['href']
            books_urls=[]
            books_urls.append(base_url + l1)
            books_urls=list(books_urls)
            return books_urls
            
    
                     

allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']

base_url= 'http://books.toscrape.com/catalogue/'
bookURLs= page1_url(allPages[0])
print(bookURLs)

Answer from a uj5u.com user:

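You are re-creating the list for every link, and the function returns after the first iteration of the inner loop (for j in l), so books_urls only ever holds one element: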

import requests
from bs4 import BeautifulSoup


def page1_url(page1):
    response= requests.get(page1)
    data= BeautifulSoup(response.text,'html.parser')
   
    b1= data.find_all('h3')
    
    # you were rewriting this list for each link
    books_urls = []

    for i in b1:
        l=i.find_all('a')
        for j in l:
            l1=j['href']
            books_urls.append(base_url + l1)

    # these lines had too many indents
    books_urls=list(books_urls)
    return books_urls
            
    
allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']

base_url= 'http://books.toscrape.com/catalogue/'
bookURLs= page1_url(allPages[0])
print(bookURLs)

['http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html', 'http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html', 'http://books.toscrape.com/catalogue/soumission_998/index.html', 'http://books.toscrape.com/catalogue/sharp-objects_997/index.html', ... 'http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html']
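
Since the question also asks for an alternative that uses only BeautifulSoup, here is one possible sketch. It separates parsing from fetching so the link extraction can be tried without network access, and it builds absolute URLs with urljoin from the standard library instead of a global base_url. The function name extract_book_urls and the sample_html markup are assumptions made for illustration; the real page contains much more markup than this trimmed-down sample.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_book_urls(html, page_url):
    """Return absolute URLs for every link found inside an <h3> heading."""
    soup = BeautifulSoup(html, "html.parser")
    # Build the whole list first; return only after every <h3> has been visited.
    return [
        urljoin(page_url, a["href"])
        for h3 in soup.find_all("h3")
        for a in h3.find_all("a", href=True)
    ]


# Hypothetical, trimmed-down markup standing in for the real listing page.
sample_html = """
<article><h3><a href="a-light-in-the-attic_1000/index.html">A Light in the Attic</a></h3></article>
<article><h3><a href="tipping-the-velvet_999/index.html">Tipping the Velvet</a></h3></article>
"""

page_url = "http://books.toscrape.com/catalogue/page-1.html"
book_urls = extract_book_urls(sample_html, page_url)
print(book_urls)
```

To run this against the live site, the html argument could be the text of a requests.get(page_url) response, as in the code above.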


Tags: python-3.x, web-scraping, beautifulsoup
