bs4 findAll does not collect all the data from the site's other pages

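I am trying to scrape a real-estate website with BeautifulSoup to get a list of rental prices for London. This works, but only for the first page of the site. There are more than 150 pages, so I am missing a lot of data. I would like to collect the prices from all of the pages. Here is the code I am using: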

import requests
from bs4 import BeautifulSoup as soup

url = 'https://www.zoopla.co.uk/to-rent/property/central-london/?beds_max=5&price_frequency=per_month&q=Central London&results_sort=newest_listings&search_source=home'
response = requests.get(url)
response.status_code
data = soup(response.content, 'lxml')

prices = []
for line in data.findAll('div', {'class': 'css-1e28vvi-PriceContainer e2uk8e7'}):
    price = str(line).split('>')[2].split(' ')[0].replace('£', '').replace(',', '')
    price = int(price)
    prices.append(price)

Any ideas why I can't collect the prices from all of the pages with this script?

Bonus question: is there a way to access the prices using soup, i.e. without any list/string manipulation? When I call data.find('div', {'class': 'css-1e28vvi-PriceContainer e2uk8e7'}) I get a string of the following form:

<div class="css-1e28vvi-PriceContainer e2uk8e7" data-testid="listing-price"><p class="css-1o565rw-Text eczcs4p0" size="6">£3,012 pcm</p></div>

Any help would be appreciated!


uj5u.com user reply:

You can append the &pn=<page number> parameter to the URL to get the next pages:

import re
import requests
from bs4 import BeautifulSoup as soup

url = "https://www.zoopla.co.uk/to-rent/property/central-london/?beds_max=5&price_frequency=per_month&q=Central London&results_sort=newest_listings&search_source=home&pn="

prices = []
for page in range(1, 3):  # <-- increase the number of pages here
    data = soup(requests.get(url + str(page)).content, "lxml")

    for line in data.findAll(
        "div", {"class": "css-1e28vvi-PriceContainer e2uk8e7"}
    ):
        price = line.get_text(strip=True)
        price = int(re.sub(r"[^\d]", "", price))
        prices.append(price)
        print(price)
    print("-" * 80)

print(len(prices))

Prints:

...

1993
1993
--------------------------------------------------------------------------------
50
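
The fixed range(1, 3) in the loop above covers only two pages. One possible way to keep going until the results run out is sketched below. It assumes, and the source does not confirm this, that a pn value past the last page returns a page with no price containers. The names page_url, collect_all, fetch_prices and max_pages are hypothetical, introduced only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.zoopla.co.uk/to-rent/property/central-london/"
QUERY = {
    "beds_max": 5,
    "price_frequency": "per_month",
    "q": "Central London",
    "results_sort": "newest_listings",
    "search_source": "home",
}


def page_url(page):
    """Build the search URL for one result page.

    urlencode escapes the space in "Central London" and joins the
    parameters with "&", with pn (the page number) placed last.
    """
    return BASE_URL + "?" + urlencode({**QUERY, "pn": page})


def collect_all(fetch_prices, max_pages=200):
    """Collect prices from page 1 onwards until a page yields none.

    fetch_prices is a hypothetical callable: it takes a URL and returns
    a list of integer prices found on that page.
    max_pages is a safety cap in case the site never returns an empty page.
    """
    prices = []
    for page in range(1, max_pages + 1):
        found = fetch_prices(page_url(page))
        if not found:  # assumption: an empty page means we are past the end
            break
        prices.extend(found)
    return prices


if __name__ == "__main__":
    # Stand-in for a real scraper: pretend the site has two pages of results.
    fake_site = {1: [1993, 2500], 2: [3012]}

    def fake_fetch(url):
        return fake_site.get(int(url.rsplit("pn=", 1)[1]), [])

    print(collect_all(fake_fetch))  # [1993, 2500, 3012]
```

In real use, fetch_prices would wrap the requests.get call and the findAll loop from the answer and return that page's list of integers. The max_pages cap keeps the loop finite if the assumption about empty pages does not hold.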

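On the bonus question, the markup quoted in the question carries a data-testid="listing-price" attribute. The sketch below selects on that attribute instead of the generated class names and reads the text through soup. It assumes the attribute is stable across listings, which the source does not confirm. The names extract_prices and parse_price are hypothetical, and html.parser is swapped in for lxml only to avoid an extra dependency.

```python
import re

from bs4 import BeautifulSoup

# Sample markup copied from the question; a real page contains many such blocks.
HTML = """
<div class="css-1e28vvi-PriceContainer e2uk8e7" data-testid="listing-price">
  <p class="css-1o565rw-Text eczcs4p0" size="6">£3,012 pcm</p>
</div>
"""


def parse_price(text):
    """Return the first number in text as an int, or None if there are no digits.

    Thousands separators are dropped, so "£3,012 pcm" becomes 3012.
    """
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def extract_prices(html):
    """Return the integer prices of all listing-price containers in html."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the data-testid attribute rather than the generated class names.
    tags = soup.find_all("div", attrs={"data-testid": "listing-price"})
    prices = [parse_price(tag.get_text(strip=True)) for tag in tags]
    # Skip containers without a numeric price instead of raising.
    return [p for p in prices if p is not None]


print(extract_prices(HTML))  # [3012]
```

Taking only the first number means a container holding two figures is not merged into one long integer, which is what would happen if every non-digit character were stripped.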
