Web Scraping & BeautifulSoup: Parsing the Next Page

I'm just learning web scraping and want to output the results from this site to a CSV file: https://www.avbuyer.com/aircraft/private-jets

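But I'm struggling to parse the next page. This is my code (written with help from Amen Aziz); it only gives me the first page.
I'm using Chrome, so I'm not sure whether that makes any difference. I'm running Python 3.8.12.
Thanks in advance.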

import requests
from bs4 import BeautifulSoup
import pandas as pd
headers= {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.avbuyer.com/aircraft/private-jets')
soup = BeautifulSoup(response.content, 'html.parser')
postings = soup.find_all('div', class_ = 'listing-item premium')
temp=[]
for post in postings:
    link = post.find('a', class_ = 'more-info').get('href')
    link_full = 'https://www.avbuyer.com' + link
    plane = post.find('h2', class_ = 'item-title').text
    price = post.find('div', class_ = 'price').text
    location = post.find('div', class_ = 'list-item-location').text
    desc = post.find('div', class_ = 'list-item-para').text
    try:
        tag = post.find('div', class_ = 'list-viewing-date').text
    except:
        tag = 'N/A'
    updated = post.find('div', class_ = 'list-update').text
    t=post.find_all('div',class_='list-other-dtl')
    for i in t:
        data=[tup.text for tup in i.find_all('li')]
        years=data[0]
        s=data[1]
        total_time=data[2]

        temp.append([plane,price,location,years,s,total_time,desc,tag,updated,link_full])

df=pd.DataFrame(temp,columns=["plane","price","location","Year","S/N","Totaltime","Description","Tag","Last Updated","link"])


next_page = soup.find('a', {'rel':'next'}).get('href')
next_page_full = 'https://www.avbuyer.com' + next_page
next_page_full

url = next_page_full
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml') 

df.to_csv('/Users/xxx/avbuyer.csv')

A uj5u.com user replied:

Try this. If you want a CSV file, replace the final print(df) line with df.to_csv("prod.csv").

import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {'User-Agent': 'Mozilla/5.0'}
temp=[]
# Listing pages are addressed as .../private-jets/page-N; range(1, 20) requests pages 1 to 19.
for page in range(1, 20):
    response = requests.get("https://www.avbuyer.com/aircraft/private-jets/page-{page}".format(page=page), headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    postings = soup.find_all('div', class_='grid-x list-content')
    for post in postings:
        plane = post.find('h2', class_='item-title').text
        try:
            price = post.find('div', class_='price').text
        except AttributeError:
            # find() returned None, so this listing has no price element
            price=" "
        location = post.find('div', class_='list-item-location').text
        t=post.find_all('div',class_='list-other-dtl')
        for i in t:
            data=[tup.text for tup in i.find_all('li')]
            years=data[0]
            s=data[1]
            total_time=data[2]
            temp.append([plane,price,location,years,s,total_time])

df=pd.DataFrame(temp,columns=["plane","price","location","Years","S/N","Totaltime"])
print(df)

Output:

                      plane         price  ...             S/N         Totaltime
0            Gulfstream G280     Make offer  ...        S/N 2007   Total Time 2528
1    Dassault Falcon 2000LXS     Make offer  ...         S/N 377     Total Time 33
2       Cirrus Vision SF50 G1  Please call   ...        S/N 0080    Total Time 615
3              Gulfstream IV     Make offer  ...        S/N 1148   Total Time 6425
4            Gulfstream G280     Make offer  ...        S/N 2072   Total Time 1918
..                        ...           ...  ...             ...               ...
342       Embraer Phenom 100       Now Sold  ...    S/N 50000035   Total Time 3417
343          Gulfstream G200       Now Sold  ...         S/N 152   Total Time 7209
344     Cessna Citation XLS        Now Sold  ...           S/N -      Total Time -
345    Cessna Citation Ultra       Now Sold  ...    S/N 560-0393  Total Time 12947
346    Cessna Citation Excel       Now Sold  ...  S/N 560XL-5253   Total Time 4850
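
As an alternative to hard-coding range(1, 20), the loop can follow the rel="next" link that the question's code already looks up, and stop when no such link is present. The following is a minimal sketch, not the answerer's method. The names find_next_url, crawl, fetch_with_requests and max_pages are hypothetical and introduced here for illustration, and the sketch assumes each listing page exposes an <a rel="next"> link as in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(html, current_url):
    """Return the absolute URL of the rel="next" link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", {"rel": "next"})
    if link is None or not link.get("href"):
        return None
    # urljoin copes with both relative ("/aircraft/...") and absolute hrefs.
    return urljoin(current_url, link["href"])


def crawl(start_url, fetch, max_pages=50):
    """Yield (url, html) per page, following rel="next" links.

    fetch is any callable that maps a URL to an HTML string.
    max_pages is a safety cap (an assumption, not a site limit).
    """
    url = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_url(html, url)


def fetch_with_requests(url):
    """Download one page with the same User-Agent header as the answer."""
    import requests  # imported here so the functions above also work offline

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


# Example use (performs real HTTP requests):
# for url, html in crawl("https://www.avbuyer.com/aircraft/private-jets",
#                        fetch_with_requests):
#     soup = BeautifulSoup(html, "html.parser")
#     ...  # extract the listings from soup as in the code above
```

Each html string yielded by crawl can be parsed with the same post.find(...) calls as above, appending rows to temp and building the DataFrame once after the loop ends.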

Tags: Python, web scraping, Beautiful Soup