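I'm just learning web scraping and want to output the results from this website to a CSV file: https://www.avbuyer.com/aircraft/private-jets
However, I'm struggling to parse the next page. Here is my code (written with help from Amen Aziz); it only gives me the first page.
I'm using Chrome, so I'm not sure whether that makes any difference. I'm running Python 3.8.12.
Thanks in advance.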
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.avbuyer.com/aircraft/private-jets', headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
postings = soup.find_all('div', class_='listing-item premium')

temp = []
for post in postings:
    link = post.find('a', class_='more-info').get('href')
    link_full = 'https://www.avbuyer.com' + link
    plane = post.find('h2', class_='item-title').text
    price = post.find('div', class_='price').text
    location = post.find('div', class_='list-item-location').text
    desc = post.find('div', class_='list-item-para').text
    try:
        tag = post.find('div', class_='list-viewing-date').text
    except AttributeError:
        # find() returned None: this listing has no viewing-date tag
        tag = 'N/A'
    updated = post.find('div', class_='list-update').text
    t = post.find_all('div', class_='list-other-dtl')
    for i in t:
        data = [tup.text for tup in i.find_all('li')]
        years = data[0]
        s = data[1]
        total_time = data[2]
        temp.append([plane, price, location, years, s, total_time, desc, tag, updated, link_full])

df = pd.DataFrame(temp, columns=["plane", "price", "location", "Year", "S/N", "Totaltime", "Description", "Tag", "Last Updated", "link"])

# The next page is fetched here, but nothing loops back to parse it,
# so only the first page ends up in df.
next_page = soup.find('a', {'rel': 'next'}).get('href')
next_page_full = 'https://www.avbuyer.com' + next_page
print(next_page_full)
url = next_page_full
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

df.to_csv('/Users/xxx/avbuyer.csv')
Answer from a uj5u.com user:
Try this. If you want a CSV file, replace the final print(df) line with df.to_csv("prod.csv"). The code below collects the data for the CSV file:
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0'}
temp = []

# Request pages 1 to 19 by building each page URL directly
for page in range(1, 20):
    response = requests.get(
        "https://www.avbuyer.com/aircraft/private-jets/page-{page}".format(page=page),
        headers=headers,
    )
    soup = BeautifulSoup(response.content, 'html.parser')
    postings = soup.find_all('div', class_='grid-x list-content')
    for post in postings:
        plane = post.find('h2', class_='item-title').text
        try:
            price = post.find('div', class_='price').text
        except AttributeError:
            # find() returned None: this listing has no price element
            price = " "
        location = post.find('div', class_='list-item-location').text
        t = post.find_all('div', class_='list-other-dtl')
        for i in t:
            data = [tup.text for tup in i.find_all('li')]
            years = data[0]
            s = data[1]
            total_time = data[2]
            temp.append([plane, price, location, years, s, total_time])

# Build the DataFrame once, after all pages have been collected
df = pd.DataFrame(temp, columns=["plane", "price", "location", "Years", "S/N", "Totaltime"])
print(df)
Output:
plane price ... S/N Totaltime
0 Gulfstream G280 Make offer ... S/N 2007 Total Time 2528
1 Dassault Falcon 2000LXS Make offer ... S/N 377 Total Time 33
2 Cirrus Vision SF50 G1 Please call ... S/N 0080 Total Time 615
3 Gulfstream IV Make offer ... S/N 1148 Total Time 6425
4 Gulfstream G280 Make offer ... S/N 2072 Total Time 1918
.. ... ... ... ... ...
342 Embraer Phenom 100 Now Sold ... S/N 50000035 Total Time 3417
343 Gulfstream G200 Now Sold ... S/N 152 Total Time 7209
344 Cessna Citation XLS Now Sold ... S/N - Total Time -
345 Cessna Citation Ultra Now Sold ... S/N 560-0393 Total Time 12947
346 Cessna Citation Excel Now Sold ... S/N 560XL-5253 Total Time 4850
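
The answer's code hard-codes range(1, 20), while the question tried to follow the page's rel="next" link. As a rough sketch of that second approach, the pagination can be separated from the parsing: the generator below yields the HTML of each page and keeps going until no "next" link is found. It uses only the standard library so it runs without extra installs. NextLinkParser, fetch_html, iter_pages and the max_pages safety limit are hypothetical names introduced for illustration, and it assumes the site really exposes an <a rel="next"> link, as the question's code does; that has not been verified here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class NextLinkParser(HTMLParser):
    """Records the href of the first <a> or <link> tag whose rel contains "next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if self.next_href is not None or tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "next" in rels and attr.get("href"):
            self.next_href = attr["href"]


def fetch_html(url):
    """Download one page with a browser-like User-Agent and return its text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def iter_pages(start_url, fetch=fetch_html, max_pages=50):
    """Yield (url, html) per page, following rel="next" until it runs out.

    max_pages is an assumed safety limit; already-visited URLs are skipped
    so a page that links back to itself cannot cause an endless loop.
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        # Relative hrefs such as "/aircraft/private-jets/page-2" become absolute
        url = urljoin(url, parser.next_href) if parser.next_href else None


# Possible usage with the parsing code from the answer:
# for url, html in iter_pages("https://www.avbuyer.com/aircraft/private-jets"):
#     soup = BeautifulSoup(html, "html.parser")
#     postings = soup.find_all("div", class_="grid-x list-content")
#     ...append rows to temp as before...
```

Passing fetch as a parameter keeps the loop easy to try out against saved HTML before pointing it at the live site, and the DataFrame is still built once after the loop finishes.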
