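This code doesn't crash when I run it. The output file flyingmag.csv gets populated, but not the way I want. I'd like to add div > h2 and div > h3 so that both the aircraft manufacturer and the aircraft model are included in the output. I really want the records in a traditional Excel row format, capturing every aircraft manufacturer and model.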
import requests, csv
from bs4 import BeautifulSoup
from urllib.request import Request
url = 'https://www.flyingmag.com/2019-buyers-single-engine-piston/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
with open('flyingmag.csv', "w", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(['Base_Price','Typically_Equipped_Price','Engine','Horsepower','Propeller','Seats','Length','Height','Wingspan','Wing_Area','Wing_Loading','Power_Loading','Max_Takeoff_Weight','Empty_Weight','Useful_Load','Fuel_Capacity','Max_Operating_Altitude','Max_Rate_of_Climb','Max_Cruise_Speed','Normal_Cruise_Speed','Never_Exceed_Speed','Stall_Speed-Flaps_Up','Stall_Speed-Landing_Configuration','Max_Range','Takeoff_Roll','Takeoff_Distance_Over_50_ft.','Landing_Roll','Landing_Distance_Over_50_ft'])
    while True:
        html = requests.get(url , headers = headers)
        soup = BeautifulSoup(html.text, 'html.parser')
        for row in soup.select('table tbody tr'):
            writer.writerow([c.text if c.text else '' for c in row.select('td')])
            print(row)
        else:
            break
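Answer from a uj5u.com user:
You can first work out the number of overall "sections" (or listings, as I call them) by targeting the heading widgets with section:has([data-widget_type="heading.default"]), then loop over those headers and extract the manufacturer. Use find_next to move to the section that actually follows, which contains the model and the table. All the data appears to be present on that single page if you scroll down to the bottom.
About the column headers: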
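To make the section lookup concrete, here is a minimal sketch run against a hand-written HTML fragment. The markup, manufacturer names, and model names below are made up for illustration and only assume the shape described above (a heading section followed by a section holding an h3 and a table); the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
html = """
<section><div data-widget_type="heading.default"><h2>Maker A</h2></div></section>
<section><h3>Model A1</h3><table><tr><td>x</td></tr></table></section>
<section><div data-widget_type="heading.default"><h2>Maker B</h2></div></section>
<section><h3>Model B1</h3><table><tr><td>y</td></tr></table></section>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the sections that contain a heading widget.
listings = soup.select('section:has([data-widget_type="heading.default"])')

pairs = []
for listing in listings:
    manufacturer = listing.select_one('[data-widget_type="heading.default"] h2').text
    # find_next walks forward in document order, so it reaches the h3
    # in the section that follows the heading section.
    model = listing.find_next("h3").text
    pairs.append((manufacturer, model))

print(pairs)
```

Because find_next returns only the first match, this pairs each heading with the first model that follows it.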
td:not([colspan]) strong
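:not([colspan]) is used to exclude the last row of each listing's table, which is the Back to Top row. That row is a "merged cell" carrying a colspan attribute and does not contain data you want. You could also use an nth-child range selector instead. The first (leftmost when viewing the page) and third table columns hold the headers, and I read them only from the first listing, having checked that the same headers appear in all of the tables. The space followed by strong is a descendant combinator: it selects the strong elements found inside the first and third td children of each table row.
About the row values that follow the header row in the CSV: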
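A small sketch of the header selector on a made-up table may help. The cell contents are placeholders, and the assumption that the Back to Top cell also wraps its text in strong is mine, chosen to show why the colspan filter matters.

```python
from bs4 import BeautifulSoup

# Hypothetical four-column table: label, value, label, value,
# followed by a merged "Back to Top" row.
html = """
<table><tbody>
<tr><td><strong>Label 1</strong></td><td>value 1</td><td><strong>Label 2</strong></td><td>value 2</td></tr>
<tr><td><strong>Label 3</strong></td><td>value 3</td><td><strong>Label 4</strong></td><td>value 4</td></tr>
<tr><td colspan="4"><strong>Back to Top</strong></td></tr>
</tbody></table>
"""

table = BeautifulSoup(html, "html.parser").select_one("table")

# Without the filter, the merged cell's text is picked up as well.
unfiltered = [el.text for el in table.select("td strong")]

# With :not([colspan]) the merged row is skipped.
headers = [el.text for el in table.select("td:not([colspan]) strong")]

print(unfiltered)
print(headers)
```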
td:not([colspan]):nth-child(even)
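The first part works as explained for the headers. However, instead of adding a descendant combinator with a strong type selector, I simply used nth-child(even); this selects the 2nd and 4th columns as required, because they are the even-numbered children.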
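As a quick check that the two selectors line up, the sketch below pulls headers and values from the same made-up table and pairs them. The table contents are placeholders, not data from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical table: odd columns hold labels, even columns hold values.
html = """
<table><tbody>
<tr><td><strong>Label 1</strong></td><td>value 1</td><td><strong>Label 2</strong></td><td>value 2</td></tr>
<tr><td><strong>Label 3</strong></td><td>value 3</td><td><strong>Label 4</strong></td><td>value 4</td></tr>
<tr><td colspan="4">Back to Top</td></tr>
</tbody></table>
"""

table = BeautifulSoup(html, "html.parser").select_one("table")

headers = [el.text for el in table.select("td:not([colspan]) strong")]
# nth-child(even) matches the 2nd and 4th td of each row.
values = [el.text for el in table.select("td:not([colspan]):nth-child(even)")]

# Both lists come back in document order, so they can be paired one to one.
record = dict(zip(headers, values))
print(record)
```

Pairing by position like this only holds if every table lists the same fields in the same order, which the answer says was checked for this page.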
# Full script: one CSV row per listing, with manufacturer and model first.
import requests, csv
from bs4 import BeautifulSoup as bs  # this import was missing; the 'lxml' parser must also be installed

r = requests.get('https://www.flyingmag.com/2019-buyers-single-engine-piston')
soup = bs(r.content, 'lxml')
# Each section that contains a heading widget marks one listing.
listings = soup.select('section:has([data-widget_type="heading.default"])')

with open('flyingmag.csv', "w", encoding="utf-8-sig", newline='') as f:
    writer = csv.writer(f, delimiter = ",", quoting=csv.QUOTE_MINIMAL)
    for num, listing in enumerate(listings):
        manufacturer = listing.select_one('[data-widget_type="heading.default"] h2').text
        model = listing.find_next('h3').text
        table = listing.find_next('table')
        # Write the header row once, using the labels from the first table.
        if num == 0:
            row = ['Manufacturer', 'Model']
            row.extend([i.text for i in table.select('td:not([colspan]) strong')])
            writer.writerow(row)
        # The even-numbered columns hold the values.
        values = [i.text for i in table.select('td:not([colspan]):nth-child(even)')]
        row = [manufacturer, model]
        row.extend(values)
        writer.writerow(row)
