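This code doesn't crash, which is good. However, it creates icao_publications.csv and leaves it empty. I want to fill icao_publications.csv with every record from every page at the URL. The full dataset should be roughly 10,000 rows, and I want all of those rows in the CSV file.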
import requests, csv
from bs4 import BeautifulSoup
url = 'https://www.icao.int/publications/DOC8643/Pages/Search.aspx'
with open('Test1_Aircraft_Type_Designators.csv', "w", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Manufacturers", "Model", "Type_Designator", "Description", "Engine_Type", "Engine_Count", "WTC"])
    while True:
        html = requests.get(url)
        soup = BeautifulSoup(html.text, 'html.parser')
        for row in soup.select('table tbody tr'):
            writer.writerow([c.text if c.text else '' for c in row.select('td')])
        if soup.select_one('li.paginate_button.active li a'):
            url = soup.select_one('li.paginate_button.active li a')['href']
        else:
            break
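Answer from a uj5u.com user:
Here you go. Instead of parsing the HTML search page, this requests the JSON endpoint directly and writes the result with pandas: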
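import requests
import pandas as pd
# JSON endpoint requested instead of the HTML search page
url = 'https://www4.icao.int/doc8643/External/AircraftTypes'
# POST with no body and decode the JSON response
resp = requests.post(url).json()
# Build a DataFrame from the decoded data and write it without the index column
df = pd.DataFrame(resp)
df.to_csv('aircraft.csv', index=False)
print('Saved to aircraft.csv')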
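If you would rather not depend on pandas, or want the script to fail loudly instead of silently producing an empty file, something along these lines might work. It is only a sketch: it assumes the endpoint returns a JSON list of flat dicts (the pandas call above suggests this, but it is not confirmed here), and the helper names fetch_records and write_records are made up for illustration.

```python
import csv

# URL taken from the answer above; the response shape is an assumption.
URL = "https://www4.icao.int/doc8643/External/AircraftTypes"


def fetch_records(url=URL, timeout=30):
    """POST to the endpoint and return the decoded JSON (assumed: a list of dicts)."""
    # Imported here so write_records can be used without requests installed.
    import requests

    resp = requests.post(url, timeout=timeout)
    resp.raise_for_status()  # raise on HTTP errors rather than continuing
    return resp.json()


def write_records(records, path):
    """Write a list of dicts to a CSV file and return the number of data rows."""
    if not records:
        raise ValueError("no records returned; refusing to write an empty CSV")
    # Union of all keys in first-seen order, in case some records omit a field.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    # newline="" lets the csv module control line endings (avoids blank rows on Windows).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Calling write_records(fetch_records(), 'aircraft.csv') returns the number of rows written, which you can compare against the roughly 10,000 rows you expect.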
