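I want to scrape some information from several different pages. The code below lets me print the scraped information with print().
The problem is that I only get the data from the last page. The results from the earlier pages are not written to the CSV file. What should I do? Thanks.
Code: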
import requests
from csv import writer
from bs4 import BeautifulSoup
urls = ['https://www.xxxxxxxxxxxxxxx/02-nb.php','https://www.xxxxxxxxxxxxxxx/03-np.php','https://www.xxxxxxxxxxxxxxx/04-nb.php']
for index,url in enumerate(urls):
    requests.get(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    print(soup)
    table_data = soup.find('table')
with open("words.csv", "wt",newline='',encoding='utf-8') as csv_file:
    csv_data = writer(csv_file, delimiter =',')
    for voc in table_data.find_all('tr'):
        row_data = voc.find_all('td')
        row = [tr.text for tr in row_data]
        csv_data.writerow(row)
Reply from a uj5u.com user:
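You are iterating over each URL, but the logic that writes the data to the CSV sits outside that for loop, so it runs only once, after the loop has finished, and writes only the table from the last page. I believe what you want is: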
for index,url in enumerate(urls):
    requests.get(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    print(soup)
    table_data = soup.find('table')
    # Create (or truncate) the file for the first URL, then append for the rest.
    if index:
        mode = "a"
    else:
        mode = "w"
    with open("words.csv", mode, newline='',encoding='utf-8') as csv_file:
        csv_data = writer(csv_file, delimiter =',')
        for voc in table_data.find_all('tr'):
            row_data = voc.find_all('td')
            row = [tr.text for tr in row_data]
            csv_data.writerow(row)
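This writes to words.csv on every iteration over urls, instead of iterating over all the urls first and writing to words.csv only with the data from the last iteration.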
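An alternative to switching between "w" and "a" is to open the file once and keep it open while looping over the URLs. The following is only a sketch of that idea, not the answerer's code: write_all_pages, rows_from_url and the fetch_rows parameter are hypothetical names introduced here, the sketch assumes every page contains a table, and unlike the code above it skips rows that have no td cells.

```python
import csv
from typing import Callable, Iterable, List


def rows_from_url(url: str) -> List[List[str]]:
    """Hypothetical helper: fetch one page and return the rows of its first table."""
    # Imported here so the rest of the sketch runs without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url)
    soup = BeautifulSoup(page.text, "lxml")
    table_data = soup.find("table")  # assumption: the page has a <table>
    rows = []
    for voc in table_data.find_all("tr"):
        cells = [td.text for td in voc.find_all("td")]
        if cells:  # skip rows without <td> cells, e.g. header rows
            rows.append(cells)
    return rows


def write_all_pages(
    urls: Iterable[str],
    fetch_rows: Callable[[str], List[List[str]]],
    path: str,
) -> None:
    """Write the rows of every URL into one CSV file."""
    # The file is opened once in "w" mode and stays open for all URLs,
    # so no page overwrites the rows of an earlier one.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        csv_data = csv.writer(csv_file, delimiter=",")
        for url in urls:
            csv_data.writerows(fetch_rows(url))
```

With the urls list from the question, this would be called as write_all_pages(urls, rows_from_url, "words.csv"). Passing the fetching function as a parameter keeps the file handling separate from the scraping, so it can be tried out with a stand-in function that needs no network access.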
Reply from a uj5u.com user:
    with open("words.csv", "a",newline='',encoding='utf-8') as csv_file:
        csv_data = writer(csv_file, delimiter =',')
        for voc in table_data.find_all('tr'):
            row_data = voc.find_all('td')
            row = [tr.text for tr in row_data]
            csv_data.writerow(row)
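This code block should be indented one level to the right so that it runs on every iteration of the loop. Also note that the open mode should be "a", which stands for "append"; in "w" mode you overwrite the file every time.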
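The difference between the two modes can be checked without any scraping. The sketch below is an illustration only: it uses made-up rows and temporary files, and write_pages and read_rows are hypothetical helper names. It reopens a file once per "page", first with "w" and then with "a".

```python
import csv
import os
import tempfile


def write_pages(path, pages, mode):
    """Reopen `path` once per page with the given mode, like the loop over urls."""
    for rows in pages:
        with open(path, mode, newline="", encoding="utf-8") as csv_file:
            csv.writer(csv_file).writerows(rows)


def read_rows(path):
    """Read the whole CSV file back as a list of rows."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        return list(csv.reader(csv_file))


# Made-up data standing in for the tables of three pages.
pages = [[["a", "1"]], [["b", "2"]], [["c", "3"]]]

with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w.csv")
    a_path = os.path.join(tmp, "a.csv")
    write_pages(w_path, pages, "w")  # each open truncates the file
    write_pages(a_path, pages, "a")  # each open adds to the end
    w_rows = read_rows(w_path)
    a_rows = read_rows(a_path)

print(w_rows)  # [['c', '3']]
print(a_rows)  # [['a', '1'], ['b', '2'], ['c', '3']]
```

One caveat with using "a" for every URL: rows from earlier runs of the script also stay in the file, so words.csv keeps growing unless it is deleted or truncated before each run.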
