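I'm new to Python. I have the following questions:
1. I have a list of URLs that I want to scrape data from. I don't know what is wrong with my code: I can't retrieve results from all of the URLs. The code scrapes only the first URL and not the rest. How can I successfully scrape the data (title, info, description, application) from every URL in the list?
2. If question 1 works, how do I convert the data to a CSV file?
Here is the code: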
import requests
from bs4 import BeautifulSoup
import lxml
import pandas as pd
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

urlList = ["url1","url2","url3"...lists of urls.......]

for url in urlList:
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
    except URLError:
        print("error")
    else:
        soup = BeautifulSoup(html.read(),"html5lib")

# Scraping
def getTitle():
    for title in soup.find('h2', class_="xx").text:
        print(title)

def getInfo():
    for info in soup.find('ul', class_="j-k-i").text:
        print(info)

def getDescription():
    for description in soup.find('div', class_="b-d").text:
        print(description)

def getApplication():
    for application in soup.find('div', class_="g-b bm-b-30").text:
        print(application)

for soups in soup():
    getTitle()
    getInfo()
    getDescription()
    getApplication()
Answer:
Try the following approach:
import requests
from bs4 import BeautifulSoup
import lxml
import pandas as pd
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv

# Each helper takes the soup for the current page and returns the matching text.
def getTitle(soup):
    return soup.find('h2', class_="xx").text

def getInfo(soup):
    return soup.find('ul', class_="j-k-i").text

def getDescription(soup):
    return soup.find('div', class_="b-d").text

def getApplication(soup):
    return soup.find('div', class_="g-b bm-b-30").text

urlList = ["url1","url2","url3"...lists of urls.......]

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    # Header row
    csv_output.writerow(['Title', 'Info', 'Desc', 'Application'])

    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        except URLError:
            print("error")
        else:
            # Parse and write one row per URL, inside the loop
            soup = BeautifulSoup(html.read(),"html5lib")
            row = [getTitle(soup), getInfo(soup), getDescription(soup), getApplication(soup)]
            print(row)
            csv_output.writerow(row)
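This passes the current soup to each function that needs it. Each function now returns the text it finds (previously, the for loop iterated over the text and printed it one character at a time).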
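To illustrate the one-character-at-a-time behaviour, here is a minimal sketch. It uses a plain string as a hypothetical stand-in for what `soup.find(...).text` gives you (the names `title_text`, `get_title_chars` and `get_title` are made up for this example), since `.text` is just a `str` and looping over a `str` yields single characters:

```python
# Hypothetical stand-in for soup.find('h2', class_="xx").text, which is a plain str.
title_text = "Sample title"

def get_title_chars(text):
    # Mirrors the original loop: iterating over a str yields single characters.
    return [ch for ch in text]

def get_title(text):
    # Mirrors the revised functions: hand the whole string back to the caller.
    return text

chars = get_title_chars(title_text)
print(chars[:3])              # the first three single characters
print(get_title(title_text))  # the whole title as one value
```

Returning the whole string is what lets the caller collect one value per field and build a row from them.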
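Finally, Python's csv library makes it easy to write a correctly formatted CSV file. It takes a list of values for each row and, by default, writes them as a comma-delimited line to output.csv.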
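As a rough illustration of what "correctly formatted" means here, the sketch below writes to an in-memory buffer instead of a file, so it runs without touching disk. The sample values are invented; the point is that `csv.writer` quotes fields containing commas or line breaks, which scraped text often does:

```python
import csv
import io

# In-memory text buffer standing in for the opened output file.
buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow(['Title', 'Info'])
# Made-up values: one contains a comma, the other an embedded newline.
writer.writerow(['Widget, large', 'line one\nline two'])

result = buffer.getvalue()
print(result)
```

When writing to a real file, opening it with `newline=''` (as in the code above) is what the csv documentation recommends, so that the writer controls the line endings itself.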
Note: this is untested, as you didn't provide any suitable URLs.
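One thing that may come up on real pages: BeautifulSoup's `find()` returns `None` when nothing matches, so `.text` would raise an `AttributeError` on any page missing one of the elements. A possible guard is sketched below. The helper name `text_or_blank` is hypothetical, the `.strip()` call is an optional extra, and `SimpleNamespace` is used only as a stand-in for a found tag so the example runs without any real HTML:

```python
from types import SimpleNamespace

def text_or_blank(element):
    """Return the element's stripped text, or '' when the lookup found nothing."""
    # find() returns None when no tag matches, so check before reading .text.
    if element is None:
        return ""
    return element.text.strip()

# Hypothetical stand-ins for what soup.find(...) might return.
found = SimpleNamespace(text="  Example title \n")
missing = None

row = [text_or_blank(found), text_or_blank(missing)]
print(row)
```

In the functions above this could be used as, for example, `return text_or_blank(soup.find('h2', class_="xx"))`, so that a missing element produces an empty cell rather than stopping the whole run.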
