I'm trying to write a web-scraping function that does a few things.
Here is my current code:
# this is the list of URLs
urls = ['https://calevip.org/incentive-project/northern-california',
        'https://www.slocleanair.org/community/grants/altfuel.php',
        'https://www.mcecleanenergy.org/ev-charging/',
        'https://www.peninsulacleanenergy.com/ev-charging-incentives/',
        'https://www.irs.gov/businesses/plug-in-electric-vehicle-credit-irc-30-and-irc-30d',
        'https://afdc.energy.gov/laws/12309',
        'https://cleanvehiclerebate.org/eng/fleet',
        'https://calevip.org/incentive-project/san-joaquin-valley']

import requests
from bs4 import BeautifulSoup
import sys
from websites import urls

def scrape():
    for x in range(len(urls)):
        f = open("test" + str(x) + ".txt", 'w')
        for url in urls:
            page = requests.get(url)
            # this line of code creates a Beautiful Soup object that takes page.content as input
            soup = BeautifulSoup(page.content, "html.parser")
            results = soup.prettify().encode('cp1252', errors='ignore')
            # we need a command that writes the results into the file we just created
            f.write(str(results))
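Separately from the loop question: `soup.prettify().encode('cp1252', errors='ignore')` returns `bytes`, and calling `str()` on a `bytes` object gives its repr (`b'...'` with escaped characters), not the decoded text. A small sketch with a made-up sample string shows what ends up in the file; characters that cp1252 cannot represent are silently dropped by `errors='ignore'`.

```python
# Made-up sample markup: "é" exists in cp1252, the check mark does not
text = "<p>caf\u00e9 \u2713</p>\n"

# What the question's code does: encode to bytes, dropping unencodable characters
data = text.encode("cp1252", errors="ignore")

# str() on bytes gives the bytes repr, not the text
as_written = str(data)
print(as_written)  # b'<p>caf\xe9 </p>\n'

# Decoding the bytes gives the actual text back (minus the dropped character)
print(data.decode("cp1252"))
```

Writing `soup.prettify()` directly to a file opened with `encoding='utf-8'` avoids both the repr and the dropped characters.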
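So far I've been able to get the function to do steps 1 & 2. The problem is that the text scraped from the first website ends up in all 8 .txt files, instead of the first website's text going into the first .txt file, the second website's text into the second file, the third website's text into the third file, and so on.
How can I fix this? I feel like I'm close, but my second for loop isn't written correctly.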
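A minimal way to see what the nested loops do, with the network call replaced by a hypothetical `fake_fetch` stub and each "file" modeled as a list in a dict (the example.com URLs are placeholders): the outer loop picks a file, then the inner loop writes every URL's page into that same file. Every file therefore receives all the pages, with the first site's text at the top, which may be why each file appears to contain only the first site. Pairing each URL with one file in a single loop, for example via `enumerate`, removes the problem.

```python
def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url); returns a fake page body
    return f"<html>{url}</html>"


def nested(urls):
    # Mirrors the question's structure: for each file index, loop over
    # every URL and write each page into that same file
    files = {}
    for x in range(len(urls)):
        name = "test" + str(x) + ".txt"
        files[name] = []
        for url in urls:
            files[name].append(fake_fetch(url))
    return files


def paired(urls):
    # A single loop: enumerate pairs each URL with exactly one file index
    files = {}
    for x, url in enumerate(urls):
        files["test" + str(x) + ".txt"] = [fake_fetch(url)]
    return files


urls = ["https://example.com/a", "https://example.com/b"]
print(nested(urls))  # both files receive both pages
print(paired(urls))  # test0.txt gets page a, test1.txt gets page b
```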
Answer:
Try it like this:
import requests
from bs4 import BeautifulSoup as BS

urls = ['https://calevip.org/incentive-project/northern-california',
        'https://www.slocleanair.org/community/grants/altfuel.php',
        'https://www.mcecleanenergy.org/ev-charging/',
        'https://www.peninsulacleanenergy.com/ev-charging-incentives/',
        'https://www.irs.gov/businesses/plug-in-electric-vehicle-credit-irc-30-and-irc-30d',
        'https://afdc.energy.gov/laws/12309',
        'https://cleanvehiclerebate.org/eng/fleet',
        'https://calevip.org/incentive-project/san-joaquin-valley']

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

def scrape():
    with requests.Session() as session:
        i = 1
        # One loop: each URL gets its own file, test1.txt, test2.txt, ...
        for url in urls:
            try:
                page = session.get(url, headers=headers)
                page.raise_for_status()
                # utf-8 avoids encoding errors from the platform default (e.g. cp1252 on Windows)
                with open(f'test{i}.txt', 'w', encoding='utf-8') as f:
                    # 'lxml' requires the lxml package; 'html.parser' also works
                    f.write(BS(page.text, 'lxml').prettify())
                i += 1
            except Exception as e:
                print(f'Exception while processing {url} -> {e}')

if __name__ == '__main__':
    scrape()
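A variation on the answer's approach, offered as a sketch rather than a drop-in replacement: numbering comes from `enumerate(urls, start=1)` instead of a manual counter, so `test3.txt` always belongs to the third URL even if an earlier request fails (in the answer's version the counter only advances on success, so after a failure the later files shift down by one). The fetch step is passed in as a function. `fetch_pretty` and `save_pages` are hypothetical names, and the `html.parser` parser and 30-second timeout are assumptions, not part of the original answer.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_pretty(url: str) -> str:
    # Imported here so the rest of the sketch works without these packages installed
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return BeautifulSoup(resp.text, "html.parser").prettify()


def save_pages(urls: Iterable[str],
               fetch: Callable[[str], str] = fetch_pretty,
               out_dir: str = ".") -> List[Path]:
    """Write the page for the i-th URL (1-based) to test{i}.txt; return the paths written."""
    written = []
    for i, url in enumerate(urls, start=1):
        try:
            text = fetch(url)
        except Exception as e:
            # Skip failed URLs; the file number stays tied to the URL's position
            print(f"Exception while processing {url} -> {e}")
            continue
        path = Path(out_dir) / f"test{i}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_pages(urls)` with the list from the question writes one file per URL into the current directory; passing a stub `fetch` makes it possible to check the URL-to-file pairing without network access.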
