My web scraper's nested FOR loop only half works (Beautiful Soup)

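I'm trying to write a web-scraping function that does a few things:

1. Determine how many URLs to scrape, based on a list of URLs
2. Create a separate file for each URL
3. Scrape the text from each URL
4. Write each URL's scraped text into the file just created for it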

Here is the current code:

# this is the list of URLs
urls = ['https://calevip.org/incentive-project/northern-california',
        'https://www.slocleanair.org/community/grants/altfuel.php',
        'https://www.mcecleanenergy.org/ev-charging/',
        'https://www.peninsulacleanenergy.com/ev-charging-incentives/',
        'https://www.irs.gov/businesses/plug-in-electric-vehicle-credit-irc-30-and-irc-30d',
        'https://afdc.energy.gov/laws/12309',
        'https://cleanvehiclerebate.org/eng/fleet',
        'https://calevip.org/incentive-project/san-joaquin-valley']

import requests
from bs4 import BeautifulSoup
import sys
from websites import urls

def scrape():
    for x in range(len(urls)):
        f = open("test" + str(x) + ".txt", "w")
        for url in urls:
            page = requests.get(url)
            # this line of code creates a Beautiful Soup object that takes page.content as input
            soup = BeautifulSoup(page.content, "html.parser")
            results = soup.prettify().encode('cp1252', errors='ignore')
            # we need a command that writes the results into the file we just created
            f.write(str(results))
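Separately from the loop problem, the `encode(...)` and `str(...)` lines at the end of the loop leave an artifact in the files: `.encode('cp1252', errors='ignore')` returns a `bytes` object, and `str()` on `bytes` returns its repr, so each file starts with `b'` and shows escape sequences such as `\n` instead of real line breaks. A minimal sketch, assuming cp1252 output is still wanted: keep the text as `str` and let `open()` do the encoding through its `encoding` and `errors` parameters. The sample HTML string and the file name `demo.txt` are made up for illustration.

```python
import os
import tempfile

# Made-up stand-in for soup.prettify(); U+2713 has no cp1252 mapping.
html_text = "<p>caf\u00e9 \u2713</p>\n"

# What the question's code does: str() of bytes gives the repr, not the text.
as_bytes = html_text.encode("cp1252", errors="ignore")
print(str(as_bytes))  # b'<p>caf\xe9 </p>\n'

# Alternative: write the str and let open() encode it, dropping unmappable characters.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="cp1252", errors="ignore") as f:
    f.write(html_text)

# Reading it back shows plain text with a real line break, no b'...' wrapper.
with open(path, encoding="cp1252") as f:
    print(f.read())  # <p>café </p>
```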

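So far I can get the function to do steps 1 & 2. The problem is that the text scraped from the first website ends up in all 8 .txt files, instead of the first website's text going into the first .txt file, the second website's text into the second file, the third website's text into the third file, and so on.

How can I fix this? I feel like I'm close, but my second FOR loop isn't written correctly.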
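The nesting is what causes this: the outer loop runs once per file, and for every file the inner loop fetches and writes every URL. Assuming every request succeeds, each file therefore starts with the first site's text and continues with the other seven (8 × 8 = 64 fetches in total) instead of holding only its own page. The sketch below replaces the network with a hypothetical `fake_fetch` function and a shortened, made-up URL list, and records which pages land in which file, first with the question's nested structure and then with a single loop that pairs each file number with exactly one URL via `enumerate`.

```python
def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url).text."""
    return f"<text of {url}>"


urls = ["site-a", "site-b", "site-c"]  # shortened, made-up list


def nested(urls):
    """Mirrors the question's structure: every file gets every page."""
    files = {}
    for x in range(len(urls)):
        name = f"test{x}.txt"
        files[name] = []
        for url in urls:
            files[name].append(fake_fetch(url))
    return files


def paired(urls):
    """One loop: file number i gets only the page at position i."""
    files = {}
    for i, url in enumerate(urls):
        files[f"test{i}.txt"] = [fake_fetch(url)]
    return files


print(nested(urls)["test1.txt"])  # ['<text of site-a>', '<text of site-b>', '<text of site-c>']
print(paired(urls)["test1.txt"])  # ['<text of site-b>']
```

Applied to the real function, this means dropping `for x in range(len(urls))` and opening one file per iteration of a single loop, for example `for i, url in enumerate(urls):` followed by `with open(f"test{i}.txt", "w") as f:`.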

Answer:

Try this:

import requests
from bs4 import BeautifulSoup as BS


urls = ['https://calevip.org/incentive-project/northern-california',
        'https://www.slocleanair.org/community/grants/altfuel.php',
        'https://www.mcecleanenergy.org/ev-charging/',
        'https://www.peninsulacleanenergy.com/ev-charging-incentives/',
        'https://www.irs.gov/businesses/plug-in-electric-vehicle-credit-irc-30-and-irc-30d',
        'https://afdc.energy.gov/laws/12309',
        'https://cleanvehiclerebate.org/eng/fleet',
        'https://calevip.org/incentive-project/san-joaquin-valley']
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
}

def scrape():
    with requests.Session() as session:
        i = 1
        for url in urls:
            try:
                page = session.get(url, headers=headers)
                page.raise_for_status()
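                # one output file per URL: test1.txt, test2.txt, ...
                # (utf-8 avoids encoding errors from the platform default on Windows)
                with open(f'test{i}.txt', 'w', encoding='utf-8') as f:
                    f.write(BS(page.text, 'lxml').prettify())
                    i += 1
            except Exception as e:
                print(f'Exception while processing {url} -> {e}')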

if __name__ == '__main__':
    scrape()
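One detail worth checking in this answer: `i` is incremented only after a file is written successfully, so if one URL raises (for example a 403 from `raise_for_status()` or a timeout), every later page shifts down by one number and `test3.txt` no longer corresponds to the third URL. A minimal sketch of an alternative, assuming you want file numbers to match list positions: let `enumerate(urls, start=1)` supply the number. To keep it runnable without network access, fetching is passed in as a function, the BeautifulSoup `prettify()` step is left out, and `flaky_fetch` and the example.com URLs are hypothetical. In real use, `fetch` could wrap `session.get(url, headers=headers)`, `raise_for_status()`, and `BS(page.text, 'lxml').prettify()` from the answer above.

```python
import os
import tempfile


def scrape(urls, fetch, outdir="."):
    """Write fetch(url) to test<N>.txt, where N is the URL's 1-based position.

    A failing URL is reported and skipped; its number is simply left unused,
    so later files keep matching their positions in urls.
    """
    written = []
    for i, url in enumerate(urls, start=1):
        try:
            text = fetch(url)
        except Exception as e:
            print(f"Exception while processing {url} -> {e}")
            continue
        path = os.path.join(outdir, f"test{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        written.append(path)
    return written


def flaky_fetch(url):
    """Hypothetical fetcher that fails for the second URL to simulate an HTTP error."""
    if url.endswith("/b"):
        raise RuntimeError("404 Client Error")
    return f"page {url}"


outdir = tempfile.mkdtemp()
paths = scrape(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    flaky_fetch,
    outdir,
)
print([os.path.basename(p) for p in paths])  # ['test1.txt', 'test3.txt']
```

Passing the fetcher in as a parameter is only a design choice for the sketch; it keeps the file-numbering logic testable separately from `requests`.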
