Looping through web pages with BeautifulSoup and downloading all images

I want to go through the following web pages and save the corresponding images using Python:

Example (10,000 URLs in total):

https://cryptopunks.app/cryptopunks/cryptopunk0001.png
https://cryptopunks.app/cryptopunks/cryptopunk0002.png
...
https://cryptopunks.app/cryptopunks/cryptopunk9999.png
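The URLs follow a fixed pattern in which the image number is zero-padded to four digits, so they can be generated directly instead of being scraped from HTML. A minimal sketch, assuming the numbering runs from 0001 to 9999 as in the examples above (`punk_url` is a hypothetical helper name):

```python
# URL template: {:04d} pads the number with leading zeros to four digits
BASE = "https://cryptopunks.app/cryptopunks/cryptopunk{:04d}.png"


def punk_url(n: int) -> str:
    """Build the image URL for image number n (hypothetical helper)."""
    return BASE.format(n)


# Every URL from cryptopunk0001.png up to cryptopunk9999.png
urls = [punk_url(n) for n in range(1, 10000)]
print(urls[0])   # https://cryptopunks.app/cryptopunks/cryptopunk0001.png
print(urls[-1])  # https://cryptopunks.app/cryptopunks/cryptopunk9999.png
```

Because the URLs are predictable, BeautifulSoup is not needed to find them; each one can be passed straight to the download code.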

My goal is to use the images afterwards to train a GAN for a project and generate new images with it.

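I tried to adapt the following code (which loops through web pages and downloads all images) to the example URLs above, but unfortunately I couldn't get it to work: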

import contextlib
import os
import re

import requests
from bs4 import BeautifulSoup as soup


@contextlib.contextmanager
def get_images(url: str):
    d = soup(requests.get(url).text, 'html.parser')
    # For each <a> whose href contains /image/<digits>, collect
    # [src of the <img> inside it, file extension taken from the end of the <img> alt text]
    yield [[i.find('img')['src'], re.findall(r'(?<=\.)\w+$', i.find('img')['alt'])[0]]
           for i in d.find_all('a', href=True) if re.findall(r'/image/\d+', i['href'])]


n = 3  # end value
# Added for automation purposes; the folder can be named anything, as long as the same name is used when saving below
os.makedirs('MARCO_images', exist_ok=True)
for i in range(n):
    with get_images(f'https://marco.ccr.buffalo.edu/images?page={i}&score=Clear') as links:
        print(links)
        for c, [link, ext] in enumerate(links, 1):
            with open(f'MARCO_images/MARCO_img_{i}{c}.{ext}', 'wb') as f:
                f.write(requests.get(f'https://marco.ccr.buffalo.edu{link}').content)
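Most of the work in this code is done by its two regular expressions: one filters the links, the other pulls a file extension out of the alt text. A small check of what they match, using made-up sample strings (the href and alt values below are hypothetical, not taken from the MARCO site):

```python
import re

# Hypothetical sample values shaped like the ones the code above expects
href = "/image/12345"
alt = "crystal_photo.jpg"

# Link filter: keep <a> tags whose href contains /image/ followed by digits
is_image_link = bool(re.findall(r"/image/\d+", href))

# Extension: the word characters after a dot, anchored at the end of the alt text
ext = re.findall(r"(?<=\.)\w+$", alt)[0]

print(is_image_link, ext)  # True jpg
```

This also shows why the code does not carry over to the cryptopunks URLs: it depends on the HTML structure of the MARCO listing pages, whereas the cryptopunks URLs point directly at PNG files.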

Can someone help me?

Thanks a lot!

Answer from a uj5u.com user:

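I went ahead and used only requests and os; the images are saved in a new folder (or whatever you name it). Downloading 9,999 images one after another is fairly slow, though, so you can use threads to run several downloads at the same time.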

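import os
import threading

import requests

folder = "New folder"
os.makedirs(folder, exist_ok=True)  # no error if the folder already exists


def get_images(url, index):
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # don't save an HTML error page as a .png
    # os.path.join instead of "New folder\image_...": "\i" is an unrecognized escape and "\" only works on Windows
    with open(os.path.join(folder, f"image_{index}.png"), "wb") as img:
        img.write(r.content)  # the with block closes the file, so no img.close() is needed


n = 10000
threads = []
for i in range(1, n):
    # As you know the website's URL pattern, you can download each image just by supplying its number.
    # The number is zero-padded to four digits (1 -> cryptopunk0001.png); the loop runs from 1 to 9999 as you wanted.
    url = f"https://cryptopunks.app/cryptopunks/cryptopunk{i:04d}.png"
    t = threading.Thread(target=get_images, args=(url, i))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait until every download has finished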
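The loop above starts close to 10,000 threads at once, which can use a lot of memory and put heavy load on the server. A sketch of the same idea with a bounded thread pool from the standard library's `concurrent.futures`, assuming 16 workers (an arbitrary choice). It uses `urllib.request` so it has no third-party dependencies; a function returning `requests.get(url, timeout=30).content` would work just as well as the `fetch` argument. The names `download_all` and `fetch_with_urllib` are hypothetical:

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

URL = "https://cryptopunks.app/cryptopunks/cryptopunk{:04d}.png"


def fetch_with_urllib(url: str) -> bytes:
    """Download a URL and return the raw bytes (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(numbers: Iterable[int], folder: str,
                 fetch: Callable[[str], bytes] = fetch_with_urllib,
                 workers: int = 16) -> List[int]:
    """Save each numbered image into folder using at most `workers` threads.

    Returns the numbers whose download failed, so they can be retried.
    """
    os.makedirs(folder, exist_ok=True)

    def save(n: int) -> None:
        data = fetch(URL.format(n))  # if this raises, no file is written
        with open(os.path.join(folder, f"image_{n}.png"), "wb") as f:
            f.write(data)

    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One Future per image; the pool runs at most `workers` of them at a time
        futures = {pool.submit(save, n): n for n in numbers}
        for fut, n in futures.items():
            if fut.exception() is not None:  # blocks until this future is done
                failed.append(n)
    return failed
```

Usage: `download_all(range(1, 10000), "New folder")` downloads the whole set and returns the list of numbers that failed. Keeping the pool small also means a failed request is reported back instead of only being printed from a background thread.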

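On my computer, downloading 9 images takes about 7 seconds without threads and only about 2 seconds with threads. Threading helps because the downloads are I/O-bound: while one thread waits for the server, the others keep working. This is concurrency within a single process, not multiprocessing.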
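The speed-up comes from overlapping the time spent waiting on the network, not from extra CPU work. A self-contained sketch that simulates each download with a fixed 0.2-second `time.sleep` (an assumed stand-in for network latency; no real requests are made) shows the same effect. In CPython, `time.sleep` releases the GIL just as blocking socket reads do, so the sleeping threads overlap:

```python
import threading
import time


def fake_download(index: int) -> None:
    """Stand-in for one HTTP request: just wait, as a thread does while the server responds."""
    time.sleep(0.2)


def run_sequential(count: int) -> float:
    start = time.perf_counter()
    for i in range(count):
        fake_download(i)
    return time.perf_counter() - start


def run_threaded(count: int) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all simulated downloads to finish
    return time.perf_counter() - start


seq = run_sequential(9)
par = run_threaded(9)
print(f"sequential: {seq:.2f} s, threaded: {par:.2f} s")
```

With 9 simulated downloads the sequential run takes roughly 9 × 0.2 s, while the threaded run takes roughly a single delay; real timings also depend on bandwidth and how the server handles parallel requests.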

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/495881.html
