I want to go through the following web pages and save the corresponding images using Python:
Examples (10,000 pages in total):
https://cryptopunks.app/cryptopunks/cryptopunk0001.png
https://cryptopunks.app/cryptopunks/cryptopunk0002.png
https://cryptopunks.app/cryptopunks/cryptopunk9999.png
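My goal is to later use these images in a GAN for a project assignment and generate new images that way.
I tried to adapt the following code (which loops through web pages and downloads all images) to the example pages above, but unfortunately I could not get it to work: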
from bs4 import BeautifulSoup as soup
import requests, contextlib, re, os

@contextlib.contextmanager
def get_images(url:str):
    d = soup(requests.get(url).text, 'html.parser')
    yield [[i.find('img')['src'], re.findall(r'(?<=\.)\w+$', i.find('img')['alt'])[0]] for i in d.find_all('a') if re.findall(r'/image/\d+', i['href'])]

n = 3 #end value
os.system('mkdir MARCO_images') #added for automation purposes, folder can be named anything, as long as the proper name is used when saving below
for i in range(n):
    with get_images(f'https://marco.ccr.buffalo.edu/images?page={i}&score=Clear') as links:
        print(links)
        for c, [link, ext] in enumerate(links, 1):
            with open(f'MARCO_images/MARCO_img_{i}{c}.{ext}', 'wb') as f:
                f.write(requests.get(f'https://marco.ccr.buffalo.edu{link}').content)
Can someone help me?
Thanks a lot!
Answer from a uj5u.com user:
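I went ahead and used only requests and os; the images are saved in a new folder (name it whatever you like). Downloading 9,999 images one after another is fairly slow, though, so you can use threading to run the function calls concurrently.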
import os
import threading

import requests

FOLDER = "New folder"
os.makedirs(FOLDER, exist_ok=True)  # does not fail if the folder already exists

def get_images(url, index):
    # Download one image and write it to FOLDER/image_<index>.png
    r = requests.get(url)
    with open(os.path.join(FOLDER, f"image_{index}.png"), "wb") as img:
        img.write(r.content)

n = 10000
# The URL pattern is known, so each image can be fetched just by supplying its number.
# The loop runs from 1 to 9999, as you wanted; the number is zero-padded to four
# digits (0001, 0002, ...) to match the URLs in the question.
for i in range(1, n):
    t1 = threading.Thread(target=get_images, args=(f"https://cryptopunks.app/cryptopunks/cryptopunk{i:04d}.png", i))
    t1.start()
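On my machine, downloading 9 images took about 7 seconds without threading and only about 2 seconds with it. Threads help here because the downloads overlap while each one waits on the network.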
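Starting one thread per image means close to 10,000 threads are created almost at once, which can strain both your machine and the server. A possible alternative is a bounded thread pool. The following is only a minimal sketch, not a tested solution for this site: the helper names (punk_url, fetch, save_image, download_all), the default of 16 workers, and the 30-second timeout are assumptions made for illustration. The four-digit zero padding follows the example URLs in the question.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# URL pattern taken from the question; {:04d} zero-pads the number to four digits.
BASE_URL = "https://cryptopunks.app/cryptopunks/cryptopunk{:04d}.png"


def punk_url(index):
    """Build the image URL for a given index (hypothetical helper)."""
    return BASE_URL.format(index)


def fetch(url):
    """Download one URL and return its bytes; raises on HTTP errors."""
    # Imported here so the rest of the sketch can be exercised without requests.
    import requests

    response = requests.get(url, timeout=30)  # timeout value is an assumption
    response.raise_for_status()
    return response.content


def save_image(index, folder, fetch_fn=fetch):
    """Fetch one image and write it to <folder>/image_<index>.png."""
    path = os.path.join(folder, f"image_{index:04d}.png")
    data = fetch_fn(punk_url(index))
    with open(path, "wb") as img:
        img.write(data)
    return path


def download_all(indices, folder, max_workers=16, fetch_fn=fetch):
    """Download all indices using at most max_workers threads at a time."""
    os.makedirs(folder, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order and re-raises a worker's
        # exception when its result is reached.
        return list(pool.map(lambda i: save_image(i, folder, fetch_fn), indices))
```

Calling download_all(range(1, 10000), "New folder") would then fetch images 1 to 9999 with no more than 16 downloads in flight at any moment. The fetch_fn parameter exists only so the download step can be swapped out, for example with a stub while trying the code offline.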
