Downloading images by src in Python produces empty images

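My script sort of works, but the files it saves are empty. Any ideas? Please excuse all the unused imports at the top! I tried a lot of different ways to do this. Here I am using selenium to pull the img elements. The src values are then iterated in a loop and converted to bytes so that they can be written to files under a folder built with os.path. I suspect the site may be protecting itself against this kind of scraping?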

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import os
import urllib
import urllib3
import time
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import requests


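# create a single headless Firefox instance (the extra non-headless driver was never used or closed)
options = Options()
options.add_argument("-headless")  # newer Selenium releases no longer accept options.headless = True
driver = webdriver.Firefox(options=options)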
driver.get("https://superrare.com/features/the-intersection-of-machine-and-artist")
time.sleep(2)                                                                                                            

# the tallest element on the page
ele=driver.find_element("xpath", '//div[@id="root"]')
total_height = ele.size["height"] + 8000
time.sleep(2)  
driver.set_window_size(1920, total_height) 
time.sleep(2)



imgsrc2 = WebDriverWait(driver,50).until(EC.presence_of_all_elements_located((By.XPATH, "//img")))

time.sleep(5)
download_folder = "/Users/rcastong/Desktop/imgs"
if not os.path.exists(download_folder):
    os.makedirs(download_folder)

for i in imgsrc2:
    imgsrc = i.get_attribute("src")
    str_img = str.encode(imgsrc)
    with open(os.path.join(download_folder, os.path.basename(imgsrc)), "wb") as f:
        f.write(str_img)

Answer from a uj5u.com user:

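You forgot to use requests to fetch the data from the server. The src attribute is only the address of the image, so encoding it to bytes saves the URL text rather than the image itself.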

    response = requests.get(img_src)
    data = response.content
    
    with open(fullname, "wb") as f:
        f.write(data)
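
To see why the loop in the question produced unusable files, it may help to look at what str.encode actually returns. The sketch below uses a made-up address (example.com is a placeholder, not a URL from the page in the question) and shows that the bytes written to disk are just the characters of the URL, not image data.

```python
# Hypothetical URL for illustration only; not taken from the page in the question.
img_src = "https://example.com/images/artwork.png"

# This is what the question's loop writes to disk: the URL text as bytes.
url_bytes = str.encode(img_src)

print(url_bytes)       # the address itself, not pixel data
print(len(url_bytes))  # one byte per character of this ASCII URL

# A real PNG file starts with an 8-byte signature; the encoded URL does not.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
looks_like_png = url_bytes.startswith(PNG_SIGNATURE)
print(looks_like_png)
```

An image viewer opening such a file finds a few dozen bytes of text, which is why the saved images appear empty.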

A minimal working example is given below.

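It works for me for the first few images. The other images may need a longer sleep(), or the page may need to be scrolled to the bottom so that JavaScript fills in every src.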
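
The scrolling idea could be sketched roughly as follows. The helper name scroll_to_bottom and its pause and max_rounds parameters are hypothetical, and the sketch assumes the page grows document.body as content loads; if the site scrolls inside a nested container instead, the JavaScript would need to target that element.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing or max_rounds is reached."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom so lazy-loading scripts are triggered.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so this is treated as the end
        last_height = new_height
    return last_height
```

It would be called after driver.get(...) and before the img elements are collected, for example scroll_to_bottom(driver, pause=2).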

import os
import time
import requests
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("-headless")  # newer Selenium releases no longer accept options.headless = True
driver = webdriver.Firefox(options=options)

driver.get("https://superrare.com/features/the-intersection-of-machine-and-artist")
time.sleep(2)                                                                                                            

# the tallest element on the page
root = driver.find_element("xpath", '//div[@id="root"]')
total_height = root.size["height"] + 8000
print('total_height:', total_height)
time.sleep(2)

driver.set_window_size(1920, total_height) 
time.sleep(2)

imgs = WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH, "//img")))
time.sleep(5)

print('len(imgs):', len(imgs))

download_folder = "/Users/rcastong/Desktop/imgs"

# creates the folder only if it does not already exist
os.makedirs(download_folder, exist_ok=True)

for number, item in enumerate(imgs, 1):
    print('---', number, '---')

    img_src = item.get_attribute("src")
    print('from:', img_src)

    fullname = os.path.join(download_folder, os.path.basename(img_src))
    print('  to:', fullname)
    
    response = requests.get(img_src)
    data = response.content
    
    with open(fullname, "wb") as f:
        f.write(data)
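
The loop above assumes every src is a fetchable URL and that every request succeeds. A slightly more defensive variant might look like the sketch below. The helper names filename_from_url and save_image are hypothetical, and session is assumed to be a requests.Session() (or any object with a compatible get method). It skips missing or data: src values, drops query strings from file names, and raises on HTTP errors instead of saving an error page.

```python
import os
from urllib.parse import urlparse, unquote


def filename_from_url(url, index):
    """Build a file name from the URL path, ignoring any ?query or #fragment."""
    path = unquote(urlparse(url).path)
    name = os.path.basename(path)
    # Fall back to a numbered name when the path has no usable last segment.
    return name if name else f"image_{index}.bin"


def save_image(session, url, folder, index):
    """Download url with a requests-style session; return the saved path or None."""
    # Skip missing src values and inline data: URIs, which are not HTTP addresses.
    if not url or not url.startswith(("http://", "https://")):
        return None
    response = session.get(url, timeout=30)
    response.raise_for_status()  # surface 403/404 instead of saving an error page
    fullname = os.path.join(folder, filename_from_url(url, index))
    with open(fullname, "wb") as f:
        f.write(response.content)
    return fullname
```

Inside the loop this could be called as save_image(session, item.get_attribute("src"), download_folder, number), with session = requests.Session() created once before the loop.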
