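My script sort of works, but the files it saves are empty. Any ideas? Please forgive all the unused imports at the top! I tried many different ways of doing this. Here I'm using Selenium to pull the img elements. Then the src values are iterated through in a loop and converted to bytes so they can be written to files (with os.path building the file paths). I suspect the site might be protecting itself against this kind of scraping?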
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import os
import urllib
import urllib3
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import requests
driver = webdriver.Firefox()
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
driver.get("https://superrare.com/features/the-intersection-of-machine-and-artist")
time.sleep(2)
#the element with longest height on page
ele=driver.find_element("xpath", '//div[@id="root"]')
total_height = ele.size["height"] + 8000
time.sleep(2)
driver.set_window_size(1920, total_height)
time.sleep(2)
imgsrc2 = WebDriverWait(driver,50).until(EC.presence_of_all_elements_located((By.XPATH, "//img")))
time.sleep(5)
download_folder = "/Users/rcastong/Desktop/imgs"
if not os.path.exists(download_folder):
    os.makedirs(download_folder)
for i in imgsrc2:
    imgsrc = i.get_attribute("src")
    str_img = str.encode(imgsrc)
    with open(os.path.join(download_folder, os.path.basename(imgsrc)), "wb") as f:
        f.write(str_img)
Answer:
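You forgot to use requests to get the data from the server. str.encode(imgsrc) only converts the URL string itself to bytes, so each file ends up containing the URL text instead of the image: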
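for i in imgsrc2:
    img_src = i.get_attribute("src")
    fullname = os.path.join(download_folder, os.path.basename(img_src))
    # download the image bytes instead of encoding the URL string
    response = requests.get(img_src)
    data = response.content
    with open(fullname, "wb") as f:
        f.write(data)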
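A slightly more defensive version of that download step, offered as a minimal sketch rather than the answerer's code. The helper names filename_from_url and download_images are hypothetical, and the Content-Type check assumes the server labels images as image/*, which not every server does. It skips img elements with no src or with inline data: URIs, strips any query string from the file name, and raises on HTTP errors instead of silently saving an error page.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url, fallback):
    """Derive a local file name from the URL path, ignoring ?query and #fragment."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or fallback  # URLs ending in "/" have no basename


def download_images(srcs, folder, session):
    """Fetch each http(s) src with `session` and write the response bytes into `folder`.

    Assumption: `session` behaves like requests.Session, i.e. .get() returns an
    object with .content, .headers and .raise_for_status().
    Returns the list of file paths that were written.
    """
    os.makedirs(folder, exist_ok=True)
    saved = []
    for number, src in enumerate(srcs, 1):
        # <img> elements without a src give None; data: URIs have nothing to fetch.
        if not src or not src.startswith(("http://", "https://")):
            continue
        response = session.get(src, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        # Assumption: real images come back with an image/* Content-Type.
        if not response.headers.get("Content-Type", "").startswith("image/"):
            continue
        path = os.path.join(folder, filename_from_url(src, f"image_{number}"))
        with open(path, "wb") as f:
            f.write(response.content)  # the downloaded bytes, not the URL text
        saved.append(path)
    return saved
```

With the question's variables it would be called roughly as download_images([i.get_attribute("src") for i in imgsrc2], download_folder, requests.Session()). Reusing one Session lets requests pool the connection to the image host across downloads.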
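Minimal working example.
It works for me for some of the first images. Maybe the other images need a longer sleep(), or you need to scroll to the bottom so that JavaScript loads all of their src attributes.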
import os
import time
import requests
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = Options()
# run Firefox without a window; the options.headless setter is deprecated in Selenium 4
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)
driver.get("https://superrare.com/features/the-intersection-of-machine-and-artist")
time.sleep(2)
# the element with the greatest height on the page
root = driver.find_element(By.XPATH, '//div[@id="root"]')
total_height = root.size["height"] + 8000
print('total_height:', total_height)
time.sleep(2)
driver.set_window_size(1920, total_height)
time.sleep(2)
imgs = WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH, "//img")))
time.sleep(5)
print('len(imgs):', len(imgs))
download_folder = "/Users/rcastong/Desktop/imgs"
# creates the folder only if it does not exist yet
os.makedirs(download_folder, exist_ok=True)
for number, item in enumerate(imgs, 1):
    print('---', number, '---')
    img_src = item.get_attribute("src")
    print('from:', img_src)
    fullname = os.path.join(download_folder, os.path.basename(img_src))
    print('  to:', fullname)
    response = requests.get(img_src)
    data = response.content
    with open(fullname, "wb") as f:
        f.write(data)
driver.quit()
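The note before the example suggests scrolling to the bottom so JavaScript fills in the remaining src attributes. A minimal sketch of that idea, assuming the page lazy-loads images as you scroll (not verified for this site); scroll_to_bottom is a hypothetical helper name. It scrolls with execute_script and stops once document.body.scrollHeight stops growing or max_rounds is reached.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until document.body.scrollHeight stops growing (or max_rounds is hit).

    `driver` only needs an execute_script() method, so a Selenium WebDriver works.
    Returns the last scroll height that was measured.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loading JavaScript time to fetch more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the bottom
        last_height = new_height
    return last_height
```

It would go after driver.get(...) and before the WebDriverWait(...) line that collects the img elements, for example scroll_to_bottom(driver, pause=2). A fixed max_rounds keeps it from looping forever on pages with infinite scrolling.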