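I want to scrape an image from a website and save it to a specific folder, but every tutorial I can find only shows how to scrape multiple images. For example, I want to grab the puppy image that appears right away at https://duckduckgo.com/?q=Puppy&t=h_&ia=web and save it to my desktop. How can I do that?
The code I have come up with so far is: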
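from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.edge.service import Service
import time
# Raw string so the backslashes in the Windows path are taken literally
PATH = r"C:\Coding\Codes\Python\edgedriver\msedgedriver.exe"
driver = webdriver.Edge(service=Service(PATH))
driver.maximize_window()
driver.get("https://duckduckgo.com/")
searchbox = driver.find_element(By.ID, "search_form_input_homepage")
searchbox.send_keys("Puppy")
searchbox.send_keys(Keys.ENTER)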
#then save the puppy's image to a specified folder, say inside C:\Users\John\Desktop
Answer from a uj5u.com community member:
To get the value of the src attribute of that single image, you can use either of the following locator strategies:
Using css_selector:
print(driver.find_element(By.CSS_SELECTOR, "a.module__image>img").get_attribute("src"))
Using xpath:
print(driver.find_element(By.XPATH, "//a[@class='module__image']/img").get_attribute("src"))
Ideally, you should induce WebDriverWait for visibility_of_element_located(), so that the lookup does not run before the search result has rendered. You can use either of the following locator strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.module__image>img"))).get_attribute("src"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='module__image']/img"))).get_attribute("src"))
Console output:
https://duckduckgo.com/i/a49fa21e.jpg
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
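The src value printed above is only a URL; the file still has to be downloaded to end up in a folder. Here is a minimal sketch of that step, assuming the URL can be fetched directly without cookies or special headers. The names filename_from_url and save_image are hypothetical helpers for illustration, not part of Selenium:

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="image.jpg"):
    """Return the last path segment of the URL, or a default if it has none."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def save_image(url, folder):
    """Download url into folder and return the full path of the saved file."""
    os.makedirs(folder, exist_ok=True)  # create the target folder if it is missing
    target = os.path.join(folder, filename_from_url(url))
    urllib.request.urlretrieve(url, target)
    return target
```

With src holding the value returned by get_attribute("src"), a call such as save_image(src, r"C:\Users\John\Desktop") should place the file on the desktop under the name taken from the URL (a49fa21e.jpg for the output shown above).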
Answer from a uj5u.com community member:
You can use the urllib.request library:
import random
import string
import urllib.request
from selenium.webdriver.common.by import By
# 'your xpath' is a placeholder for the XPath of the target <img> element
sampleImage = driver.find_element(By.XPATH, 'your xpath').get_attribute('src')
characters = 5
letters = string.ascii_lowercase
# Build a random 5-letter file name
img_str = ''.join(random.choice(letters) for i in range(characters))
fullname = img_str + '.jpg'
filepath = 'E:\\crawling\\IMG\\' + fullname
urllib.request.urlretrieve(sampleImage, filepath)
print(fullname)
I hope this works. I used the random library to give the image a name made of random characters.
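One thing to keep in mind with random names: a short random string can repeat across runs, and urlretrieve overwrites an existing file of the same name without warning. Below is a small sketch of one way to guard against that, assuming it is acceptable to simply draw a new name when one is already taken. unique_image_path is a hypothetical helper, not part of any library:

```python
import os
import random
import string


def unique_image_path(folder, length=5, ext=".jpg"):
    """Return a path in folder with a random lowercase name that is not in use yet."""
    while True:
        name = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
        path = os.path.join(folder, name + ext)
        if not os.path.exists(path):  # draw again if this name is already taken
            return path
```

The returned path can then be passed as the second argument to urllib.request.urlretrieve in place of a hand-built string.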
If you want to loop over several images, here is the code:
import random
import string
import urllib.request
from selenium.webdriver.common.by import By
row_xpath = '//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr'
imagename = []
rows = driver.find_elements(By.XPATH, row_xpath)
for j in range(1, len(rows) + 1):
    # XPath indices are 1-based, so tr[j] selects the j-th table row
    sampleImage = driver.find_element(By.XPATH, '%s[%d]/td[1]/img' % (row_xpath, j)).get_attribute('src')
    print(sampleImage)
    characters = 10
    letters = string.ascii_lowercase
    img_str = ''.join(random.choice(letters) for i in range(characters))
    fullname = img_str + '.jpg'
    filepath = 'E:\\crawling\\IMG-FARAH\\' + fullname
    urllib.request.urlretrieve(sampleImage, filepath)
    imagename.append(fullname)
    print(fullname)
I also included a sample XPath and variables that are updated on each iteration.
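Once the src values have been collected into a list, the download step can also be kept separate from the browser code. Here is a sketch under the assumption that urls is a list of image URLs gathered beforehand and that every image is a JPEG, as the code above also assumes. download_all and its numbered naming scheme are illustrative choices, not what this answer uses:

```python
import os
import urllib.request


def download_all(urls, folder, prefix="img"):
    """Download each URL into folder as prefix_001.jpg, prefix_002.jpg, ... and return the names."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for j, url in enumerate(urls, start=1):  # j counts from 1, like the XPath row index
        name = "%s_%03d.jpg" % (prefix, j)
        urllib.request.urlretrieve(url, os.path.join(folder, name))
        saved.append(name)
    return saved
```

Numbered names keep the files in page order and cannot collide within one run, at the cost of overwriting files from an earlier run that used the same prefix.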
