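Using Selenium in Python, I can already reach some of the URLs of the images I want to download. However, the image links are stored in the img element's srcset attribute. When I call get_attribute('srcset'), it returns a single string containing 4 links. I only want one of them. How can I do that? Could I just trim the string afterwards?
This is the site I am scraping: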
https://www.politicsanddesign.com/
This is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
import pyautogui
import time
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), options = chrome_options)
driver.get('https://www.politicsanddesign.com/')
img_url = driver.find_element(By.XPATH, "//div[@class = 'responsive-image-wrapper']/img").get_attribute("srcset")
driver.get(img_url)
This is what the img_url value looks like:
//images.ctfassets.net/00vgtve3ank7/6f38yjnNcU1d6dw0jt1Uhk/70dfbf208b22f7b1c08b7421f910bb36/2020_HOUSE_VA-04_D-MCEACHIN..jpg?w=400&fm=jpg&q=80 400w, //images.ctfassets.net/00vgtve3ank7/6f38yjnNcU1d6dw0jt1Uhk/70dfbf208b22f7b1c08b7421f910bb36/2020_HOUSE_VA-04_D-MCEACHIN..jpg?w=800&fm=jpg&q=80 800w, //images.ctfassets.net/00vgtve3ank7/6f38yjnNcU1d6dw0jt1Uhk/70dfbf208b22f7b1c08b7421f910bb36/2020_HOUSE_VA-04_D-MCEACHIN..jpg?w=1200&fm=jpg&q=80 1200w, //images.ctfassets.net/00vgtve3ank7/6f38yjnNcU1d6dw0jt1Uhk/70dfbf208b22f7b1c08b7421f910bb36/2020_HOUSE_VA-04_D-MCEACHIN..jpg?w=1800&fm=jpg&q=80 1800w
But I want it to be just:
//images.ctfassets.net/00vgtve3ank7/6f38yjnNcU1d6dw0jt1Uhk/70dfbf208b22f7b1c08b7421f910bb36/2020_HOUSE_VA-04_D-MCEACHIN..jpg?w=400&fm=jpg&q=80
A uj5u.com user replied:
My inefficient solution:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
import pyautogui
import time
# WILL NEED TO EVENTUALLY FIGURE OUT HOW TO WRAP ALL OF THIS INTO A FUNCTION OR LOOP TO DO IT FOR ALL DIV OBJECTS
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), options = chrome_options)
driver.get('https://www.politicsanddesign.com/')
img_url = driver.find_element(By.XPATH, "//div[@class = 'responsive-image-wrapper']/img").get_attribute("srcset")
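# srcset holds several comma-separated candidates, so it cannot be passed to driver.get() directly.
# Keep everything before the first width descriptor and add the scheme to the protocol-relative URL.
img_url2 = 'https:' + img_url.split(' 400w', 1)[0]
driver.get(img_url2)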
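The split above depends on the literal text ' 400w'. A slightly more general sketch is shown here, assuming the srcset is non-empty, that its candidates are comma-separated, and that no URL contains a comma. The helper name first_srcset_url and the shortened sample string are made up for illustration; they are not part of the original code.

```python
def first_srcset_url(srcset: str, scheme: str = "https") -> str:
    """Return the first candidate URL from a srcset string (hypothetical helper)."""
    # Candidates are comma-separated; each one is "<url> <descriptor>".
    first_candidate = srcset.split(",")[0].strip()
    # Keep only the first whitespace-separated token, dropping e.g. "400w".
    url = first_candidate.split()[0]
    # Protocol-relative URLs ("//host/path") need an explicit scheme.
    if url.startswith("//"):
        url = f"{scheme}:{url}"
    return url


# Shortened, made-up sample in the same shape as the srcset shown in the question.
sample = (
    "//images.ctfassets.net/a/b.jpg?w=400&fm=jpg&q=80 400w, "
    "//images.ctfassets.net/a/b.jpg?w=800&fm=jpg&q=80 800w"
)
print(first_srcset_url(sample))
```

With Selenium, the result of get_attribute("srcset") could be passed to this helper before calling driver.get().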
A uj5u.com user replied:
The image appears to have a property named currentSrc, which holds only the URL the browser is currently using.
img_url = driver.find_element(By.XPATH, "//div[@class = 'responsive-image-wrapper']/img").get_attribute("currentSrc")
driver.get(img_url)
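currentSrc is resolved by the browser, so it comes back as an absolute URL, whereas the srcset values in the question start with "//". If a raw value from srcset is used instead, one way to resolve it is urllib.parse.urljoin from the standard library. This is only a sketch: the helper name to_absolute and the shortened image path are assumptions for illustration.

```python
from urllib.parse import urljoin


def to_absolute(page_url: str, src: str) -> str:
    """Resolve a relative or protocol-relative image URL against the page URL."""
    # urljoin keeps the page's scheme for "//host/path" references
    # and returns already-absolute URLs unchanged.
    return urljoin(page_url, src)


page_url = "https://www.politicsanddesign.com/"
# Made-up, shortened path in the same shape as the URLs in the question.
raw_src = "//images.ctfassets.net/a/b.jpg?w=400&fm=jpg&q=80"
print(to_absolute(page_url, raw_src))
```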
A uj5u.com user replied:
You can simply split the value extracted from that web element, as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
import pyautogui
import time
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), options = chrome_options)
driver.get('https://www.politicsanddesign.com/')
img_url = driver.find_element(By.XPATH, "//div[@class = 'responsive-image-wrapper']/img").get_attribute("srcset")
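# Each entry looks like "//host/path 400w": drop the width descriptor and add the scheme.
img_urls = ['https:' + entry.split()[0] for entry in img_url.split(",")]
Now img_urls is a list of 4 URLs (one per width listed in srcset), so you can use it like this:
driver.get(img_urls[0])  # open the first URL (400w)
driver.get(img_urls[1])  # open the second URL (800w)
driver.get(img_urls[2])  # open the third URL (1200w)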
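Indexing by position works, but it may be clearer to look candidates up by their declared width. The following is a rough sketch under the same assumptions as above (comma-separated candidates, no commas inside URLs, width descriptors such as "400w"). The function name srcset_by_width and the shortened sample are hypothetical.

```python
from typing import Dict


def srcset_by_width(srcset: str) -> Dict[int, str]:
    """Map each declared width (in pixels) to its candidate URL."""
    candidates: Dict[int, str] = {}
    for entry in srcset.split(","):
        parts = entry.split()
        # Skip malformed entries and non-width descriptors such as "2x".
        if len(parts) != 2:
            continue
        url, descriptor = parts
        if not (descriptor.endswith("w") and descriptor[:-1].isdigit()):
            continue
        candidates[int(descriptor[:-1])] = url
    return candidates


# Shortened, made-up sample in the same shape as the srcset shown in the question.
sample = (
    "//images.ctfassets.net/a/b.jpg?w=400&fm=jpg&q=80 400w, "
    "//images.ctfassets.net/a/b.jpg?w=800&fm=jpg&q=80 800w"
)
by_width = srcset_by_width(sample)
smallest = by_width[min(by_width)]  # candidate with the smallest declared width
largest = by_width[max(by_width)]   # candidate with the largest declared width
print(smallest)
print(largest)
```

The URLs returned here are still protocol-relative, so a scheme such as 'https:' would need to be prepended before passing one to driver.get().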
