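I'm scraping a web page with Selenium to collect product model numbers. The page has two product-grid sections with a card between them. I can get the model numbers from the first section, "browse-search-pods-1", but I can't reach the elements in the lower half of the page, in the second section, "browse-search-pods-2". The second section is ignored: there are 24 products, but only the first 12, from the first section, are returned. How can I access both sections?
Here is the site: https://www.homedepot.com/b/Building-Materials-Drywall/N-5yc1vZar3d?catStyle=ShowProducts
Here is sample HTML for a product: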
<div class="grid">
<section id="browse-search-pods-1" class="grid">
<div class="browse-search__pod col__true-12 col__6-12--xs col__4-12--sm col__3-12--md col__3-12--lg" data-lg-name="Product Pod: 0">
<div class="desktop product-pod" data-automation-id="podnode" data-type="product">
<div class="product-pod--padding">
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243" target="_blank" rel="noopener noreferrer" class="super-sku__inline-swatch__mini-swatch__more-options">More Options</a>
<div class="product-pod__title product-pod__title__product">
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243" class="header product-pod--ie-fix">
<div class="product-pod--ie-fix product-pod__title-control">
<h2 class="product-pod__title product-pod__title__product"><span class="product-pod__title__brand--bold">USG Sheetrock Brand</span><span class="product-pod__title__product">1/2 in. x 4 ft. x 8 ft. UltraLight Drywall</span></h2>
</div>
</a>
</div>
<div class="ratings-and-model-number-container">
<div class="product-pod-list__identifiers">
<div class="product-identifier product-identifier__model">Model# 14113411708</div>
</div>
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243#ratings-and-reviews" data-testid="product-pod__ratings-link">
<div class="ratings--6r7g3">
<div class="reviews--c43xm reviews--no-margin--c43xm" title=""><span class="stars--c43xm" style="width:89.80600000000001%"></span></div>
<span class="ratings__count--6r7g3">
(<!-- -->3753<!-- -->)
</span>
</div>
</a>
</div>
</div>
</div>
</div>
</section>
<section id="browse-search-pods-2" class="grid">
<div class="category-cards col__12-12" data-lg-name="Product Pod: 0">
<div class="category-cards__zone-wrapper category-cards__zone-card">
<section class="zone-card__zone1">
<div class="zone-card__header-wrapper">
<h2 class="zone-card__header u__bold">Project Guide</h2>
<p class="zone-card__header-text">Installing Drywall Project Guide</p>
</div>
<div class="zone-card-details">
<div class="zone-card-details__image"><img src="https://www.homedepot.com/hdus/en_US/DTCCOMNEW/fetch/FetchRules/FetchPN/how-to-install-drywall-professional-steps-HT-PG-BM.jpg" alt="" class="stretchy" height="1" width="1" loading="lazy"></div>
<div class="zone-card-details__description">
<div class="zone-card-details__text category-cards-details__text--truncate">Hanging drywall is not difficult if you have patience, the right tools and a friend to help. Follow our instructions to learn more</div>
<div class="zone-card-details__actions"><a class="bttn-outline bttn-outline--primary bttn--inline zone-card-details__btn" href="//www.homedepot.com/c/how_to_install_drywall_professional_steps_HT_PG_BM"><span class="bttn__content">Read Our Guide</span></a></div>
</div>
</div>
</section>
<section class="zone-card__zone2">
<div class="zone-card__header-wrapper">
<h2 class="u__truncate zone-card__header u__bold">Buying Guide</h2>
<p class="zone-card__header-text">Types of Drywall</p>
</div>
<div class="zone-card__video-wrapper">
<a class="zone-card__vidcap-link" href="//www.homedepot.com/c/ab/types-of-drywall/9ba683603be9fa5395fab90c24feaae">
<div class="zone-card-details__image zone-card-details__image--vidcap" style="background-image: url(&quot;https://i3.ytimg.com/vi/4hF9_z3IqaA/mqdefault.jpg&quot;);"></div>
</a>
</div>
<a class="zone-card__video-link" href="//www.homedepot.com/c/ab/types-of-drywall/9ba683603be9fa5395fab90c24feaae">See Our Tips</a>
</section>
</div>
</div>
<div class="browse-search__pod col__true-12 col__6-12--xs col__4-12--sm col__3-12--md col__3-12--lg">
<div class="desktop product-pod" data-automation-id="podnode" data-type="product">
<div class="product-pod--padding">
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411" target="_blank" rel="noopener noreferrer" class="super-sku__inline-swatch__mini-swatch__more-options">More Options</a>
<div class="product-pod__title product-pod__title__product">
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411" class="header product-pod--ie-fix">
<div class="product-pod--ie-fix product-pod__title-control">
<h2 class="product-pod__title product-pod__title__product"><span class="product-pod__title__brand--bold">Westpac Materials</span><span class="product-pod__title__product">18 lb. Fast Set 20 Lite Setting-Type Joint Compound</span></h2>
</div>
</a>
</div>
<div class="ratings-and-model-number-container">
<div class="product-pod-list__identifiers">
<div class="product-identifier product-identifier__model">Model# 22165H</div>
</div>
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411#ratings-and-reviews" data-testid="product-pod__ratings-link">
<div class="ratings--6r7g3">
<div class="reviews--c43xm reviews--no-margin--c43xm" title=""><span class="stars--c43xm" style="width: 94.16%;"></span></div>
<span class="ratings__count--6r7g3">(226)</span>
</div>
</a>
</div>
</div>
</div>
</div>
</section>
</div>
Here is the code I tried in order to access the second section, but it returns the model numbers from the first section:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = Options()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('https://www.homedepot.com/b/Building-Materials-Drywall/N-5yc1vZar3d?catStyle=ShowProducts')
section_two = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
product_model = section_two.find_elements(By.XPATH, "//div[contains(@class, 'product-identifier product-identifier__model')]")
for model in product_model:
    print(model.text)
Reply from a uj5u.com user:
Try scrolling to the browse-search-pods-2 element first, and then run
section_two = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
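To scroll, you can try either of the following.
Selenium's Java Actions class (org.openqa.selenium.interactions.Actions) corresponds to the ActionChains class in the Python bindings: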
from selenium.webdriver.common.action_chains import ActionChains
element = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
Alternatively, you can scroll the element into view with scrollIntoView():
driver.execute_script("arguments[0].scrollIntoView();", element)
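Putting the pieces together, here is a minimal sketch of how both sections might be read. It assumes the second grid is filled in lazily once it is scrolled into view, which the page's behaviour suggests but the question does not confirm. The names collect_model_numbers, wait_for_stable_count, pause and max_rounds are hypothetical helpers, not part of Selenium. Note also that the inner XPath starts with ".//": an XPath beginning with "//" searches the whole document even when it is called on an element, which may be why the original code printed models from the first section.

```python
import time

# By.XPATH in selenium.webdriver.common.by is the plain string "xpath",
# so the string is used directly to keep this helper free of imports.
XPATH = "xpath"

# Both section ids in the question's HTML share this prefix.
SECTION_XPATH = "//section[starts-with(@id, 'browse-search-pods-')]"
# The leading "." keeps the search inside the current section.
MODEL_XPATH = ".//div[contains(@class, 'product-identifier__model')]"


def wait_for_stable_count(section, pause, max_rounds):
    """Poll the section until its number of model elements stops changing."""
    previous = -1
    elements = []
    for _ in range(max_rounds):
        elements = section.find_elements(XPATH, MODEL_XPATH)
        if elements and len(elements) == previous:
            break
        previous = len(elements)
        time.sleep(pause)
    return elements


def collect_model_numbers(driver, pause=1.0, max_rounds=10):
    """Scroll each product section into view and return its model numbers."""
    models = []
    for section in driver.find_elements(XPATH, SECTION_XPATH):
        # Scrolling is assumed to trigger loading of the section's products.
        driver.execute_script("arguments[0].scrollIntoView();", section)
        for element in wait_for_stable_count(section, pause, max_rounds):
            # Element text looks like "Model# 14113411708" in the sample HTML.
            models.append(element.text.replace("Model#", "", 1).strip())
    return models
```

With a real browser this would be called as collect_model_numbers(driver) after driver.get(...). The polling loop is a simple stand-in for an explicit wait; WebDriverWait with an expected condition would be the more usual choice if a reliable condition for "section fully loaded" can be identified.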
