Selenium Python cannot find a page element

I'm trying to extract the review text from this page.

Here is a condensed version of the HTML shown in Chrome's inspector:

<div id="module_product_review" class="pdp-block module">
    <div class="lazyload-wrapper ">
        <div class="pdp-mod-review" data-spm="ratings_reviews" lazada_pdp_review="expose" itemid="1615006548" data-nosnippet="true" data-aplus-ae="x1_490e4591" data-spm-anchor-id="a2o42.pdp_revamp.0.ratings_reviews.508466b1OJjCoH">
            <div>...</div>
            <div>...</div>
            <div>
                <div class="mod-reviews">
                    <div class="item">
                        <div class="top">...</div>
                        <div class="middle">...</div>
                        <div class="item-content">
                            <div class="content" data-spm-anchor-id="a2o42.pdp_revamp.ratings_reviews.i3.508466b1OJjCoH">Slim and light. feel good. better if providing 16G version.</div>
                            <div class="review-image">...</div>
                            <div class="skuInfo">Color Family:MYSTIC SILVER</div>
                            <div class="bottom">...</div>
                            <div class="dialogs"></div>
                        </div>
                        <div class="seller-reply-wrapper">...</div>
                    </div>
                    <div class="item">...</div>
                    <div class="item">...</div>
                    <div class="item">...</div>
                    <div class="item">...</div>
                </div>
            </div>
        </div>
    </div>
</div>

I'm trying to extract the text "Slim and light. feel good. better if providing 16G version." from that element.

But when I try to retrieve the element with id="module_product_review" using Selenium in Python, this is what I get:

<div class="pdp-block module" id="module_product_review">
    <div class="lazyload-wrapper">
        <div class="lazy-load-placeholder">
            <div class="lazy-load-skeleton">
            </div>
        </div>
    </div>
</div>

Here is my code:

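from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

op = webdriver.ChromeOptions()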
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
module_product_review = driver.find_element(By.ID, "module_product_review")
html = module_product_review.get_attribute("outerHTML")
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())

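I thought the problem might be that I was retrieving the element before it had fully loaded, so I made the program sleep for 30 seconds before calling find_element(), but I still got the same result. As far as I can tell, this is not an iframe or shadow-root issue either.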
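To check, independently of how long you wait, whether the fragment returned by get_attribute("outerHTML") is still the lazy-load skeleton rather than rendered reviews, you could look for the placeholder's class. A minimal sketch using only the standard library follows; the function name is_lazy_placeholder is hypothetical, and the class names are assumptions copied from the two HTML snippets above, which Lazada may change.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class token that appears in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def is_lazy_placeholder(outer_html):
    """Return True if the fragment still shows the lazy-load skeleton
    and no rendered review list.

    Hypothetical helper; the class names "lazy-load-placeholder" and
    "mod-reviews" are assumptions taken from the markup in the question.
    """
    collector = ClassCollector()
    collector.feed(outer_html)
    collector.close()
    return (
        "lazy-load-placeholder" in collector.classes
        and "mod-reviews" not in collector.classes
    )
```

Given the output shown above, it should return True for the html string from the code, whether or not you sleep first, which points at something other than timing.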

Is there some other issue I'm missing?

Answer from a uj5u.com user:

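The element you are trying to read is not in the visible viewport when the page first loads, and the review module appears to be lazy-loaded: until it is scrolled into view it holds only the lazy-load-placeholder skeleton you got back, which is why sleeping did not help. You have to scroll that element into view first.
Also, since you are running in headless mode, you should set the window size, because the default headless window is much smaller than the one we normally use.
Finally, use explicit waits with expected conditions so that you access elements only once they are ready.
This should work better: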

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from webdriver_manager.chrome import ChromeDriverManager

op = webdriver.ChromeOptions()
op.add_argument('--headless')
# Set the window size on the options before the driver is created
op.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
element = wait.until(EC.presence_of_element_located((By.ID, "module_product_review")))
time.sleep(1)
# Move to the review module so that it comes into view and starts loading
actions.move_to_element(element).perform()
module_product_review = wait.until(EC.visibility_of_element_located((By.ID, "module_product_review")))
# now you can do what you want here
html = module_product_review.get_attribute("outerHTML")
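If move_to_element does not trigger the load in your setup, one alternative is to scroll with JavaScript's scrollIntoView and then wait for an actual review element rather than the module wrapper, since the wrapper exists (and may already count as visible) while it still holds only the placeholder. The sketch below assumes, as the answer describes, that the module renders once it enters the viewport; the helper names and the CSS selector are hypothetical, with the selector copied from the question's HTML.

```python
# Sketch: scroll the lazy-loaded review module into view, then wait for an
# actual review element instead of the module wrapper.
# Assumption: the selectors below are copied from the HTML in the question;
# Lazada may change its markup at any time.

MODULE_ID = "module_product_review"
REVIEW_CSS = f"#{MODULE_ID} .mod-reviews .item .content"


def scroll_into_view(driver, element):
    """Scroll `element` to the middle of the viewport via JavaScript."""
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )


def reviews_rendered(driver):
    """Condition for WebDriverWait.until().

    Returns the list of review elements once at least one exists, or False
    so that WebDriverWait keeps polling until its timeout.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    found = driver.find_elements("css selector", REVIEW_CSS)
    return found or False


# Usage, with driver and wait created as in the code above:
#   module = wait.until(EC.presence_of_element_located((By.ID, MODULE_ID)))
#   scroll_into_view(driver, module)
#   reviews = wait.until(reviews_rendered)
#   print(reviews[0].text)
```

WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when done, so a plain function works as a custom condition here.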

In addition, to locate that specific element and get that specific text, you can use a more precise locator, like this:

your_text = wait.until(EC.visibility_of_element_located((By.XPATH, "(//div[@id='module_product_review']//div[@class='item']//div[@class='content'])[1]"))).text

As above, use this after scrolling the review module into view.
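The XPath above returns only the first review (the [1] index). If you want every review on the page, a sketch that works on the html string from get_attribute("outerHTML") could look like this. It swaps in the standard library's html.parser instead of BeautifulSoup to stay dependency-free (with BeautifulSoup, soup.select("div.item div.content") should be roughly equivalent); extract_reviews is a hypothetical name, and the "item" and "content" class names are assumed from the question's markup.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text of each <div class="content"> nested in a <div class="item">.

    Only <div> tags are tracked. That is enough for the review markup shown
    in the question, but it is not a general CSS-selector engine.
    """

    def __init__(self):
        super().__init__()
        self._open_divs = []      # class-token sets of the currently open <div>s
        self._buffer = None       # text chunks of the review being read, or None
        self._buffer_depth = 0    # stack depth at which that review <div> opened
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = set((dict(attrs).get("class") or "").split())
        inside_item = any("item" in open_classes for open_classes in self._open_divs)
        self._open_divs.append(classes)
        if self._buffer is None and inside_item and "content" in classes:
            self._buffer = []
            self._buffer_depth = len(self._open_divs)

    def handle_endtag(self, tag):
        if tag != "div" or not self._open_divs:
            return
        if self._buffer is not None and len(self._open_divs) == self._buffer_depth:
            # Closing the review <div> itself: store its accumulated text.
            self.reviews.append("".join(self._buffer).strip())
            self._buffer = None
        self._open_divs.pop()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_reviews(outer_html):
    """Return the review texts found in an outerHTML string (hypothetical helper)."""
    parser = ReviewTextParser()
    parser.feed(outer_html)
    parser.close()
    return parser.reviews
```

Matching whole class tokens means the wrapper div with class "item-content" is not mistaken for either "item" or "content".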

If reposting, please credit the source. Link to this article: https://www.uj5u.com/caozuo/433117.html

Tags: Python, Selenium, web scraping
