Selenium can't scrape Vivino information?

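I am trying to get tasting notes and food pairing information for wines from Vivino. This information is not accessible through their API, but I get a NoSuchElementException when using Selenium in Python. I have been able to scrape the price and vintage information, but I cannot scrape anything beyond that.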

The page I am trying to scrape is the Vivino wine page whose URL appears in the code below.

I tried using WebDriverWait to let the page load:

driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@data-testid='mentions']")))

I tried using XPath to get the keywords (citrus, tropical, tree fruit, ...):

tasting_notes = driver.find_elements(By.XPATH, "//div[@data-testid='mentions']")

I tried using the class name to get the text itself:

#test = driver.find_elements(By.CLASS_NAME,"tasteNote__flavorGroup--1Uaen")

and I keep getting NoSuchElementException. Is there another way to access this information, or is Vivino somehow blocking me from scraping this section?

Edit: I tried adding code that scrolls to the bottom of the page before looking for the data:

    while True:

        # Scroll down to the bottom.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load the page.
        time.sleep(2)

        # Calculate new scroll height and compare with last scroll height.
        new_height = driver.execute_script("return document.body.scrollHeight")

        if new_height == last_height:
            break

        last_height = new_height

and I still have the problem.

Edit: Solved! Thanks to furas for the explanation and to Eugeny for the code.

Answer from a uj5u.com user:

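As furas mentioned in the comments, this page uses lazy loading, so you need to scroll the page. Jumping straight to the bottom does not help here, because the page only loads the content you are currently viewing. You need to scroll down the page gradually until you reach the bottom.
Here is code that does this. I am not sure it is the most elegant solution, but it works :)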

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
driver.implicitly_wait(10)
page_height = driver.execute_script("return document.body.scrollHeight")
browser_window_height = driver.get_window_size(windowHandle='current')['height']
current_position = driver.execute_script('return window.pageYOffset')
# Scroll one window height at a time so each lazy-loaded section gets rendered.
while page_height - current_position > browser_window_height:
    driver.execute_script(f"window.scrollTo(0, {browser_window_height + current_position});")
    current_position = driver.execute_script('return window.pageYOffset')
    sleep(1)  # It is necessary here to give it some time to load the content
print(driver.find_element(By.XPATH, '//div[@data-testid="mentions"]').text)
driver.quit()
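
The same gradual-scroll idea can also be wrapped in a small helper. This is only a sketch under a few assumptions, not the code from the answer above: the name `scroll_gradually` and the parameters `step_px`, `pause_s` and `max_steps` are made up for illustration, and the only Selenium call it relies on is `driver.execute_script`. Unlike the loop above, it re-reads the page height on every pass (lazy loading may make the page grow) and stops after `max_steps` so it cannot loop forever.

```python
import time


def scroll_gradually(driver, step_px=600, pause_s=1.0, max_steps=200):
    """Scroll down in fixed steps so lazy-loaded sections have time to render.

    `driver` is any object that exposes Selenium's `execute_script` method.
    The function name and all parameters are hypothetical, for illustration.
    Returns the number of scroll steps that were performed.
    """
    steps = 0
    while steps < max_steps:
        # Re-read the geometry on every pass: lazy loading can grow the page.
        page_height = driver.execute_script("return document.body.scrollHeight")
        viewport = driver.execute_script("return window.innerHeight")
        position = driver.execute_script("return window.pageYOffset")
        # One pixel of tolerance, because scroll offsets can be fractional.
        if position + viewport >= page_height - 1:
            break  # the bottom of the page is already in view
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause_s)  # give the page time to load the new content
        steps += 1
    return steps
```

With a real driver you would call `scroll_gradually(driver)` right after `driver.get(...)` and only then look up `//div[@data-testid="mentions"]`. The step size and the pause are guesses; a slower connection may need a longer pause.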

