Web scraping: need help getting the last post and finding links

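First of all, sorry for my poor English. I have a Python script that scrapes a website and finds the comments on a page. It currently scrapes every post on the page, but I only want to scrape the last post. How can I do that? I would also like to find any web links that may have been posted in the last message, but as complete links. Is that possible?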

https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999

#!/usr/bin/env python3
# https://www.jeuxvideo.com/forums/42-47-66784467-1-0-1-0-aide-scraping-python-forum-dealabs.htm
# scraping_dealabs.py

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get(url)

# Accept the cookies
button = WebDriverWait(driver, 2).until(
    EC.element_to_be_clickable((By.XPATH, "/html/body/main/div[4]/div[1]/div[1]/button[2]/span"))
)
button.click()

# Find the comments and print their text
comments = driver.find_elements(By.CLASS_NAME, "commentList-item")

for comment in comments:
    _id = comment.get_attribute("id")
    author = comment.find_element(By.CLASS_NAME, "userInfo-username").text
    content = comment.find_element(By.CLASS_NAME, "userHtml-content").text
    timestamp = comment.find_element(By.CLASS_NAME, "text-color-greyShade").text
    comment_url = f"{url}#{_id}"

    print("posté par", author)
    print(content)
    print("Publication:", timestamp)
    print("Lien du commentaire:")
    print(comment_url)
    print('-' * 30)

driver.close()

Thank you for taking the time to answer!

Reply from a uj5u.com user:

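First, I would recommend using a proper locator: instead of /html/body/main/div[4]/div[1]/div[1]/div[2]/button[2]/span, try the CSS selector .btn--mode-primary.overflow--wrap-on. To get the last comment, you can use this XPath: (//div[@class='commentList-item'])[last()]. So, to get the details of the last comment, your code can be modified like this: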

#!/usr/bin/env python3
# https://www.jeuxvideo.com/forums/42-47-66784467-1-0-1-0-aide-scraping-python-forum-dealabs.htm
# scraping_dealabs.py

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get(url)

actions = ActionChains(driver)

# Accept the cookies
WebDriverWait(driver, 2).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn--mode-primary.overflow--wrap-on"))
).click()

# Locate the last comment and scroll it into view
last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-item'])[last()]")
actions.move_to_element(last_comment).perform()
time.sleep(0.5)
# Locate it again after scrolling
last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-item'])[last()]")

_id = last_comment.get_attribute("id")
author = last_comment.find_element(By.XPATH, ".//span[contains(@class,'userInfo-username')]").text
content = last_comment.find_element(By.XPATH, ".//*[contains(@class,'userHtml-content')]").text
timestamp = last_comment.find_element(By.XPATH, ".//*[contains(@class,'text-color-greyShade')]").text
comment_url = f"{url}#{_id}"

print("posté par", author)
print(content)
print("Publication:", timestamp)
print("Lien du commentaire:")
print(comment_url)
print('-' * 30)

driver.close()
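The (...)[last()] XPath above wraps the whole match set in parentheses, so it picks the single last matching node in document order across the page, not the last one inside each parent. A minimal sketch of the same idea in plain Python, using the standard library on a made-up, simplified snippet (the markup, the ids and the helper name last_matching_id are assumptions for illustration, not the real Dealabs page):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more complex.
SAMPLE = """
<main>
  <section>
    <div class="commentList-item" id="comment-1">first</div>
    <div class="commentList-item" id="comment-2">second</div>
  </section>
  <section>
    <div class="commentList-item" id="comment-3">third</div>
    <div class="other" id="not-a-comment">ignored</div>
  </section>
</main>
"""


def last_matching_id(markup, class_name):
    """Return the id of the last div (in document order) whose class equals class_name."""
    root = ET.fromstring(markup)
    # findall returns every match in document order, across all parents.
    matches = root.findall(f".//div[@class='{class_name}']")
    if not matches:
        return None
    # Indexing with [-1] plays the role of [last()] on the whole match set.
    return matches[-1].get("id")


print(last_matching_id(SAMPLE, "commentList-item"))
```

With Selenium the equivalent would be to take the last entry of the list returned by find_elements, which also avoids an exception when the list is empty.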

UPD
To get the last element on the page, as you described in the comments, you have to change the locator from

last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-item'])[last()]")

to

last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-comment'])[last()]")

So the whole code above becomes:

#!/usr/bin/env python3
# https://www.jeuxvideo.com/forums/42-47-66784467-1-0-1-0-aide-scraping-python-forum-dealabs.htm
# scraping_dealabs.py

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get(url)

actions = ActionChains(driver)

# Accept the cookies
WebDriverWait(driver, 2).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn--mode-primary.overflow--wrap-on"))
).click()

# Locate the last comment and scroll it into view
last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-comment'])[last()]")

actions.move_to_element(last_comment).perform()
time.sleep(0.5)
# Locate it again after scrolling
last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-comment'])[last()]")

_id = last_comment.get_attribute("id")
author = last_comment.find_element(By.XPATH, ".//span[contains(@class,'userInfo-username')]").text
content = last_comment.find_element(By.XPATH, ".//*[contains(@class,'userHtml-content')]").text
timestamp = last_comment.find_element(By.XPATH, ".//*[contains(@class,'text-color-greyShade')]").text
comment_url = f"{url}#{_id}"

print("posté par", author)
print(content)
print("Publication:", timestamp)
print("Lien du commentaire:")
print(comment_url)
print('-' * 30)

driver.close()
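The question also asks for the complete links posted in the last message, which the code above does not cover. One possible approach, sketched here under assumptions: read the comment's markup (for example with last_comment.get_attribute("innerHTML")), collect the href of every a tag, and resolve each one against the page URL so that relative links become full ones. The helper names LinkCollector and extract_links, the sample markup and the /visit/abc path are made up for illustration and are not taken from the Dealabs page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs untouched and completes relative ones.
            self.links.append(urljoin(self.base_url, href))


def extract_links(comment_html, base_url):
    """Return the absolute URLs of all links found in comment_html."""
    collector = LinkCollector(base_url)
    collector.feed(comment_html)
    return collector.links


# Hypothetical comment markup, standing in for the comment's innerHTML.
page_url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"
sample_html = (
    '<p>Deal: <a href="/visit/abc">shop</a> and '
    '<a href="https://example.com/x?y=1">direct</a></p>'
)
for link in extract_links(sample_html, page_url):
    print(link)
```

Reading the href attribute rather than the visible text matters because a page may display a shortened version of a long link. Within Selenium itself, calling get_attribute("href") on each a element found inside the comment usually returns an already resolved URL, so the parsing step may not be needed there.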

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/318359.html
