How do I move to the next enclosing element (div) when scraping a website?

All of the data is being populated from the first table. I can't move to the next div and get the td data for each tr.

Website: https://asd.com/page/

Below is the code I wrote.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
my_url= "https://asd.com/page/asd"
driver.get(my_url)
boxes = driver.find_elements(By.CLASS_NAME, "col-md-4")

companies = []
company = {}
for box in boxes:
    header = box.find_element(By.CLASS_NAME,"text-primary.text-uppercase")
    company['name']= header.text
    td= box
    company['Type']= td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[1]/td").text
    company['Capital']= td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[2]/td").text
    company['Address'] = td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[3]/td").text
    company['Owner'] = td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[4]/td").text
    company['Co-Owner'] = td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[5]/td").text
    company['Duration'] = td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[6]/td").text
    company['Place'] = td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[7]/td").text
    company['Company ID'] = td.find_element(By.XPATH,"//div/div/div/table/tbody/tr[8]/td").text

    companies.append(company)

    print(company)

Answer from a uj5u.com user:

There are several problems here:

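Before collecting the list of elements, you need to add some delay between driver.get(my_url) and boxes = driver.find_elements(By.CLASS_NAME, "col-md-4") so that the elements have time to load.
text-primary.text-uppercase is actually 2 class names, text-primary and text-uppercase, so you should locate by both class names with XPATH or CSS_SELECTOR rather than CLASS_NAME.
To locate an element inside another element, you should use an XPath that starts with a dot.
Locators like //div/div/div/table/tbody/tr[1]/td are absolute: they are searched from the document root, which is why every box returns the values of the first table. They should be evaluated relative to the parent box element instead.
There is no need to define a td element; you can use the existing box element here.
Locators like //div/div/div/table/tbody/tr[1]/td can and should be shortened.
You may need to scroll to each box while iterating over them.
I think company = {} should be defined inside the loop.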
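To illustrate the last point, here is a minimal sketch in plain Python (no Selenium involved) of what happens when a single dict is created outside the loop and appended repeatedly. The company names are made up for the example.

```python
names = ["Alpha", "Beta", "Gamma"]  # hypothetical company names

# One dict created outside the loop: every append stores the same object.
shared = {}
aliased = []
for name in names:
    shared["name"] = name
    aliased.append(shared)

# A fresh dict created inside the loop: every append stores a new object.
separate = []
for name in names:
    company = {}
    company["name"] = name
    separate.append(company)

print([c["name"] for c in aliased])   # ['Gamma', 'Gamma', 'Gamma']
print([c["name"] for c in separate])  # ['Alpha', 'Beta', 'Gamma']
```

With the shared dict, print(company) inside the loop still shows the current values, so the problem only becomes visible when the companies list is inspected afterwards.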
This should work better:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
my_url = "https://monentreprise.bj/page/annonces"
driver.get(my_url)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
# Wait until at least one box is present, then give the rest a moment to render
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "col-md-4")))
time.sleep(2)
boxes = driver.find_elements(By.CLASS_NAME, "col-md-4")

companies = []
for box in boxes:
    # Move the pointer to the box so that it is brought into view
    actions.move_to_element(box).perform()
    time.sleep(0.3)
    # New dict per box so that each list entry is independent
    company = {}
    # Every XPath starts with a dot, so it is resolved relative to this box
    header = box.find_element(By.XPATH,".//h5[@class='text-primary text-uppercase']")
    company['name']= header.text
    company['Objet']= box.find_element(By.XPATH,".//tr[1]/td").text
    company['Capital']= box.find_element(By.XPATH,".//tr[2]/td").text
    company['Siège Social'] = box.find_element(By.XPATH,".//tr[3]/td").text
    company['Gérant'] = box.find_element(By.XPATH,".//tr[4]/td").text
    company['Co-Gérant'] = box.find_element(By.XPATH,".//tr[5]/td").text
    company['Durée'] = box.find_element(By.XPATH,".//tr[6]/td").text
    company['Dépôt'] = box.find_element(By.XPATH,".//tr[7]/td").text
    company['Immatriculation RCCM'] = box.find_element(By.XPATH,".//tr[8]/td").text

    companies.append(company)

    print(company)
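The effect of scoping each lookup to its box can be tried without a browser. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, and the markup in PAGE is hypothetical, not the real page. ElementTree supports only a subset of XPath, but the dot-prefixed paths behave the same way here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two boxes, each with its own heading and table.
PAGE = """
<div class="row">
  <div class="col-md-4">
    <h5 class="text-primary text-uppercase">ALPHA SARL</h5>
    <table><tbody>
      <tr><th>Objet</th><td>Commerce</td></tr>
      <tr><th>Capital</th><td>1000000</td></tr>
    </tbody></table>
  </div>
  <div class="col-md-4">
    <h5 class="text-primary text-uppercase">BETA SARL</h5>
    <table><tbody>
      <tr><th>Objet</th><td>Transport</td></tr>
      <tr><th>Capital</th><td>500000</td></tr>
    </tbody></table>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())
boxes = root.findall(".//div[@class='col-md-4']")

# Searching from the document root always hits the first table,
# which mirrors what an XPath starting with // does in Selenium.
first_from_root = root.find(".//tr[1]/td").text

# Searching from each box only sees that box's own rows.
companies = []
for box in boxes:
    company = {
        "name": box.find(".//h5").text,
        "Objet": box.find(".//tr[1]/td").text,
        "Capital": box.find(".//tr[2]/td").text,
    }
    companies.append(company)

print(first_from_root)
for company in companies:
    print(company)
```

The lookup from root returns the first box's value no matter which box is being processed, while the lookups made from box return a different row set on each iteration.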

Tags: Python, selenium, selenium-webdriver, web-scraping, xpath