Scraping data with changing XPaths

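I don't know how to scrape this data. I am trying to scrape product names, prices, and other information from a website. The product names are easy to access because their XPaths are similar and only one tag index changes, but in the price XPath several tags change from one product to the next. Is there an alternative way to scrape the data without XPath? Selecting by class name or ID returns an empty string.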


from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome('E:/chromedriver/chromedriver.exe')
product_name = []
product_price = []
product_rating = []
product_url = []
driver.get('https://www.cdiscount.com/bricolage/climatisation/traitement-de-l-air/ioniseur/l-166130303.html#_his_')
# Build the XPath for each list item by inserting its index.
for i in range(1, 55):
    try:
        productname = driver.find_element('xpath', '//*[@id="lpBloc"]/li[' + str(i) + ']/a/div[2]/div/span').text
        product_name.append(productname)
    except NoSuchElementException:
        print("none")
print(product_name)


XPath of the price:

1st item's price
```//*[@id="lpBloc"]/li[1]/div[2]/div[3]/div[1]/div/div[2]/span[1]```

2nd item's price
```//*[@id="lpBloc"]/li[2]/div[2]/div[2]/div[1]/div/div[2]/span[1]```

Answer:

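Instead of a hard-coded loop over indices, use one XPath that uniquely identifies each parent (product) element, then locate the child elements relative to that parent. Only the rating is missing for some products, so wrap that lookup in a try..except block.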

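from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

product_name = []
product_price = []
product_rating = []
product_url = []
driver.get('https://www.cdiscount.com/bricolage/climatisation/traitement-de-l-air/ioniseur/l-166130303.html#_his_')
# Each product is an <li> with a data-sku attribute under #lpBloc.
# The attribute predicates inside [@] are missing from the source post and are left as-is.
for item in driver.find_elements(By.XPATH, '//*[@id="lpBloc"]//li[@data-sku]'):
    # A leading "." makes each XPath relative to the current product element.
    productname = item.find_element('xpath', './/span[@]').text
    product_name.append(productname)
    productprice = item.find_element('xpath', './/span[@]').text
    product_price.append(productprice)
    try:
        productRating = item.find_element('xpath', './/span[@]//span[@]').text
        product_rating.append(productRating)
    except NoSuchElementException:
        # Not every product has a rating.
        productRating = "Nan"
        product_rating.append(productRating)

    productUrl = item.find_element('xpath', './/a[.//span[@]]').get_attribute("href")
    product_url.append(productUrl)

print(product_name)
print(product_price)
print(product_rating)
print(product_url)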
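
The parent-then-child pattern above can be tried without a browser. The following is only a minimal sketch: it uses Python's standard-library `xml.etree.ElementTree` (which supports a subset of XPath) instead of Selenium, and the markup and the class names `name`, `price`, and `rating` are invented for illustration; they are not the real class names on cdiscount.com. It shows that relative lookups still find the price even when its nesting depth differs between products, and that a missing rating can be handled per item.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class names are invented for illustration
# and are NOT the real cdiscount.com class names.
SAMPLE = """
<ul id="lpBloc">
  <li data-sku="A1">
    <a href="/p/a1"><span class="name">Ionizer A</span></a>
    <div><div><span class="price">19,99</span></div></div>
    <span class="rating">4.5</span>
  </li>
  <li data-sku="B2">
    <a href="/p/b2"><span class="name">Ionizer B</span></a>
    <div><span class="price">24,50</span></div>
  </li>
  <li class="ad">sponsored banner</li>
</ul>
"""


def extract_products(markup):
    root = ET.fromstring(markup)
    products = []
    # Parent step: only <li> elements that carry a data-sku attribute.
    for item in root.findall(".//li[@data-sku]"):
        # Child steps: searched relative to the current item, so the
        # nesting depth of the price <span> does not matter.
        name = item.find(".//span[@class='name']")
        price = item.find(".//span[@class='price']")
        rating = item.find(".//span[@class='rating']")  # None if absent
        link = item.find(".//a")
        products.append({
            "name": name.text,
            "price": price.text,
            "rating": rating.text if rating is not None else None,
            "url": link.get("href"),
        })
    return products


products = extract_products(SAMPLE)
for product in products:
    print(product)
```

Keeping one record per product, rather than four parallel lists, is a design choice of this sketch; it avoids the lists drifting out of alignment if one lookup fails.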

Tags: Python, selenium, web-scraping, xpath, beautifulsoup
