我正在嘗試從 aliexpress 網站上抓取每個產品頁面,以獲取評論數量、客戶發布的照片??數量以及客戶國家/地區,并將其放入資料框。
我撰寫了一個抓取客戶國家/地區的代碼,但我不知道如何獲取客戶評論的數量和影像的數量。這是我的代碼:
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
url = 'https://www.aliexpress.com/item/1005003801507855.html?spm=a2g0o.productlist.0.0.1e951bc72xISfE&algo_pvid=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad&algo_exp_id=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad-8&pdp_ext_f={"sku_id":"12000027213624098"}&pdp_pi=-1;40.81;-1;-1@salePrice;MAD;search-mainSearch'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
wait = WebDriverWait(driver, 10)
driver.execute_script("arguments[0].scrollIntoView();", wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.tab-content'))))
driver.get(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#product-evaluation'))).get_attribute('src'))
data=[]
while True:
for e in driver.find_elements(By.CSS_SELECTOR, 'div.feedback-item'):
try:
country = e.find_element(By.CSS_SELECTOR, '.user-country > b').text
except:
country = None
data.append({
'country':country,
})
try:
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#complex-pager a.ui-pagination-next'))).click()
except:
break
pd.DataFrame(data).to_csv('filename.csv',index=False)
我將不勝感激您的任何幫助!謝謝 !
uj5u.com熱心網友回復:
如果您想要評論/評論的數量,您可以檢查此部分中的值:
driver.find_element(By.XPATH, 'XPATH_OF_ELEMENT_TO_SCRAP')
要在您的示例中執行此操作,請在回圈之外執行此操作:
number_feedbacks = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]/div[1]')
number_images = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]//label[1]/em')
如果您不了解或不知道此功能,請隨時詢問,我將解釋我在哪里找到這些 XPATH。我們也可以使用 find by id 功能。
在您的代碼中,它將是:
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
url = 'https://www.aliexpress.com/item/1005003801507855.html?spm=a2g0o.productlist.0.0.1e951bc72xISfE&algo_pvid=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad&algo_exp_id=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad-8&pdp_ext_f={"sku_id":"12000027213624098"}&pdp_pi=-1;40.81;-1;-1@salePrice;MAD;search-mainSearch'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
wait = WebDriverWait(driver, 10)
driver.execute_script("arguments[0].scrollIntoView();", wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.tab-content'))))
driver.get(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#product-evaluation'))).get_attribute('src'))
data=[]
number_feedbacks = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]/div[1]')
number_images = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]//label[1]/em')
print(f'number_feedbacks = {number_feedbacks}\nnumber_images = {number_images}')
while True:
for e in driver.find_elements(By.CSS_SELECTOR, 'div.feedback-item'):
try:
country = e.find_element(By.CSS_SELECTOR, '.user-country > b').text
except:
country = None
data.append({
'country':country,
})
try:
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#complex-pager a.ui-pagination-next'))).click()
except:
break
pd.DataFrame(data).to_csv('filename.csv',index=False)
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/453396.html
上一篇:如何根據標題抓取新聞的內容?
