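I am trying to scrape user reviews (see the disclaimer below). The reviews are split across numbered pages.

I locate the page-number elements and then click the next button (>). The page does change, but it is not populated with new data.

Here is a short excerpt of the code: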
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
DRIVER_PATH = '***/chromedriver.exe'
driver = webdriver.Chrome(executable_path=DRIVER_PATH) # executable_path is deprecated in Selenium 4; needs updating
URL = "https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/"
driver.get(URL)
time.sleep(5)
button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//button[@]')))
button.click()
WebDriverWait(driver, 50)
# driver.close()
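What can I do to get the fields to reload correctly? I appreciate any information I can get :-)
Disclaimer: this is a first test for a research project; there will be no unauthorized or illegal scraping and no misuse of any data!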
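A uj5u.com user replied:
The page data is rendered dynamically. You can fetch the data through the API and iterate over the page parameter. Alternatively, you can raise the number of reviews per page and get everything in a single request (provided there are no more than 100 reviews).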
import requests
import pandas as pd
url = 'https://www.kbb.com/ymm/api/'
payload = {
"operationName":"consumerReviewsQuery",
"variables":{
"year":"2018",
"make":"mercedes-benz",
"model":"cla",
"page":1,
"perPage":100,
"bodystyle":"Sedan",
"sort":"1",
"filter":"",
"trendingTopic":""
},
"query":"query consumerReviewsQuery($year: String, $make: String!, $model: String!, $page: Int!, $perPage: Int!, $isInitialLoad: Boolean, $priceType: String, $bodystyle: String, $vehicleId: String, $trim: String, $sort: String, $trendingTopic: String, $filter: String) {\n consumerreviews(\n year: $year\n make: $make\n model: $model\n page: $page\n perPage: $perPage\n isInitialLoad: $isInitialLoad\n priceType: $priceType\n bodystyle: $bodystyle\n vehicleId: $vehicleId\n trim: $trim\n sort: $sort\n trendingTopic: $trendingTopic\n filter: $filter\n ) {\n numPages\n totalReviews\n reviews {\n id\n nickname\n nicknameDisplay\n location\n anonymous\n email\n sessionId\n visitorId\n sessionCount\n friendlyOwnershipStatus\n year\n model\n make\n vehicleId\n title\n reviewText\n ratingOverall\n ratingValue\n ratingReliability\n ratingPerformance\n ratingStyling\n ratingComfort\n ratingQuality\n submissionDate\n positiveLink\n negativeLink\n numPositiveFeedbacks\n numNegativeFeedbacks\n numFeedbacks\n pros\n cons\n areProsOrConsAvailable\n __typename\n }\n searchTerms\n __typename\n }\n}"}
jsonData = requests.post(url, json=payload).json()
reviews = pd.DataFrame(jsonData['data']['consumerreviews']['reviews'])
Output:
print(reviews)
id nickname ... areProsOrConsAvailable __typename
0 187159459 Love it ... True Reviews
1 179266834 Cremur ... True Reviews
2 176067479 ELSIE ... False Reviews
3 172175820 Noemia ... True Reviews
4 163968274 Pmaze ... True Reviews
5 158405420 Gary ... True Reviews
6 143025966 PMAZE ... True Reviews
7 139966209 Frenchy ... True Reviews
8 139766083 Arizona RN ... True Reviews
9 131870778 GW ... True Reviews
10 120024401 Deekay ... True Reviews
11 119822871 Tony ... True Reviews
12 116958004 MBPDX ... True Reviews
13 115487407 Smitty96 ... True Reviews
14 110965961 chhappy7 ... True Reviews
15 109184667 Tampafun ... True Reviews
16 101289834 Neile ... True Reviews
17 84350718 George ... True Reviews
18 75845132 dav ... True Reviews
19 72639833 Doug ... True Reviews
20 69174734 Carnut ... True Reviews
21 67191860 Mark ... True Reviews
22 65876085 bill ... False Reviews
23 64211472 Lazlow ... True Reviews
24 64008710 psyco ... True Reviews
25 57576670 vars0153 ... False Reviews
26 57574924 Fernando ... False Reviews
27 50932030 anauditor ... True Reviews
28 50346331 Missct1964 ... False Reviews
29 48468674 tekfoc ... True Reviews
30 48003934 BrwnJewel ... False Reviews
31 47955889 Free88 ... True Reviews
32 47726965 Josh ... True Reviews
33 47503009 Derek ... True Reviews
34 44513353 Don Z ... True Reviews
35 43143964 Raquel ... True Reviews
36 43142690 Pajama168 ... True Reviews
37 40484198 JJ ... True Reviews
38 39226477 fox4gib ... True Reviews
39 38915453 Happy in Chicago ... True Reviews
40 38485354 CLA owner ... True Reviews
41 35530044 1st time MB owner ... True Reviews
42 34931432 CC ... True Reviews
43 34151324 First time MB buyer ... True Reviews
44 33259903 tom ... True Reviews
45 32943654 Yash ... True Reviews
46 32472645 TheMarcoIslander ... True Reviews
[47 rows x 33 columns]
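The example above gets everything in one request. For a vehicle with more reviews than fit on one page, the page-by-page iteration mentioned in the answer could look roughly like the sketch below. It assumes that numPages in the response is the total number of pages for the chosen perPage, which the original answer does not confirm. The names fetch_all_reviews and fetch_page are hypothetical helpers introduced only for illustration.

```python
from typing import Any, Callable, Dict, List


def fetch_all_reviews(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect the reviews from every page into one list.

    fetch_page(page) is a caller-supplied (hypothetical) function that returns
    the 'consumerreviews' object for that page, i.e. a dict containing at least
    'numPages' and 'reviews'.
    """
    # The first request also tells us how many pages exist (assumption:
    # 'numPages' is the total page count for the current perPage value).
    first = fetch_page(1)
    reviews = list(first["reviews"])

    # Request the remaining pages, if any, and append their reviews in order.
    for page in range(2, first["numPages"] + 1):
        reviews.extend(fetch_page(page)["reviews"])
    return reviews


# Possible wiring with the url and payload from the answer above (not run here):
#
# def fetch_page(page):
#     payload["variables"]["page"] = page
#     return requests.post(url, json=payload).json()["data"]["consumerreviews"]
#
# reviews = pd.DataFrame(fetch_all_reviews(fetch_page))
```

Passing the request function in as an argument keeps the paging logic separate from the HTTP call, so it can be tried out with canned responses before sending real requests to the site.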
A uj5u.com user replied:
I don't see any major issue in your code block as such. However, the class name ehp7fkv0 is dynamic and changes every time you revisit the web application. The canonical approach is to avoid dynamic values and rely on static attribute values instead.
To click() a clickable element, you need to induce WebDriverWait for element_to_be_clickable(), using either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='go to previouse page']"))).click()
Using XPATH:
driver.get('https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@aria-label='go to previouse page']"))).click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
