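For a personal data science project, I want to scrape data from Google Maps. I am doing this with Python and Selenium, but I am running into a strange problem: I can only extract 6 results per page, although a page can contain up to 20 results.
Here is what I have so far: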
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.common.exceptions import NoSuchElementException
import time
import string
import openpyxl
import os
# Loading Selenium Webdriver
driver = webdriver.Firefox()
wait = WebDriverWait(driver, 5)
# Opening Google maps
driver.get("https://www.google.com/maps")
time.sleep(3)
#Closing the google consent form
button=driver.find_element(By.XPATH,'/html/body/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div/div/button')
button.click()
searchbox=driver.find_element(By.ID,'searchboxinput')
location= "Paris"
searchbox.send_keys(location)
searchbox.send_keys(Keys.ENTER)
time.sleep(5)
cancelbut=driver.find_element(By.CLASS_NAME,'gsst_a')
cancelbut.click()
searchbox.send_keys("ASSURANCE")
searchbox.send_keys(Keys.ENTER)
time.sleep(5)
# Locating the results section
while True:
    # Class name of a result entry
    entries = driver.find_elements(By.CLASS_NAME, 'lI9IFe')
    print(str(entries))
    # Prepare the Excel file using openpyxl
    wb = openpyxl.load_workbook("C:/Users/ac/Desktop/plombier.xlsx")
    sheet = wb.worksheets[0]
    #sheet = wb[sheetname]
    #sheet.title = "plombier"
    i = 0
    for entry in entries:
        print(entry.text)
        print(i)
        i += 1
        # Empty list
        labels = []
        # Extracting the name, address, phone, and website:
        name = entry.find_element(By.CSS_SELECTOR, '.qBF1Pd').text
        #adress = entry.find_element(By.XPATH,'/html/body/div[3]/div[9]/div[8]/div/div[1]/div/div/div[4]/div[1]/div[3]/div/div[2]/div[2]/div[1]/div/div/div/div[4]/div[1]/span[2]').text
        #phone = entry.find_element(By.XPATH,'/html/body/div[3]/div[9]/div[8]/div/div[1]/div/div/div[4]/div[1]/div[1]/div/div[2]/div[2]/div[1]/div/div/div/div[4]/div[2]/span[3]/jsl/span[2]').text
        print(name)
        try:
            webcontainer = entry.find_element(By.CLASS_NAME, 'section-result-action-container')
            website = entry.find_element(By.TAG_NAME, 'a').get_attribute("href")
        except NoSuchElementException:
            website = "No website could be found"
        print(website)
        # Write the extracted info to the Excel file; skip the row if it fails
        try:
            sheet.append([location, name, website])
        except IndexError:
            pass
    # Saving the Excel file
    wb.save("C:/Users/ac/Desktop/plombier.xlsx")
    time.sleep(4)
    # Go to the next page of results
    pagesuivantebut = driver.find_element(By.ID, 'ppdPk-Ej1Yeb-LgbsSe-tJiF1e')
    pagesuivantebut.click()
    time.sleep(5)
Am I using the wrong class name to find my result entries?
Answer from a uj5u.com user:
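No, the class name is not the problem. The mistake is assuming that up to 20 records are available right away: if you count them with print(len(entries)), you will see only 7 or 9.
The reason is simple. Google Maps loads more results into the list only as you scroll down. Once you have scrolled through the whole list, you will have all the records (more than 20, because some businesses always appear at the start; they pay to be listed first).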
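As a rough illustration of the scrolling idea, here is a minimal sketch rather than a definitive fix. The helper names load_all_results and scrape_with_scrolling are hypothetical, the class name 'lI9IFe' is simply the one from the question (Google Maps class names change often), and scrolling the last entry into view is an assumption about what triggers the next batch to load.

```python
import time


def load_all_results(count_entries, scroll_down, pause=2.0, max_rounds=30):
    """Scroll until the number of entries stops growing.

    count_entries and scroll_down are callables, so the loop itself does
    not depend on Selenium. Returns the last observed entry count.
    """
    previous = -1
    for _ in range(max_rounds):
        current = count_entries()
        if current == previous:
            # Nothing new appeared after the last scroll: assume we are done.
            break
        previous = current
        scroll_down()
        time.sleep(pause)  # give the page time to load the next batch
    return previous


def scrape_with_scrolling(driver, entry_class="lI9IFe"):
    """Hypothetical wiring for a Selenium driver (class name from the question)."""
    from selenium.webdriver.common.by import By

    def count_entries():
        return len(driver.find_elements(By.CLASS_NAME, entry_class))

    def scroll_down():
        entries = driver.find_elements(By.CLASS_NAME, entry_class)
        if entries:
            # Assumption: bringing the last entry into view loads more results.
            driver.execute_script("arguments[0].scrollIntoView();", entries[-1])

    load_all_results(count_entries, scroll_down)
    return driver.find_elements(By.CLASS_NAME, entry_class)
```

In the question's loop, entries = scrape_with_scrolling(driver) would take the place of the single find_elements call, so that the list is fully loaded before the rows are written to the workbook.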
I hope this helps you investigate further and adapt your code.
