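I am learning how to scrape data from TripAdvisor with Selenium in Python. I want to sort the hotels at https://en.tripadvisor.com.hk/Hotels-g294217-Hong_Kong-Hotels.html by "Traveller Ranked" and then extract the hotel information. The values I need are the hotel name and the "data-location=" attribute of each hotel in the HTML page.
HTML code showing the "data-location=" attribute: https://i.stack.imgur.com/x668S.png
This is my code. I don't know why it doesn't print the hotel names. I also don't know how to list the numbers stored in "data-location=".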
!pip install selenium
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome(executable_path=r'C:\ProgramData\Anaconda3\Lib\site-packages\jupyterlab\chromedriver.exe')
browser.get('https://en.tripadvisor.com.hk/Hotels-g294217-Hong_Kong-Hotels.html')
browser.maximize_window()
CheckinDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[1]')
CheckinDate.click()
CheckOutDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[2]')
CheckOutDate.click()
Roombutton = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div[4]/button')
Roombutton.click()
WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div'))).click()
browser.find_element(By.XPATH,'//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div[2]/div[1]/div').click()
results = browser.find_elements(By.CSS_SELECTOR, '#bodycon_main .prw_meta_hsx_responsive_listing')
for result in results:
    try:
        link = result.find_element(By.XPATH, "./div/div[1]/div[2]/div[1]/div/a")
        print(link.text)
    except Exception:
        continue
Thank you very much!
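Answer:
You are not locating the `results` variable correctly. The lookup returns an empty list, so the loop never runs and nothing is printed. The following code should work.
Code snippet: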
CheckinDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[1]')
CheckinDate.click()
CheckOutDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[2]')
CheckOutDate.click()
Roombutton = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div[4]/button')
Roombutton.click()
WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div'))).click()
browser.find_element(By.XPATH,'//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div[2]/div[1]/div').click()
#sleep to let all results load after applying the preferences
#the duration can be adjusted as needed (requires "import time" at the top of the script)
time.sleep(10)
#locate all hotel results
results = browser.find_elements(By.XPATH, '//div[@]')
#for each hotel in page results
for result in results:
    try:
        #find hotel name
        link = result.find_element(By.XPATH, '*//div[@]/a')
        #find the element which contains the data-location attribute
        data_location = result.find_element(By.XPATH, '*//div[@]').get_attribute("data-location")
        print(link.text)
        print(data_location)
    except Exception:
        continue
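
As an alternative to per-element XPath lookups, the names and "data-location" values could also be read from the rendered page source in a single pass. The following is only a minimal sketch using Python's standard `html.parser`. It assumes that each listing element carries a `data-location` attribute and that the first link with text inside it holds the hotel name; the live TripAdvisor markup may differ. `DataLocationParser`, `extract_hotels`, and `SAMPLE_HTML` are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser

# Made-up markup for illustration only; not copied from TripAdvisor.
SAMPLE_HTML = """
<div class="listing" data-location="111"><div><a href="/h1">Hotel One</a></div></div>
<div class="listing" data-location="222"><div><a href="/h2">Hotel Two</a></div></div>
"""


class DataLocationParser(HTMLParser):
    """Collect data-location values and the first link text that follows each."""

    def __init__(self):
        super().__init__()
        self.hotels = []        # one dict per listing found
        self._current = None    # listing currently being filled in
        self._in_link = False   # True while inside a candidate <a> tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-location" in attrs:
            # Assumption: any element with data-location starts a new listing.
            self._current = {"data_location": attrs["data-location"], "name": None}
            self.hotels.append(self._current)
        elif tag == "a" and self._current is not None and self._current["name"] is None:
            self._in_link = True

    def handle_data(self, data):
        # Take the first non-empty text found inside a link as the hotel name.
        if self._in_link and data.strip():
            self._current["name"] = data.strip()
            self._in_link = False

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


def extract_hotels(html):
    """Return a list of {"data_location": ..., "name": ...} dicts from an HTML string."""
    parser = DataLocationParser()
    parser.feed(html)
    parser.close()
    return parser.hotels
```

With Selenium, the same function could be called as `extract_hotels(browser.page_source)` once the sorted results have finished loading, since `page_source` returns the current HTML of the page as a string.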
