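I have been trying to automate this link with Selenium to get the email address. I used the XPath //span[@]/a/@href, which looks correct, but Selenium does not extract the value from it.
I also tried a regex, but it did not work either: re.findall(r'mailto:(.*?)\?sub', str(driver.page_source))
Can anyone tell me what is wrong here? Why is the email not being retrieved, and how can I extract it?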
import re

from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://www.ukparks.com/park/haighfield-park/')

# Wait up to 7 seconds for the target span to appear in the DOM
WebDriverWait(driver, 7).until(
    EC.presence_of_element_located((By.XPATH, '//span[@]'))
)

# Attempt 1: parse the rendered page source with a Scrapy selector
response = Selector(text=driver.page_source)
email = response.xpath('//span[@]/a/@href').get()

# Attempt 2: pull the address out of a mailto: link with a regex
email_re = re.findall(r'mailto:(.*?)\?sub', str(driver.page_source))

print(email)
print(email_re)
Reply from a uj5u.com user:
You can try fetching the email from that site with requests, as follows:
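import re

import requests
from bs4 import BeautifulSoup

link = 'https://www.ukparks.com/park/haighfield-park/'

with requests.Session() as s:
    # Send a browser-like User-Agent with the request
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)

soup = BeautifulSoup(res.text, "lxml")
# The text of .detail-box includes the inline script that builds the address
item = soup.select_one(".detail-box").get_text(strip=True)
# Collect every argument passed to ehArr.push('...')
email_raw = re.findall(r"ehArr\.push\('(.*?)'\);", item)
# The pieces are pushed in reverse order, so reverse them before joining
email = ''.join(email_raw[::-1])
print(email)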
Output:
[email protected]
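The regex-and-reverse step in the answer above can be tried offline. This is only a minimal sketch, assuming the page's inline script pushes the pieces of the address in reverse order through calls of the form ehArr.push('...'). The sample script text, the address user@example.com, and the helper name rebuild_email are made up for illustration and are not taken from the site.

```python
import re

# Hypothetical stand-in for the inline script text; the real page content may differ.
sample_script = (
    "var ehArr = []; "
    "ehArr.push('example.com'); ehArr.push('@'); ehArr.push('user');"
)


def rebuild_email(script_text):
    """Collect the ehArr.push('...') arguments and join them in reverse order."""
    pieces = re.findall(r"ehArr\.push\('(.*?)'\);", script_text)
    return ''.join(pieces[::-1])


print(rebuild_email(sample_script))
```

If the site changes how it obfuscates the address, the pattern would need adjusting, so treat it as a starting point rather than a stable interface.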
Reply from a uj5u.com user:
The data appears to be populated only after a click event on the a tag.
wait = WebDriverWait(driver, 10)
driver.get('https://www.ukparks.com/park/haighfield-park/')
# Click the link first so that the page fills in its href
wait.until(EC.element_to_be_clickable((By.XPATH, '//span[@]/a'))).click()
# Locate the link again and read the populated href
link = wait.until(EC.element_to_be_clickable((By.XPATH, '//span[@]/a'))).get_attribute("href")
print(link)
Output:
mailto:[email protected]?subject=Enquiry from UKParks.com
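Once the href has been read, the bare address can be separated from the mailto: scheme and the query string. Here is a minimal sketch using only the standard library. The helper name email_from_mailto and the sample address are illustrative assumptions, not part of the answer above.

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(href):
    """Return the address part of a mailto: link, or None if it is not one."""
    parts = urlsplit(href)
    if parts.scheme != 'mailto':
        return None
    # For mailto: links the address is the path; the subject lives in the query
    return unquote(parts.path)


# Hypothetical href in the same shape as the output above
print(email_from_mailto('mailto:user@example.com?subject=Enquiry from UKParks.com'))
```

Parsing the link this way avoids tying the extraction to the exact "?sub" suffix that the regex in the question depends on.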
