Please find the code below:
<div class="xsmall-12 medium-6 columns">
<label class="device-attribute-label" rel="tooltip" title="Proprietary/trade name of the medical device as used in the labeling or catalog.">Brand Name:</label>
Snowden-Pencer
<br>
<label class="device-attribute-label" rel="tooltip" title="Identifies a category or design of devices that have specifications, performance, size, and composition within limits set by the company.">Version or Model:</label>
32-0044
<br>
<label class="device-attribute-label" href="#" rel="tooltip" title="Whether the device is currently offered for sale by the device company. A device no longer in commercial distribution may or may not still be available for purchase in the marketplace.">Commercial Distribution Status:</label>
In Commercial Distribution
<br>
<label class="device-attribute">Catalog Number:</label>
32-0044
<br>
<label class="device-attribute">Company Name:</label>
CAREFUSION 2200, INC
</div>
I need to retrieve the parent <div>'s own text separately, e.g. "Snowden-Pencer", "32-0044", "CAREFUSION 2200, INC".
This is what I tried:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/div[1]/div[2]/div[1]/div[2]/section[1]/div[1]/div[1]//div[1]/*"))
)
The output is:
output = Brand Name:
I need to find the correct XPath. Please help, thanks in advance.
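The values are not wrapped in any element of their own. Each one is a text node sitting directly inside the <div>, between a </label> and a <br>, so an XPath that selects elements (such as .../div[1]/*) can only ever reach the labels. Selenium's find_element also refuses XPaths that evaluate to text nodes (e.g. ending in text()), because it only returns elements. A minimal sketch with Python's standard-library html.parser, assuming the markup is available as a string (for example from driver.page_source), collects only the text that is not inside a <label>. The class name DivTextCollector is hypothetical, and the snippet is a trimmed copy of the question's HTML:

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect non-empty text that is not inside a <label> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.label_depth = 0  # greater than 0 while the parser is inside a <label>
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.label_depth += 1

    def handle_endtag(self, tag):
        if tag == "label" and self.label_depth:
            self.label_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Skip whitespace-only runs and the label captions themselves
        if text and not self.label_depth:
            self.values.append(text)


html_snippet = """<div class="xsmall-12 medium-6 columns">
<label class="device-attribute-label">Brand Name:</label>
Snowden-Pencer
<br>
<label class="device-attribute-label">Version or Model:</label>
32-0044
<br>
<label class="device-attribute">Company Name:</label>
CAREFUSION 2200, INC
</div>"""

collector = DivTextCollector()
collector.feed(html_snippet)
collector.close()
print(collector.values)  # ['Snowden-Pencer', '32-0044', 'CAREFUSION 2200, INC']
```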
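Answer:
If you are using Selenium, you can use the getText() method; in Java it would look something like driver.findElement(By.xpath("XPath")).getText().
For Python, I believe it looks like this:
from selenium.webdriver.common.by import By
value_text = driver.find_element(By.XPATH, "XPath").text  # find_element_by_xpath() was deprecated in Selenium 4 and has since been removed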
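Calling .text on the parent <div> returns the labels together with the values. One common way to get only the values from a live page is to let the browser return the div's direct text-node children through execute_script. A minimal sketch, assuming driver is a Selenium WebDriver and element is the parent <div>; the names TEXT_NODES_JS and direct_text_values are hypothetical:

```python
# JavaScript run in the browser: return the raw text of every text node
# that is a direct child of the element passed in as arguments[0].
TEXT_NODES_JS = """
return Array.from(arguments[0].childNodes)
    .filter(function (node) { return node.nodeType === Node.TEXT_NODE; })
    .map(function (node) { return node.textContent; });
"""


def direct_text_values(driver, element):
    """Return the trimmed, non-empty text nodes directly under element (hypothetical helper)."""
    raw_texts = driver.execute_script(TEXT_NODES_JS, element)
    # Whitespace-only nodes (the line breaks between tags) are dropped.
    return [text.strip() for text in raw_texts if text.strip()]
```

Usage, with By imported from selenium.webdriver.common.by: parent = driver.find_element(By.XPATH, "//label[text()='Brand Name:']/..") and then print(direct_text_values(driver, parent)), which should list the values in page order.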
Answer:
You can try the following two XPaths:
//label[@rel='tooltip']/*
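Note that /* after a label selects that label's child elements, and these labels have none. In a parsed tree the values are the loose text that follows each label, which Python's standard-library xml.etree.ElementTree exposes as the label's tail. A small sketch, assuming the markup has been made well-formed XML (here <br> is written as <br/> so ElementTree can parse it) and trimmed to three labels:

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the question's markup (<br/> instead of <br>)
markup = """<div class="xsmall-12 medium-6 columns">
<label class="device-attribute-label" rel="tooltip">Brand Name:</label>
Snowden-Pencer
<br/>
<label class="device-attribute">Catalog Number:</label>
32-0044
<br/>
<label class="device-attribute">Company Name:</label>
CAREFUSION 2200, INC
</div>"""

div = ET.fromstring(markup)

# The equivalent of label/* finds nothing: the labels have no child elements.
print(div.findall("label/*"))  # []

# label.text is the caption; label.tail is the text after </label>, i.e. the value.
pairs = {label.text.rstrip(":"): label.tail.strip() for label in div.findall("label")}
print(pairs)
```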
Answer:
You can use a regular expression:
import re
a = """<div class="xsmall-12 medium-6 columns">
<label class="device-attribute-label" rel="tooltip" title="Proprietary/trade name of the medical device as used in the labeling or catalog.">Brand Name:</label>
Snowden-Pencer
<br>
<label class="device-attribute-label" rel="tooltip" title="Identifies a category or design of devices that have specifications, performance, size, and composition within limits set by the company.">Version or Model:</label>
32-0044
<br>
<label class="device-attribute-label" href="#" rel="tooltip" title="Whether the device is currently offered for sale by the device company. A device no longer in commercial distribution may or may not still be available for purchase in the marketplace.">Commercial Distribution Status:</label>
In Commercial Distribution
<br>
<label class="device-attribute">Catalog Number:</label>
32-0044
<br>
<label class="device-attribute">Company Name:</label>
CAREFUSION 2200, INC
</div>"""
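# Keep only the lines that contain no tag at all: those are the bare text values
b = re.findall(r'^[^<>\n]+$', a, re.MULTILINE)
c = [x.strip() for x in b if x.strip()]
print(c)  # ['Snowden-Pencer', '32-0044', 'In Commercial Distribution', '32-0044', 'CAREFUSION 2200, INC']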
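If you also need to know which label each value belongs to, a variation of the same regex idea captures the label text together with the text run that follows it. This is a sketch that assumes the markup stays as regular as the question's snippet (no '>' inside attributes, value text immediately after </label>); regex parsing of HTML breaks easily on anything less regular. The names LABEL_VALUE and snippet are hypothetical:

```python
import re

snippet = """<div class="xsmall-12 medium-6 columns">
<label class="device-attribute-label" rel="tooltip">Brand Name:</label>
Snowden-Pencer
<br>
<label class="device-attribute-label" rel="tooltip">Version or Model:</label>
32-0044
<br>
<label class="device-attribute">Company Name:</label>
CAREFUSION 2200, INC
</div>"""

# Group 1: the label caption; group 2: everything up to the next tag (the value text node).
LABEL_VALUE = re.compile(r"<label[^>]*>([^<]*)</label>([^<]*)")

values = {label.strip().rstrip(":"): value.strip() for label, value in LABEL_VALUE.findall(snippet)}
print(values)
```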
Answer:
You can try the following:
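wait = WebDriverWait(driver, 20)
# Locate the "Version or Model:" label, then step up (/..) to the parent <div> that holds all the values
parent_div = wait.until(EC.visibility_of_element_located((By.XPATH, "//label[text()='Version or Model:']/..")))
# innerText puts each "Label: value" pair on its own line (the <br> tags become line breaks),
# so split per line, then split each line once on its first colon
pairs = [line.split(":", 1) for line in parent_div.get_attribute('innerText').splitlines() if ":" in line]
values = {key.strip(): value.strip() for key, value in pairs}
print(values['Brand Name'])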
Output:
Snowden-Pencer
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
