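I have searched the internet a lot and could not find an example similar to the one below. I am trying to extract text from a web page. The first p tag has no location (map) link. The second p tag does have one. When I pull the data, I only get the content of the p tag that has the location link. I cannot extract the content of the other p tag. How can I extract the data from both the first and the second p tag?
HTML of the page source: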
<div >
<p>
<i class='fa fa-home main-color'></i> ORHAN MAH.?BRAH?M CAD. NO:35
<br>
<i class='fa fa-phone main-color'></i>
<a href="tel:0508-2920344">0508-2920344 </a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span >19.01.2022</span>
</p>
<p>
<i class='fa fa-home main-color'></i> HAZAN MAH.?KTEM CAD. NO:13/B
<br>
<i class='fa fa-phone main-color'></i>
<a href="tel:0584 837 23 70">0584 837 23 70 </a>
<br>
<i ></i>
<a href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span >20.01.2022</span>
</p>
</div>
This is the Selenium code I use to extract data from the HTML source above:
item = browser.find_elements_by_class_name("col-md-10")
urls = browser.find_elements_by_xpath("//div[@class=' col-md-10']/p/a[2]")
for i in zip(item,urls):
    try:
        address = i[0].find_element_by_css_selector("p").text.split("\n")[:2]
    except:
        address = None
    try:
        phone = i[0].find_element_by_xpath("//a[@class='gri'][1]").text
    except:
        phone = None
    print(address)
    print(phone)
    try:
        url = i[1].get_attribute('href').replace("https://www.google.com/maps?q=","")
    except:
        url = None
    try:
        date = i[0].find_element_by_xpath("//span[@class='red'][1]").text
    except:
        date = None
    print(url)
    print(date)
A uj5u.com user replied:
Use the XPath //div[@class=' col-md-8']/p. It returns both p elements.
You can then go through them with a for loop and apply whatever string operations you need to the text of each p.
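A minimal sketch of that idea, under a few assumptions: Selenium 4 (where find_elements takes a By locator), a p.text that yields one visible line per field, and the XPath copied unchanged from the answer. The names DATE_RE, parse_p_text and scrape are hypothetical helpers invented for this illustration, not part of Selenium.

```python
import re

# Dates on the page look like 19.01.2022 (assumed from the HTML in the question).
DATE_RE = re.compile(r"^\d{2}\.\d{2}\.\d{4}$")


def parse_p_text(text):
    """Split the visible text of one <p> into address, phone and date."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    record = {"address": None, "phone": None, "date": None}
    if lines:
        # Assumption: the address is always the first visible line.
        record["address"] = lines[0]
    for line in lines[1:]:
        if DATE_RE.match(line):
            record["date"] = line
        elif record["phone"] is None and re.fullmatch(r"[\d\s-]+", line):
            # Only digits, spaces and hyphens: treat it as the phone number.
            record["phone"] = line
    return record


def scrape(browser):
    """Return one record per <p>; `browser` is an already-open WebDriver."""
    # Imported here so parse_p_text can be tried without Selenium installed.
    from selenium.webdriver.common.by import By

    records = []
    for p in browser.find_elements(By.XPATH, "//div[@class=' col-md-8']/p"):
        record = parse_p_text(p.text)
        # find_elements (plural) returns [] when this <p> has no map link,
        # so a missing link becomes None instead of raising an exception.
        links = p.find_elements(By.XPATH, ".//a[contains(@href, 'maps?q=')]")
        href = links[0].get_attribute("href") if links else None
        record["coords"] = href.split("maps?q=", 1)[1] if href else None
        records.append(record)
    return records
```

The coordinates live in the href attribute rather than in the visible text, which is why the map link is looked up relative to each p (note the leading dot in the XPath) instead of being parsed out of p.text.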
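A uj5u.com user replied:
After a long search I found the solution to the problem, friends: use zip_longest from the itertools module instead of zip. zip stops at the shorter of the two lists, so some entries were dropped whenever there were fewer map links than entries. zip_longest keeps going until the longer list is exhausted and fills the missing values with None.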
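The difference between zip and zip_longest can be seen without a browser. The sketch below uses plain strings as stand-ins for the two Selenium result lists. The names blocks, map_links, short_pairs and full_pairs are made up for this illustration.

```python
from itertools import zip_longest

# Stand-ins for the two Selenium result lists (plain strings, not WebElements).
blocks = ["first p", "second p"]
map_links = ["35.554433,25.887766"]  # only the second <p> contains a map link

# zip stops at the shortest input, so one block is silently dropped.
short_pairs = list(zip(blocks, map_links))

# zip_longest keeps every block and pads the shorter list with fillvalue.
full_pairs = list(zip_longest(blocks, map_links, fillvalue=None))

print(short_pairs)
print(full_pairs)
```

One caveat: the padding is appended at the end of the shorter list, so in this stand-in the link that belongs to the second p ends up paired with the first p. If each link has to stay with its own p, looking the link up relative to each p element is likely the safer route.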
