Scraping data from p tags with Selenium

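I have searched the internet extensively but couldn't find an example like the one below. I'm trying to extract text from a web page. The first p tag has no location line; the second p tag does (the "Haritada" map link). When I pull the data, I only get the content of the p tag that has the location line, and I can't extract the content of the other p tag. How can I extract the data from both the first and the second p tag?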

HTML from the page source:

<div >
    <p>                                                                       
    <i class='fa fa-home main-color'></i> ORHAN MAH.?BRAH?M CAD. NO:35  
    <br>
    <i class='fa fa-phone main-color'></i> 
    <a  href="tel:0508-2920344">0508-2920344 </a>
    <br /> 
    <i class='fa fa-clock-o main-color'></i> 
    <span >19.01.2022</span>     
    </p>
    <p>
       <i class='fa fa-home main-color'></i> HAZAN MAH.?KTEM CAD. NO:13/B                                           
    <br>
    <i class='fa fa-phone main-color'></i> 
    <a  href="tel:0584 837 23 70">0584 837 23 70 </a>
    <br>
    <i ></i> 
    <a  href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a>
    <br /> 
    <i class='fa fa-clock-o main-color'></i> 
    <span >20.01.2022</span> 
    </p>
</div>

This is the Selenium code I use to extract the data from the HTML above:

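from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# `browser` is an existing WebDriver instance already on the listings page.
items = browser.find_elements(By.CLASS_NAME, "col-md-10")
# Map links: the second <a> inside a <p> (the first one is the phone link).
urls = browser.find_elements(By.XPATH, "//div[@class=' col-md-10']/p/a[2]")
# zip() stops at the shorter list, and only some listings have a map link.
for listing, map_link in zip(items, urls):
    try:
        # First two text lines of the first <p> in this listing.
        address = listing.find_element(By.CSS_SELECTOR, "p").text.split("\n")[:2]
    except NoSuchElementException:
        address = None
    try:
        # ".//" searches inside this listing; a bare "//" would search the whole page.
        phone = listing.find_element(By.XPATH, ".//a[@class='gri'][1]").text
    except NoSuchElementException:
        phone = None
    print(address)
    print(phone)
    href = map_link.get_attribute("href")
    url = href.replace("https://www.google.com/maps?q=", "") if href else None
    try:
        date = listing.find_element(By.XPATH, ".//span[@class='red'][1]").text
    except NoSuchElementException:
        date = None
    print(url)
    print(date)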
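The loop above turns each map link into coordinates by stripping the URL prefix with str.replace, which leaves a string such as "35.554433,25.887766". A minimal sketch of a slightly sturdier alternative, assuming every map link has the ?q=lat,lon form seen in the sample HTML: parse the query string with urllib.parse and convert both halves to floats. coords_from_maps_url is a hypothetical helper name, not part of Selenium.

```python
from urllib.parse import urlparse, parse_qs


def coords_from_maps_url(href):
    """Return (lat, lon) from a link like https://www.google.com/maps?q=LAT,LON.

    Returns None when the link is missing, has no q parameter, or q is not
    two comma-separated numbers.
    """
    if not href:
        return None
    values = parse_qs(urlparse(href).query).get("q")
    if not values:
        return None
    lat_text, sep, lon_text = values[0].partition(",")
    if not sep:
        return None
    try:
        return float(lat_text), float(lon_text)
    except ValueError:
        return None


print(coords_from_maps_url("https://www.google.com/maps?q=35.554433,25.887766"))
# (35.554433, 25.887766)
```

Called on the value returned by get_attribute('href'), it gives a (lat, lon) tuple, or None for a listing whose link is absent or malformed.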

Reply from a uj5u.com user:

Use the XPath //div[@class=' col-md-8']/p. It returns both p tags. Then loop over them with a for loop and apply whatever string operations you need to the data of each p tag.
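The idea in this reply (select every p, then handle each one on its own) can be tried without a browser. A minimal sketch, assuming the HTML structure posted in the question and using the standard library's html.parser as a stand-in for Selenium; in Selenium 4 the selection itself would be browser.find_elements(By.XPATH, "//div[@class=' col-md-8']/p"). ParagraphCollector and parse_listings are hypothetical names. Because each p becomes its own record, a missing map link shows up as None for that listing instead of shifting the pairing.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text lines, link hrefs and <span> date of every <p>."""

    def __init__(self):
        super().__init__()
        self.records = []      # one dict per <p>, in document order
        self._current = None   # record for the <p> being parsed, if any
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = {"lines": [""], "links": [], "date": None}
        elif self._current is None:
            return             # ignore tags outside any <p>
        elif tag == "br":
            self._current["lines"].append("")   # <br> starts a new text line
        elif tag == "a":
            self._current["links"].append(dict(attrs).get("href"))
        elif tag == "span":
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False
        elif tag == "p" and self._current is not None:
            # Collapse runs of whitespace and drop empty lines.
            lines = [" ".join(line.split()) for line in self._current["lines"]]
            self._current["lines"] = [line for line in lines if line]
            self.records.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is None:
            return
        if self._in_span:
            self._current["date"] = data.strip()
        self._current["lines"][-1] += data


def parse_listings(html):
    """Return one dict per <p>: address, phone, map_url (None if absent), date."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    listings = []
    for rec in collector.records:
        hrefs = [h for h in rec["links"] if h]
        listings.append({
            "address": rec["lines"][0] if rec["lines"] else None,
            "phone": next((h[len("tel:"):] for h in hrefs if h.startswith("tel:")), None),
            "map_url": next((h for h in hrefs if "maps?q=" in h), None),
            "date": rec["date"],
        })
    return listings


# Trimmed copy of the HTML posted in the question.
SAMPLE = """
<div>
  <p><i class='fa fa-home main-color'></i> ORHAN MAH.?BRAH?M CAD. NO:35<br>
     <i class='fa fa-phone main-color'></i> <a href="tel:0508-2920344">0508-2920344 </a><br />
     <i class='fa fa-clock-o main-color'></i> <span>19.01.2022</span></p>
  <p><i class='fa fa-home main-color'></i> HAZAN MAH.?KTEM CAD. NO:13/B<br>
     <i class='fa fa-phone main-color'></i> <a href="tel:0584 837 23 70">0584 837 23 70 </a><br>
     <i></i> <a href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a><br />
     <i class='fa fa-clock-o main-color'></i> <span>20.01.2022</span></p>
</div>
"""

for listing in parse_listings(SAMPLE):
    print(listing)
```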

Reply from a uj5u.com user:

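After a lot of research, I found the solution: use zip_longest from the itertools module instead of zip. zip stops at the shorter of the two lists (here, the list of map links), while zip_longest keeps going and fills the missing values with None.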
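The difference this reply relies on can be seen with plain lists. A small sketch with made-up placeholder data (three listings, but only one map link, which in this hypothetical belongs to listing B):

```python
from itertools import zip_longest

# Hypothetical data: three listings, only one map link found on the page.
listings = ["listing A", "listing B", "listing C"]
map_links = ["maps?q=35.554433,25.887766"]

# zip() stops when the shorter input runs out: listings B and C are dropped.
print(list(zip(listings, map_links)))
# [('listing A', 'maps?q=35.554433,25.887766')]

# zip_longest() keeps going and pads the shorter input with None.
print(list(zip_longest(listings, map_links)))
# [('listing A', 'maps?q=35.554433,25.887766'), ('listing B', None), ('listing C', None)]
```

zip_longest pads only at the end: every listing is kept, but items are still paired by position, so here the link lands on listing A. The loop body also has to cope with a None map link (for example, by skipping get_attribute when it is None). If a link can belong to any listing, reading it from inside each p element, as in the first reply, avoids the mispairing.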

Please credit the source when reposting. Article link: https://www.uj5u.com/shujuku/422550.html
