Convert the XPaths of all links into a generic path - 有解無憂

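I am trying to extract data from this link. I wrote the code below by inspecting the path of the elements I want, but it only gives the specific path of one news article. How can I get a generic path so that I can extract multiple news articles without having to change the path each time?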

import time
import requests
from datetime import datetime
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumed browser; the original snippet omits driver setup
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
links = []
date = []
# Scroll down in steps so that more articles get loaded
for i in range(20):
    driver.execute_script("window.scrollBy(0, 250)")
    time.sleep(1)

all_items = driver.find_elements(By.XPATH, '//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li[1]')
for item in all_items:
    links.append(item.find_element(By.XPATH, './div/div/div[2]/h3/a[@href]').text)
    date.append(item.find_element(By.XPATH, './div/div[2]/div[2]/span[2]').text)
    time.sleep(2)
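The question's code returns only one article because of the positional predicate li[1]: in XPath, [1] keeps only the first <li> child of the <ul>, while a plain li keeps every one. A minimal sketch of that difference, using Python's built-in xml.etree.ElementTree (which implements a small XPath subset that includes [@attr='value'] and [n] predicates) on hypothetical, simplified markup; the real Yahoo page is larger and is not well-formed XML, so ElementTree could not parse it directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the news list (NOT the real Yahoo markup).
NEWS_LIST_XML = """
<body>
  <div id="latestQuoteNewsStream-0-Stream">
    <ul>
      <li><h3><a href="/news/a.html">Story A</a></h3></li>
      <li><h3><a href="/news/b.html">Story B</a></h3></li>
      <li><h3><a href="/news/c.html">Story C</a></h3></li>
    </ul>
  </div>
</body>
"""

root = ET.fromstring(NEWS_LIST_XML)
stream = ".//div[@id='latestQuoteNewsStream-0-Stream']/ul"

first_only = root.findall(stream + "/li[1]")  # positional predicate: first <li> only
every_item = root.findall(stream + "/li")     # no predicate: every <li>

print(len(first_only), len(every_item))  # 1 3
print([li.find("./h3/a").get("href") for li in every_item])
```

The same change (dropping [1]) is what makes a Selenium find_elements call return every list entry instead of one.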

Reply from a uj5u.com user:

How can I get a generic path so that I can extract multiple news articles without having to change the path each time?

A generic path for these news articles would look like this:

from selenium.webdriver.common.by import By
all_items = driver.find_elements(By.XPATH, "//a[contains(@class, 'js-content-viewer')]")

Here we are grabbing all 'a' tag elements whose class attribute contains 'js-content-viewer'.

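From browsing the site, this appears to be a generic target class shared by all the news articles. Does this help answer your question? I believe you are asking for a selector that will also work on pages other than the AAPL page to scrape news article information.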
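A minimal sketch of the same class-based idea without a browser, swapping XPath for Python's standard-library html.parser. The markup in SAMPLE_HTML is hypothetical and is not Yahoo's real page structure; the class name js-content-viewer comes from the answer above and only helps as long as the site still uses it. The class test is a substring check, mirroring what contains(@class, '...') does in XPath.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

TARGET_CLASS = "js-content-viewer"  # class name taken from the answer above


class ArticleLinkParser(HTMLParser):
    """Collect (href, text) for every <a> whose class attribute contains TARGET_CLASS."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the matching <a> we are inside, if any
        self._text: List[str] = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Substring match, like XPath contains(@class, 'js-content-viewer')
        if TARGET_CLASS in (attr_map.get("class") or ""):
            self._href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_article_links(html: str) -> List[Tuple[str, str]]:
    parser = ArticleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: two article links and one unrelated link.
SAMPLE_HTML = """
<ul>
  <li><h3><a class="Fw(b) js-content-viewer" href="/news/a.html">Story A</a></h3></li>
  <li><a class="ad-link" href="/ads/x">Sponsored</a></li>
  <li><h3><a class="js-content-viewer" href="/news/b.html">Story <b>B</b></a></h3></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_article_links(SAMPLE_HTML))
    # [('/news/a.html', 'Story A'), ('/news/b.html', 'Story B')]
```

Because the check is a substring match, it would also accept a class such as js-content-viewer-x; comparing against attr_map["class"].split() instead would match only the whole class token.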

Reply from a uj5u.com user:

You can try getting all the articles and then iterating over them.

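from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

all_items = driver.find_elements(By.XPATH, '//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li')
for item in all_items:
    try:
        href = item.find_element(By.XPATH, './/h3/a').get_attribute('href')
        posted = item.find_element(By.XPATH, './/span[contains(text(), "hours") or contains(text(), "ago")]').text
    except NoSuchElementException:
        # Ads lack these elements; skip them so links and date stay aligned
        continue
    links.append(href)
    date.append(posted)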

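This is only a rough answer. I used a try-except block here because ad entries in the list make these lookups fail. You can define your own condition to check whether an li tag is an ad.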
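One way to write the "is this li an ad?" condition suggested above, sketched under an assumption: any <li> without a headline link (.//h3/a) is treated as an ad or other non-article entry. It uses find_elements (plural), which returns an empty list instead of raising NoSuchElementException, so no try/except is needed. The helper names headline_link and collect_articles are hypothetical, and XPATH is the literal string "xpath", which is the value of Selenium's By.XPATH.

```python
from typing import Any, List, Optional, Tuple

# Same string value as selenium.webdriver.common.by.By.XPATH; using the literal
# keeps this sketch importable without Selenium installed.
XPATH = "xpath"


def headline_link(item: Any) -> Optional[Any]:
    """Return the item's first .//h3/a element, or None if it has none."""
    found = item.find_elements(XPATH, ".//h3/a")  # empty list, not an exception
    return found[0] if found else None


def collect_articles(items: List[Any]) -> List[Tuple[str, str]]:
    """Return (href, headline) pairs, skipping items assumed to be ads."""
    articles = []
    for item in items:
        link = headline_link(item)
        if link is None:
            # Assumption: no headline link means an ad / non-article entry.
            continue
        articles.append((link.get_attribute("href"), link.text))
    return articles
```

With a live driver this would be called as collect_articles(driver.find_elements(By.XPATH, '//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li')). Note that an implicit wait also applies to find_elements: on an entry without a match it keeps polling until the timeout before returning an empty list, so with implicitly_wait(10) each skipped ad can cost up to 10 seconds. Lowering the implicit wait around this loop avoids that delay.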


Please credit the source when reposting. Article link: https://www.uj5u.com/qita/321096.html
