Remove the content inside a span

It looks simple, but I haven't managed to find a solution yet. I tried other suggested solutions, such as span.clear(), but that didn't do it.
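For reference, `Tag.clear()` only empties the tag it is called on, so it has to be called on the label `<span>` inside the right `<p>`. Below is a minimal sketch. It assumes BeautifulSoup 4 with the built-in `html.parser` and uses a trimmed copy of the page markup from this question. The generator that picks the `<p>` is just one illustrative way to locate it.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page markup from the question
html = """
<div class="token">
  <p><span>NO</span>NO</p>
  <p><span>Time of Death:</span>13:38:00</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Pick the <p> whose <span> holds exactly the "Time of Death:" label
p = next(p for p in soup.find_all("p") if p.span and p.span.get_text() == "Time of Death:")

# clear() removes the span's children (its text) but keeps an empty <span></span>
p.span.clear()

print(p)                       # → <p><span></span>13:38:00</p>
print(p.get_text(strip=True))  # → 13:38:00
```

If clear() seems to do nothing, it is usually being called on a different tag than the one whose text is printed afterwards.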

Web page structure:

<div class="details">           
  <h2>Public function</h2>  
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>

Result:

Time of Death: 13:38:00

Desired result:

13:38:00

My code:

whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
time_decease = whole_section.h3.next_sibling.next_sibling.next_sibling.next_sibling.text # There's no identifying tag, so I had to use "next_sibling".
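The four `next_sibling` hops in the code above are needed because `html.parser` keeps the whitespace between tags as separate `NavigableString` siblings. The following sketch assumes the question's markup and the built-in parser. It records what each hop lands on and shows that `find_next_siblings("p")` skips the whitespace strings. The `hops` list is purely illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
node = soup.find("div", {"class": "token"}).h3

# Walk the same four hops as the original code; strings have .name == None
hops = []
for _ in range(4):
    node = node.next_sibling
    hops.append(node.name if node.name else "whitespace")
print(hops)  # → ['whitespace', 'p', 'whitespace', 'p']

# find_next_siblings() only returns tags, so the whitespace strings are skipped
second_p = soup.find("div", {"class": "token"}).h3.find_next_siblings("p")[1]
print(second_p is node)  # → True
```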

Answer:

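I really don't recommend walking the DOM by repeatedly taking the next sibling: in my experience, every time you do this, your script becomes more likely to break on the smallest change in the source HTML.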
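To make the fragility concrete, here is a small sketch built on an assumed change to the page: one hypothetical extra `<p>` ("Place of Death") inserted before the time. The helper names `hop_lookup` and `label_lookup` are illustrative only. In the changed markup, the fixed chain of `next_sibling` hops lands on the wrong paragraph. A lookup keyed on the label text still finds the time.

```python
from bs4 import BeautifulSoup

def hop_lookup(soup):
    # Hypothetical helper: fixed-position lookup, four next_sibling hops after <h3>
    node = soup.find("div", {"class": "token"}).h3
    for _ in range(4):
        node = node.next_sibling
    return node.get_text(strip=True)

def label_lookup(soup):
    # Hypothetical helper: find the <span> holding the label, take the text right after it
    label = soup.find("span", string="Time of Death:")
    return label.next_sibling.strip()

original = """<div class="token">
    <h3>Name person</h3>
    <p><span>NO</span>NO</p>
    <p><span>Time of Death:</span>13:38:00</p>
</div>"""

# Hypothetical page change: one extra <p> inserted before the time
changed = original.replace(
    "<p><span>Time",
    "<p><span>Place of Death:</span>Somewhere</p>\n    <p><span>Time",
)

results = []
for html in (original, changed):
    soup = BeautifulSoup(html, "html.parser")
    results.append((hop_lookup(soup), label_lookup(soup)))
    print(*results[-1], sep=" | ")
```

With the original markup, both helpers reach the right paragraph (the hop version still includes the label). After the change, only the label-based lookup returns 13:38:00.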

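Instead, find the parent element you're after by filtering on its own content (specifically the 'Time of Death:' string) with a lambda function. Then iterate over that element's <span> children and delete them, so only the content you want is left: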

from bs4 import BeautifulSoup

html = '''<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
  </div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

whole_section = soup.find('div', {'class': "token"})  # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
# Find the <p> whose own text contains the "Time of Death:" label
time_decease = whole_section.find(lambda element: element.name == 'p' and 'Time of Death:' in element.text)
# Delete every <span> (the label) from that <p>, leaving only the time
for span in time_decease.find_all('span'):
    span.decompose()

print(name_person)
print(time_decease.get_text(strip=True))  # strip=True drops the surrounding whitespace
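The approach above removes the label with `decompose()`. As an alternative that leaves the parsed tree untouched, here is a sketch of a different technique. It reads every "label: value" paragraph in the section into a dict, assuming each `<p>` starts with a `<span>` label followed by bare text, as in the question's markup. The function name `label_values` is hypothetical.

```python
from bs4 import BeautifulSoup

def label_values(section):
    # Hypothetical helper: map each <span> label to the text that follows it in its <p>
    data = {}
    for p in section.find_all("p"):
        label = p.find("span")
        if label is None:
            continue  # skip paragraphs without a label
        key = label.get_text(strip=True).rstrip(":")
        # Join the bare strings after the <span> inside the same <p>
        value = "".join(s for s in label.next_siblings if isinstance(s, str)).strip()
        data[key] = value
    return data

html = """<div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
fields = label_values(soup.find("div", {"class": "token"}))
print(fields["Time of Death"])  # → 13:38:00
```

Because nothing is deleted, the same parsed section can still be used afterwards to read the other fields.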


Answer:

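You can try this (note that it assumes the time is wrapped in a <span> with a title attribute, which differs from the markup in the question):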

from bs4 import BeautifulSoup

soup = BeautifulSoup(
"""
<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
      <span>NO</span>NO
    </p>
      <span title="Time of Death:">13:38:00</span>
  </div>
</div>
""", "xml")

# Match the <span> whose title attribute contains "Time"
print(soup.select_one("span[title*=Time]").text)
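That selector depends on a title attribute that the question's page does not have. Below is a sketch for the original markup. It assumes soupsieve 2.1 or newer is installed (soupsieve is the CSS engine behind BeautifulSoup's `select`). Its non-standard `:-soup-contains()` pseudo-class lets a selector match the label `<span>` by its text. From that span, the time is the next sibling.

```python
from bs4 import BeautifulSoup

html = """<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
  </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Select the <span> inside div.token whose text contains "Time of Death:"
label = soup.select_one('div.token span:-soup-contains("Time of Death:")')

# The time is the bare text node right after the label inside the same <p>
print(label.next_sibling.strip())  # → 13:38:00
```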

When reposting, please credit the source. Article link: https://www.uj5u.com/ruanti/341155.html

Tags: Python, web scraping
