This looks simple, but I haven't managed to find a solution. I tried other suggested solutions, such as span.clear(), but that didn't work.
Page structure:
<div class="details">
<h2>Public function</h2>
<div class="token">
<h2>Name person</h2>
<h3>Name person</h3>
<p>
<span>NO</span>NO</p>
<p>
<span>Time of Death:</span>13:38:00</p>
Result:
Time of Death: 13:38:00
Desired result:
13:38:00
My code:
whole_section = soup.find('div', {'class': "token"})  # Access the whole section
name_person = whole_section.h2.text  # The person's name, inside the "h2" tag
time_decease = whole_section.h3.next_sibling.next_sibling.next_sibling.next_sibling.text  # The time has no tag of its own, so I had to chain "next_sibling"
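Answer from a uj5u.com user:
I really don't recommend walking the DOM by repeatedly grabbing the next sibling. In my experience, every extra hop makes your script more likely to break on the smallest change in the source HTML.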
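To see why the sibling chain is fragile, here is a small sketch that lists what actually follows the <h3>. It assumes the markup from the question (with class="token") and the built-in html.parser; the variable names are illustrative. Whitespace between tags shows up as text nodes, so the second <p> only happens to be the fourth sibling, and any added line break or element shifts that count:

```python
from bs4 import BeautifulSoup, Tag

html = """<div class="token">
<h2>Name person</h2>
<h3>Name person</h3>
<p>
<span>NO</span>NO</p>
<p>
<span>Time of Death:</span>13:38:00</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("div", class_="token").h3

# Describe every node that follows <h3>: tags by name, text nodes by repr
siblings = [
    f"<{node.name}>" if isinstance(node, Tag) else repr(str(node))
    for node in h3.next_siblings
]
print(siblings)
```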
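Instead, find the parent <p> you are after by filtering with a lambda function on the content of the <p> itself (specifically the 'Time of Death:' string); then loop over the <span> children of that <p> and remove them, leaving only the content you want: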
html = '''<div class="details">
<h2>Public function</h2>
<div class="token">
<h2>Name person</h2>
<h3>Name person</h3>
<p>
<span>NO</span>NO</p>
<p>
<span>Time of Death:</span>13:38:00</p>
</div>
</div>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text # Select person's name, inside "h2" tag.
time_decease = whole_section.find(lambda element: element.name == 'p' and 'Time of Death:' in element.text)
for span in time_decease.find_all('span'):
    span.decompose()  # Remove the label so only the time remains in the <p>
print(name_person)
print(time_decease.text)
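If you would rather not modify the parsed tree, one possible variation is to read only the text nodes that sit directly inside the matching <p>. This is a sketch under the same assumptions as above (the question's markup, html.parser); label, paragraph and time_of_death are names introduced here for illustration:

```python
from bs4 import BeautifulSoup

html = """<div class="details">
<h2>Public function</h2>
<div class="token">
<h2>Name person</h2>
<h3>Name person</h3>
<p>
<span>NO</span>NO</p>
<p>
<span>Time of Death:</span>13:38:00</p>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Anchor on the label text rather than on the element's position
label = soup.find("span", string="Time of Death:")
paragraph = label.parent

# recursive=False keeps only text directly under <p>, skipping the <span> text
own_text = paragraph.find_all(string=True, recursive=False)
time_of_death = "".join(own_text).strip()
print(time_of_death)
```

Because nothing is decomposed, the <span> is still available afterwards if you need the label as well.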
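Answer from a uj5u.com user:
You can try this (note that this snippet uses a variant of the markup in which the time sits inside a <span> with a title attribute):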
from bs4 import BeautifulSoup
soup = BeautifulSoup(
"""
<div >
<h2>Public function</h2>
<div >
<h2>Name person</h2>
<h3>Name person</h3>
<p>
<span>NO</span>NO
</p>
<span title="Time of Death:">13:38:00</span>
</div>
""", "xml")
print(soup.select_one("span[title*=Time]").text)
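For the markup as posted in the question, where the time is a bare text node rather than a titled <span>, a CSS selector can still do the locating. The following is only a sketch: it assumes every <p> inside div.token holds exactly one label <span> followed by one value, and the fields dictionary is a hypothetical name:

```python
from bs4 import BeautifulSoup

html = """<div class="details">
<h2>Public function</h2>
<div class="token">
<h2>Name person</h2>
<h3>Name person</h3>
<p>
<span>NO</span>NO</p>
<p>
<span>Time of Death:</span>13:38:00</p>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

fields = {}
for p in soup.select("div.token p"):
    # stripped_strings yields each non-empty text fragment, whitespace trimmed
    parts = list(p.stripped_strings)
    if len(parts) == 2:  # assumed shape: [label, value]
        label, value = parts
        fields[label.rstrip(":")] = value

print(fields.get("Time of Death"))
```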
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/341155.html
