I have the following HTML:
<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>
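I only need the text that sits outside all the other tags (that is: "This is the text I want to get").
I tried this code:
for el in doc.find_all('li', attrs={'class': 'print text'}):
    print(el.get_text())
Unfortunately, it prints everything, including the contents of the em tags.
Is there a way to do this?
Thanks!!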
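Reply from a uj5u.com user:
Find the li tag with the specific class, call find_all on it to collect its em tags, take the last one from the list by index, and read the text that follows it with next_sibling.
from bs4 import BeautifulSoup

html = """<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>"""
soup = BeautifulSoup(html, "html.parser")
# The wanted text node comes right after the last <em>, so read that tag's next sibling
print(soup.find("li", class_="print text").find_all("em")[-1].next_sibling)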
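The one-liner above raises an exception if the li or its em tags are missing. A slightly more defensive sketch of the same next_sibling idea could look like this; the helper name text_after_last_em is made up for illustration, and the closing </li> is added only to keep the sample markup complete.

```python
from bs4 import BeautifulSoup

html = """<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>
</li>"""

soup = BeautifulSoup(html, "html.parser")


def text_after_last_em(li):
    """Hypothetical helper: text node directly after the last <em> in li, or None."""
    ems = li.find_all("em")
    if not ems:
        return None
    sibling = ems[-1].next_sibling
    # next_sibling can be None or another tag; only accept plain text nodes
    return sibling.strip() if isinstance(sibling, str) else None


li = soup.find("li", class_="print text")
result = text_after_last_em(li) if li is not None else None
print(result)  # This is the text I want to get
```

This still assumes the wanted text always comes immediately after the last em; if the markup varies, the direct-child text approach is probably more robust.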
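Reply from a uj5u.com user:
You can use find(text=True, recursive=False) to achieve this: it returns the first text node that is a direct child of the tag, without descending into nested tags.
Example
from bs4 import BeautifulSoup

html = '''<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>'''
soup = BeautifulSoup(html, 'html.parser')
# recursive=False restricts the search to the span's direct children
print(soup.find('li', class_='print text').span.find(text=True, recursive=False))
Output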
This is the text I want to get
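Note that find only returns the first direct text node. If a tag has text both before and after its child tags, something like the following sketch may help. It uses find_all with string=True (the newer name for the text argument in Beautiful Soup 4.4.0 and later); the own_text helper and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: text on both sides of a nested tag
html = "<span>before <em>skipped</em> after</span>"
span = BeautifulSoup(html, "html.parser").span


def own_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    direct = tag.find_all(string=True, recursive=False)
    # Drop whitespace-only nodes and trim the rest
    return " ".join(s.strip() for s in direct if s.strip())


print(own_text(span))  # before after
```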
If you have multiple span elements inside your li, you can use:
from bs4 import BeautifulSoup

html = '''<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>
<span><em class="time">
<div class="time">1.50 s</div>
</em><em class="status">passed</em>This is the text I want to get too</span>'''
soup = BeautifulSoup(html, 'html.parser')
# Select every span under li.print.text and print its own direct text
for e in soup.select('li.print.text span'):
    print(e.find(text=True, recursive=False))
Output
This is the text I want to get
This is the text I want to get too
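If you also want the time and status next to each text, the same loop can collect them into records. This is only a sketch: it assumes every span contains both an em.time and an em.status, the rows list and its keys are names chosen here for illustration, and the closing </li> is added only to keep the sample markup complete.

```python
from bs4 import BeautifulSoup

html = '''<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>
<span><em class="time">
<div class="time">1.50 s</div>
</em><em class="status">passed</em>This is the text I want to get too</span>
</li>'''

soup = BeautifulSoup(html, "html.parser")

rows = []
for span in soup.select("li.print.text span"):
    # First text node that is a direct child of the span (None if there is none)
    own = span.find(string=True, recursive=False)
    rows.append({
        "time": span.select_one("em.time").get_text(strip=True),
        "status": span.select_one("em.status").get_text(strip=True),
        "text": str(own).strip() if own is not None else None,
    })

for row in rows:
    print(row)
```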
Please credit the source when reposting. Article link: https://www.uj5u.com/caozuo/483248.html
