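I have the content below, and I am trying to understand how to extract the text of the <p> tags using Beautiful Soup (I am open to other methods). As you can see, the <p> tags are not all nested in the same <div>. I tried one approach, but it only seems to work when both <p> tags are in the same container.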
<div class="top-panel">
<div class="inside-panel-0">
<h1 class="h1-title">Some Title</h1>
</div>
<div class="inside-panel-0">
<div class="inside-panel-1">
<p> I want to extract this copy</p>
</div>
<div class="inside-panel-1">
<p>I want to extract this copy</p>
</div>
</div>
</div>
Answer from a uj5u.com user:
If I understand correctly, try:
from bs4 import BeautifulSoup
html = """<div class="top-panel">
<div class="inside-panel-0">
<h1 class="h1-title">Some Title</h1>
</div>
<div class="inside-panel-0">
<div class="inside-panel-1">
<p> I want to extract this copy</p>
</div>
<div class="inside-panel-1">
<p>I want to extract this copy</p>
</div>
</div>
</div>"""
soup = BeautifulSoup(html, 'lxml')
# find all the p tags whose parent div has the class inside-panel-1
soup.select('div.inside-panel-1 > p')
# -> [<p> I want to extract this copy</p>, <p>I want to extract this copy</p>]
If you only want the text, try:
p_tags = soup.select('div.inside-panel-1 > p')
[elm.text for elm in p_tags]
# -> [' I want to extract this copy', 'I want to extract this copy']
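As a variation on the above, here is a minimal sketch that does the same thing with find_all instead of a CSS selector. It assumes the markup from the question and uses the built-in 'html.parser' so that lxml is not required; the list name texts is only for illustration.

```python
from bs4 import BeautifulSoup

html = """<div class="top-panel">
<div class="inside-panel-0">
<h1 class="h1-title">Some Title</h1>
</div>
<div class="inside-panel-0">
<div class="inside-panel-1">
<p> I want to extract this copy</p>
</div>
<div class="inside-panel-1">
<p>I want to extract this copy</p>
</div>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

texts = []
# Visit every container separately, so it does not matter
# that the <p> tags live in different <div> elements.
for panel in soup.find_all("div", class_="inside-panel-1"):
    # recursive=False restricts the search to direct children of the panel.
    for p in panel.find_all("p", recursive=False):
        texts.append(p.get_text(strip=True))

print(texts)
# -> ['I want to extract this copy', 'I want to extract this copy']
```

Looping over the containers is a bit more verbose than a selector, but it makes it easy to keep track of which panel each paragraph came from.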
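Answer from a uj5u.com user:
Since the <p> tags sit inside nested <div> elements under the top-level panel, we can fetch them easily by passing a CSS selector to the select method, as shown below: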
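from bs4 import BeautifulSoup
html = """
<div class="top-panel">
<div class="inside-panel-0">
<h1 class="h1-title">
Some Title
</h1>
</div>
<div class="inside-panel-0">
<div class="inside-panel-1">
<p>
I want to extract this copy
</p>
</div>
<div class="inside-panel-1">
<p>
I want to extract this copy
</p>
</div>
</div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
# print(soup.prettify())
# select every <p> nested anywhere under div.top-panel
p_tags = soup.select('div.top-panel p')
for p_tag in p_tags:
    # strip=True removes the surrounding newlines and spaces
    print(p_tag.get_text(strip=True))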
Output:
I want to extract this copy
I want to extract this copy
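The two answers return slightly different strings because one reads .text and the other calls get_text(strip=True). A small sketch of the difference, using a one-paragraph snippet cut down from the question's markup (the variable names are illustrative only):

```python
from bs4 import BeautifulSoup

snippet = "<div><p> I want to extract this copy</p></div>"
p = BeautifulSoup(snippet, "html.parser").p

raw = p.text                    # keeps the whitespace exactly as written
clean = p.get_text(strip=True)  # trims whitespace around each text fragment

print(repr(raw))    # ' I want to extract this copy'
print(repr(clean))  # 'I want to extract this copy'
```

If the leading space matters for your use case, use .text; otherwise get_text(strip=True) usually gives the tidier result.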
Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/361655.html
