I need to iterate over invalid HTML and get the text value from every tag so that I can change it.
from bs4 import BeautifulSoup
html_doc = """
<div class="oxy-toggle toggle-7042 toggle-7042-expanded" data-oxy-toggle-active-class="toggle-7042-expanded" data-oxy-toggle-initial-state="closed" id="_toggle-212-142">
<div class="oxy-expand-collapse-icon" href="#"></div>
<div class="oxy-toggle-content">
<h3 class="ct-headline" id="headline-213-142"><span class="ct-span" id="span-225-142">Sklizeň jahod 2019</span></h3> </div>
</div><div class="ct-text-block" id="text_block-214-142"><span class="ct-span" id="span-230-142"><p>Za?átek sklizně: <strong>Zahájeno</strong><br>
Otev?eno: <strong>6 h – do otrhání</strong>, denně</p>
</span></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for tag in soup.find_all():
    print(tag.name)
    if tag.string:
        tag.string.replace_with("1")
print(soup)
The result is:
<div class="oxy-toggle toggle-7042 toggle-7042-expanded" data-oxy-toggle-active-class="toggle-7042-expanded" data-oxy-toggle-initial-state="closed" id="_toggle-212-142">
<div class="oxy-expand-collapse-icon" href="#"></div>
<div class="oxy-toggle-content">
<h3 class="ct-headline" id="headline-213-142"><span class="ct-span" id="span-225-142">1</span></h3> </div>
</div><div class="ct-text-block" id="text_block-214-142"><span class="ct-span" id="span-230-142"><p>Za?átek sklizně: <strong>1</strong><br/>
Otev?eno: <strong>1</strong>, denně</p>
</span></div>
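I know how to replace text, but bs does not find the text that sits directly inside the paragraph tag. The strings "Za?átek sklizně:", "Otev?eno:" and ", denně" are never found, so I cannot replace them.
I tried different parsers such as lxml and html5lib, but it made no difference. I also tried Python's built-in HTML library, but it only supports iterating over HTML, not modifying it.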
Answer from a uj5u.com user:
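.string on a Tag object returns a NavigableString only when the tag has exactly one child and that child is a string; the return value is then that string. If the tag has no children, or more than one child, it returns None. That is why the text directly inside <p>, which also contains <strong> and <br> children, was never replaced.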
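To illustrate the behaviour of .string described above, here is a minimal sketch. The one-line HTML snippet is made up for the demonstration and is not the asker's document; it only mirrors the shape of the problematic <p> tag.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: a <p> with text, a <strong> child, and more text.
soup = BeautifulSoup("<p>Start: <strong>Open</strong>, daily</p>", "html.parser")

p = soup.p
strong = soup.strong

# <strong> has exactly one child, a string, so .string returns it.
single = strong.string

# <p> has three children (text, <strong>, text), so .string is None.
multiple = p.string

# .strings walks every text node beneath the tag instead.
all_text = [str(s) for s in p.strings]

print(single)
print(multiple)
print(all_text)
```

The loop in the question checks `if tag.string:`, so a tag like this <p> is skipped even though it holds text of its own.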
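The scenario is not entirely clear to me, but here is one last approach based on your comment:
"I need generic code that iterates over any HTML and finds all the text so that I can work with it."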
for tag in soup.find_all(text=True):
    tag.replace_with('1')
Example
from bs4 import BeautifulSoup
html_doc = """<div class="oxy-toggle toggle-7042 toggle-7042-expanded" data-oxy-toggle-active-class="toggle-7042-expanded" data-oxy-toggle-initial-state="closed" id="_toggle-212-142">
<div class="oxy-expand-collapse-icon" href="#"></div>
<div class="oxy-toggle-content">
<h3 class="ct-headline" id="headline-213-142"><span class="ct-span" id="span-225-142">Sklizeň jahod 2019</span></h3> </div>
</div><div class="ct-text-block" id="text_block-214-142"><span class="ct-span" id="span-230-142"><p>Za?átek sklizně: <strong>Zahájeno</strong><br>
Otev?eno: <strong>6 h – do otrhání</strong>, denně</p>
</span></div>"""
soup = BeautifulSoup(html_doc, 'html.parser')
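# text=True matches every text node, including whitespace-only ones
# (Beautiful Soup 4.4.0 and later also accept this argument as string=True)
for tag in soup.find_all(text=True):
    tag.replace_with('1')
print(soup)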
Output
<div class="oxy-toggle toggle-7042 toggle-7042-expanded" data-oxy-toggle-active-class="toggle-7042-expanded" data-oxy-toggle-initial-state="closed" id="_toggle-212-142">1<div class="oxy-expand-collapse-icon" href="#"></div>1<div class="oxy-toggle-content">1<h3 class="ct-headline" id="headline-213-142"><span class="ct-span" id="span-225-142">1</span></h3>1</div>1</div><div class="ct-text-block" id="text_block-214-142"><span class="ct-span" id="span-230-142"><p>1<strong>1</strong><br/>1<strong>1</strong>1</p>1</span></div>
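As the output shows, whitespace-only text nodes between tags are replaced too. If that is not wanted, one possible variation is to skip blank nodes and comments, and to pass each remaining string through a function instead of a fixed value. This is only a sketch: the helper name replace_visible_text, its transform parameter, and the sample HTML are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment


def replace_visible_text(html, transform):
    """Apply transform() to every non-blank text node and return the new HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing nodes inside the loop is safe.
    for node in soup.find_all(string=True):
        if isinstance(node, Comment):
            continue  # leave HTML comments untouched
        if not node.strip():
            continue  # skip whitespace-only nodes between tags
        node.replace_with(transform(str(node)))
    return str(soup)


# Hypothetical input shaped like the paragraph in the question.
html = "<p>Start: <strong>Open</strong><br>\nHours: <strong>6 am</strong>, daily</p>"
result = replace_visible_text(html, str.upper)
print(result)
```

Here str.upper stands in for whatever change is needed; any function that takes a string and returns a string should work the same way.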
