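I want to find all tags that have attribute values equal to "ATTR1" and "ATTR2", without knowing the corresponding attribute names.
Suppose I have the following: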
page_content = '''<a href="ATTR1">text1</a>
<div type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span id="ATTR2">text5</span>'''
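I want a script that retrieves only the third element, which has one attribute equal to "ATTR1" and another attribute equal to "ATTR2". In other words, I need the following: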
<script class="ATTR1" id="ATTR2">text3</script>
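I know I can pass a function as an argument to find_all(), but I need help understanding how to write a function that returns True when these conditions are met.
A uj5u.com user replied:
If you know the attribute names, just chain your conditions, for example with a CSS selector: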
select('#ATTR2.ATTR1')
Or, if you do not know the attribute names, just check all the values:
for e in soup():
    # Flatten the values: multi-valued attributes such as class are lists
    attr_list = [v for i in list(e.attrs.values()) for v in (i if isinstance(i,list) else [i])]
    if all(x in attr_list for x in ['ATTR1','ATTR2']):
        print(e)
Example
from bs4 import BeautifulSoup

html = '''
<a href="ATTR1">text1</a>
<div type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span id="ATTR2">text5</span>'''
soup = BeautifulSoup(html, 'html.parser')

# Attribute names known: id="ATTR2" and class="ATTR1"
print(soup.select('#ATTR2.ATTR1'))

# Attribute names unknown: compare against every attribute value
for e in soup():
    attr_list = [v for i in list(e.attrs.values()) for v in (i if isinstance(i,list) else [i])]
    if all(x in attr_list for x in ['ATTR1','ATTR2']):
        print(e)
Output
[<script class="ATTR1" id="ATTR2">text3</script>]
<script class="ATTR1" id="ATTR2">text3</script>
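Since the question asks specifically for a function to pass to find_all(), the same check can also be wrapped in a predicate. This is only a minimal sketch: the name has_all_values and its wanted parameter are made up for illustration, and the sample HTML assumes the script tag carries class="ATTR1" as in the expected result.

```python
from bs4 import BeautifulSoup

html = '''<a href="ATTR1">text1</a>
<div type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span id="ATTR2">text5</span>'''


def has_all_values(tag, wanted=("ATTR1", "ATTR2")):
    """Hypothetical helper: True if every wanted string is the value
    of some attribute on the tag, whatever the attribute is called."""
    values = set()
    for value in tag.attrs.values():
        # Multi-valued attributes such as class are parsed into lists.
        if isinstance(value, list):
            values.update(value)
        else:
            values.add(value)
    return all(w in values for w in wanted)


soup = BeautifulSoup(html, "html.parser")

# find_all() calls the function once per tag and keeps the tags
# for which it returns True.
matches = soup.find_all(has_all_values)
print(matches)
```

With the sample input above, matches should contain only the script element. To look for a different set of values, one option is to wrap the call, e.g. soup.find_all(lambda t: has_all_values(t, ("foo", "bar"))).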
