I need to get the hrefs from <a> tags on a website, but not all of them: only those inside spans that sit in <div>s with the class arm.
<html>
<body>
<div class="arm">
<span>
<a href="1">link</a>
<a href="2">link</a>
<a href="3">link</a>
</span>
</div>
<div class="arm">
<span>
<a href="4">link</a>
<a href="5">link</a>
<a href="6">link</a>
</span>
</div>
<div class="arm">
<span>
<a href="7">link</a>
<a href="8">link</a>
<a href="9">link</a>
</span>
</div>
<div class="footnote">
<span>
<a href="1">anotherLink</a>
<a href="2">anotherLink</a>
<a href="3">anotherLink</a>
</span>
</div>
</body>
</html>
import requests
from bs4 import BeautifulSoup as bs
request = requests.get("url")
html = bs(request.content, 'html.parser')
for arm in html.select(".arm"):
    anchor = arm.select("span > a")
    print("anchor['href']")
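But my code doesn't print anything.
uj5u.com user reply:
Your code looks fine up to the line print("anchor['href']"), which I think should be print(anchor['href']).
However, anchor is a ResultSet, which means you need another loop to get the hrefs. If you want to change your code as little as possible, the last few lines should look like this: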
for arm in html.select(".arm"):
    anchor = arm.select("span > a")
    for x in anchor:
        print(x.attrs['href'])
We basically added:
    for x in anchor:
        print(x.attrs['href'])
You should now get the hrefs. Good luck.
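The extra loop can be checked without a network request. The sketch below is only an illustration: SAMPLE_HTML is a trimmed, made-up stand-in for the page in the question, and the names nested and flat are hypothetical. It also shows a single CSS selector that should give the same result as the nested loop.

```python
from bs4 import BeautifulSoup

# Trimmed inline stand-in for the page in the question (an assumption,
# used so the example runs without requests.get).
SAMPLE_HTML = """
<div class="arm"><span><a href="1">link</a><a href="2">link</a></span></div>
<div class="arm"><span><a href="3">link</a></span></div>
<div class="footnote"><span><a href="9">anotherLink</a></span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Nested loop: select() returns a list-like ResultSet of Tag objects,
# so each Tag has to be visited before its href can be read.
nested = []
for arm in soup.select(".arm"):
    for a in arm.select("span > a"):
        nested.append(a["href"])

# Single selector: anchors that are direct children of a span
# located anywhere inside an element with class "arm".
flat = [a["href"] for a in soup.select(".arm span > a")]

print(nested)
print(flat)
```

Both lists should hold only the hrefs found under the arm divs; the anchor in the footnote div is skipped because its parent div does not match .arm.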

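uj5u.com user reply:
Try using the find_all() method to select specific tags and classes.
I replicated your HTML file and extracted the values inside the span tags. See the sample code below.
Replicated HTML file: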
# Creating the HTML file
file_html = open("demo.html", "w")
# Adding the input data to the HTML file
file_html.write('''<html>
<body>
<div class="arm">
<span>
<a href="1">link</a>
<a href="2">link</a>
<a href="3">link</a>
</span>
</div>
<div class="arm">
<span>
<a href="4">link</a>
<a href="5">link</a>
<a href="6">link</a>
</span>
</div>
<div class="arm">
<span>
<a href="7">link</a>
<a href="8">link</a>
<a href="9">link</a>
</span>
</div>
<div class="footnote">
<span>
<a href="1">anotherLink</a>
<a href="2">anotherLink</a>
<a href="3">anotherLink</a>
</span>
</div>
</body>
</html>''')
# Saving the data into the HTML file
file_html.close()
Code:
from bs4 import BeautifulSoup as bs

# Read the replicated HTML file
with open("demo.html", "r") as demo:
    results = bs(demo, 'html.parser')

# Use find_all to select only the <div> elements with class "arm"
job_elements = results.find_all("div", class_="arm")
for job_element in job_elements:
    # Collect every <a> tag inside this div and print its href
    links = job_element.find_all("a")
    for link in links:
        print(link['href'])

Reference:
https://realpython.com/beautiful-soup-web-scraper-python/
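The find_all approach can also be wrapped in a small function that works on an HTML string instead of a file. This is a minimal sketch, not part of the original answer: hrefs_in_divs and sample are hypothetical names, and the href=True filter is an addition that skips anchors without an href attribute.

```python
from bs4 import BeautifulSoup


def hrefs_in_divs(html_text, css_class):
    """Return href values of <a> tags inside <div> elements with the given class."""
    soup = BeautifulSoup(html_text, "html.parser")
    found = []
    # class_ has a trailing underscore because "class" is a reserved word in Python
    for div in soup.find_all("div", class_=css_class):
        # href=True keeps only anchors that actually carry an href attribute
        for link in div.find_all("a", href=True):
            found.append(link["href"])
    return found


# Made-up sample input for illustration only
sample = (
    '<div class="arm"><span><a href="1">link</a><a>no href</a></span></div>'
    '<div class="footnote"><span><a href="2">anotherLink</a></span></div>'
)
print(hrefs_in_divs(sample, "arm"))
```

Note that find_all("a") matches every anchor nested anywhere under the div, not only those directly inside a span, so it is slightly broader than the "span > a" selector used in the question.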