Using BeautifulSoup to get the href from <a> tags inside a div with a specific class

I need to get the hrefs from the <a> tags on a website, but not all of them: only those inside the spans that sit in <div>s with the class arm.

<html>
  <body>
    <div class="arm">
      <span>
        <a href="1">link</a>
        <a href="2">link</a>
        <a href="3">link</a>
      </span>
    </div>
    <div class="arm">
      <span>
        <a href="4">link</a>
        <a href="5">link</a>
        <a href="6">link</a>
      </span>
    </div>
    <div class="arm">
      <span>
        <a href="7">link</a>
        <a href="8">link</a>
        <a href="9">link</a>
      </span>
    </div>
    <div class="footnote">
      <span>
        <a href="1">anotherLink</a>
        <a href="2">anotherLink</a>
        <a href="3">anotherLink</a>
      </span>
    </div>
  </body>
</html>

import requests
from bs4 import BeautifulSoup as bs

request = requests.get("url")
html = bs(request.content, 'html.parser')

for arm in html.select(".arm"):
    anchor = arm.select("span > a")
    print("anchor['href']")

But my code doesn't print anything.

A uj5u.com user replied:

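Your code looks fine until the line print("anchor['href']"), which I think should be print(anchor['href']), without the outer quotes.

Even then, anchor is a ResultSet (a list of tags), not a single tag, so you need another loop to get each href. If you want to change your code as little as possible, the last few lines should look like this: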

for arm in html.select(".arm"):
    anchor = arm.select("span > a")
    for x in anchor:
        print(x.attrs['href'])

We basically add:

    for x in anchor:
        print(x.attrs['href'])

You should now get the hrefs. Good luck.

With the sample HTML from the question, the corrected loop prints the href values 1 through 9, one per line.
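The point about ResultSet can be checked in isolation. The following is a minimal sketch, assuming a shortened copy of the question's HTML held in a hypothetical SAMPLE_HTML string instead of a downloaded page. It shows that select() returns a list-like object that cannot be indexed with 'href', and that a single combined CSS selector is one possible alternative to the nested loops.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the HTML in the question
SAMPLE_HTML = """
<div class="arm"><span><a href="1">link</a><a href="2">link</a></span></div>
<div class="footnote"><span><a href="9">anotherLink</a></span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() always returns a list-like ResultSet, even for a single match
anchors = soup.select(".arm span > a")
is_list = isinstance(anchors, list)

# Indexing the ResultSet itself with a string fails; only a Tag supports ["href"]
try:
    anchors["href"]
    string_index_failed = False
except TypeError:
    string_index_failed = True

# One combined selector does the work of the two nested loops
hrefs = [a["href"] for a in soup.select("div.arm span > a")]
print(hrefs)
```

The combined selector keeps the filtering in one place, while the nested loops from the answer are easier to extend if each div needs separate handling.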

A uj5u.com user replied:

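Try using the find_all() method to select specific tags and classes.

I have replicated your HTML file and extracted the href values from the links inside the span tags. See the sample code below.

Replicated HTML file: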

# Creating the HTML file
file_html = open("demo.html", "w")
# Adding the input data to the HTML file
file_html.write('''<html>
  <body>
    <div class="arm">
      <span>
        <a href="1">link</a>
        <a href="2">link</a>
        <a href="3">link</a>
      </span>
    </div>
    <div class="arm">
      <span>
        <a href="4">link</a>
        <a href="5">link</a>
        <a href="6">link</a>
      </span>
    </div>
    <div class="arm">
      <span>
        <a href="7">link</a>
        <a href="8">link</a>
        <a href="9">link</a>
      </span>
    </div>
    <div class="footnote">
      <span>
        <a href="1">anotherLink</a>
        <a href="2">anotherLink</a>
        <a href="3">anotherLink</a>
      </span>
    </div>
  </body>
</html>''')
# Saving the data into the HTML file
file_html.close()

Code:

from bs4 import BeautifulSoup as bs

# Read the replicated HTML file; the with block closes it after parsing
with open("demo.html", "r") as demo:
    results = bs(demo, 'html.parser')

# Use find_all to select only the <div> elements with class "arm"
job_elements = results.find_all("div", class_="arm")

for job_element in job_elements:
    links = job_element.find_all("a")
    for link in links:
        print(link['href'])

Output:

1
2
3
4
5
6
7
8
9
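The same find_all() approach also works without writing a file to disk. This is a minimal sketch, assuming the markup is already available as a string; SAMPLE_HTML and the helper name arm_hrefs are hypothetical and not part of the original answer. Passing href=True to find_all() is one way to skip anchors that have no href attribute, which would otherwise raise a KeyError on link['href'].

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one anchor deliberately has no href attribute
SAMPLE_HTML = """
<div class="arm"><span><a href="1">link</a><a>no href</a></span></div>
<div class="arm"><span><a href="4">link</a></span></div>
<div class="footnote"><span><a href="1">anotherLink</a></span></div>
"""


def arm_hrefs(markup):
    """Return the href values of <a> tags inside <div class="arm"> elements."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    for div in soup.find_all("div", class_="arm"):
        # href=True keeps only anchors that actually carry an href attribute
        for link in div.find_all("a", href=True):
            hrefs.append(link["href"])
    return hrefs


print(arm_hrefs(SAMPLE_HTML))
```

Returning a list instead of printing inside the loop makes the result easier to reuse, for example to request each link afterwards.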

Reference:

https://realpython.com/beautiful-soup-web-scraper-python/
