Python BeautifulSoup4: how to get the span text inside a div tag

Here is the HTML:

<div aria-label="RM 6,000 a month" class="salary-snippet"><span>RM 6,000 a month</span></div>

This is what I tried:

divs = soup.find_all('div', class_='job_seen_beacon')
for item in divs:
    print(item.find('div', class_='salary-snippet'))

This prints a list of results such as:

<div aria-label="RM 3,500 to RM 8,000 a month" class="salary-snippet"><span>RM 3,500 - RM 8,000 a month</span></div>

But if I use

print(item.find('div', class_='salary-snippet').text.strip())

it raises this error:

AttributeError: 'NoneType' object has no attribute 'text'

So how can I get only the span text? This is my first time scraping the web.

Answer from a uj5u.com user:

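This may be what you are looking for.

First, select all <div> tags with the class salary-snippet, since that is the parent of the <span> tag you want. Use .find_all() for this.
Next, iterate over all the <div> tags selected above and find the <span> in each <div>.
From your question, I assume that not all of these <div> tags contain a <span>. In that case, print the text only for the <div> tags that do contain a span. See below.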

# Find all the divs
d = soup.find_all('div', class_='salary-snippet')
# Iterating over the <div> tags
for item in d:
    # Find <span> in each item. If not exists x will be None
    x = item.find('span')
    # Check if x is not None and then only print
    if x:
        print(x.text.strip())

Here is the complete code.

from bs4 import BeautifulSoup

s = """<div aria-label="RM 6,000 a month" class="salary-snippet"><span>RM 6,000 a month</span></div>"""
soup = BeautifulSoup(s, 'lxml')

d = soup.find_all('div', class_='salary-snippet')
for item in d:
    x = item.find('span')
    if x:
        print(x.text.strip())

RM 6,000 a month
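
As a variation on the code above, a CSS selector can pick out the spans in a single step. This is only a minimal sketch: the sample HTML is made up for illustration, and it uses the built-in html.parser instead of lxml so that nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

# Made-up sample: two salary divs with a <span>, one without.
html = """
<div aria-label="RM 6,000 a month" class="salary-snippet"><span>RM 6,000 a month</span></div>
<div aria-label="RM 3,500 to RM 8,000 a month" class="salary-snippet"><span>RM 3,500 - RM 8,000 a month</span></div>
<div class="salary-snippet">no span here</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.salary-snippet > span" matches only <span> tags that are direct
# children of a salary-snippet div, so divs without a span are skipped.
salaries = [span.get_text(strip=True)
            for span in soup.select("div.salary-snippet > span")]
print(salaries)
```

Because the selector only matches spans that exist, no None check is needed in this version.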

Answer from a uj5u.com user:

I believe the line should be:

print(item.find('div', {'class':'salary-snippet'}).text.strip())

Or, if there is only the one span, you can simply use:

item.find("span").text.strip()

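Given that you are using the .find_all() method, you may want to make sure that every div returned from your HTML by

soup.find_all('div', class_='job_seen_beacon')

contains the element you are looking for, because the error can occur if even one of them does not.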

divs = soup.find_all('div', class_='job_seen_beacon')
for item in divs:
    try:
        print(item.find('div', {'class':'salary-snippet'}).text.strip())
    except AttributeError:
        print("Item Not available")

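This tries to get the text and, if that fails, prints a placeholder for that item instead, so you can work out why it failed. Most likely the item does not contain the element you are searching for.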
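
The same guard can also be written without try/except, by checking the result of .find() for None before touching .text. The following is a minimal sketch under stated assumptions: the sample HTML is made up (one job_seen_beacon card has no salary), and salary_text is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Made-up sample: the second job card has no salary-snippet div.
html = """
<div class="job_seen_beacon"><div class="salary-snippet"><span>RM 6,000 a month</span></div></div>
<div class="job_seen_beacon"><h2>No salary listed</h2></div>
"""
soup = BeautifulSoup(html, "html.parser")


def salary_text(card):
    """Hypothetical helper: return the salary text of a card, or None."""
    snippet = card.find("div", class_="salary-snippet")
    # .find() returns None when nothing matches; accessing .text on that
    # None is what raises the AttributeError from the question.
    if snippet is None:
        return None
    return snippet.get_text(strip=True)


results = [salary_text(card)
           for card in soup.find_all("div", class_="job_seen_beacon")]
for text in results:
    print(text if text is not None else "Item Not available")
```

An explicit None check catches only the missing-element case, whereas except AttributeError would also hide unrelated attribute mistakes inside the try block.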

