How to use BeautifulSoup to find data for a class nested inside another class

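Hi, I'm trying to test my knowledge of BeautifulSoup on the website https://pythonjobs.github.io/.

I'd like to be able to print out each listing along with its job role, location, company, etc.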

import requests
import json
from bs4 import BeautifulSoup

URL = "https://pythonjobs.github.io/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

job = soup.find(id='container')
job_elements = job.find_all(class_='job')


for job_element in job_elements:
    location = job_element.find(class_='i-globe')
    date = job_element.find('i', class_='calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')

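This is the code I have, but every lookup returns None.

Here is an HTML snippet from the site for one job listing, for reference. I found that job_element.find('info') returns all the information, but I can't isolate each piece such as the company, location, etc. How can I do this? Thanks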

<section class="job_list">
            <div class="job" data-order="0" data-slug="datadog-open-source-software-engineer-python" data-tags="python,django,flask,falcon,celery">
            <a class="go_button"
                href="/jobs/datadog-open-source-software-engineer-python.html">
                Read more <i class="i-right"></i>
            </a>
                    <h1><a href="/jobs/datadog-open-source-software-engineer-python.html">Open Source Software Engineer - Python</a></h1>

    <span class="info"><i class="i-globe"></i> New York City or Remote</span>
    <span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
    <span class="info"><i class="i-chair"></i> permanent</span>
    <span class="info"><i class="i-company"></i> Datadog</span>

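Answer from a uj5u.com user:

Because the text is not inside the <i> tag, you should use .next or .next_sibling to get it. Also check your selectors: the class on the page is 'i-calendar', so use class_='i-calendar' rather than class_='calendar':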

jobs=[]
for job_element in job_elements:
    jobs.append({
        'location': job_element.find(class_='i-globe').next,
        'date': job_element.find('i', class_='i-calendar').next,
        'length': job_element.find('i', class_='i-chair').next,
        'company_name': job_element.find('i', class_='i-company').next,
        'description': job_element.find('p', class_='detail').text 
    })
    
jobs

Output

[{'location': ' New York City or Remote',
  'date': ' Thu, 03 Jun 2021',
  'length': ' permanent',
  'company_name': ' Datadog',
  'description': ' The\xa0Role In this role on our APM (tracing/profiling/debugging) team you will: Write open source code that instruments thousands of Python applications around the world. Drive our open source Python projects and...'},
 {'location': ' remote',
  'date': ' Sun, 11 Apr 2021',
  'length': ' permanent, part-time possible',
  'company_name': ' RealRate GmbH',
  'description': ' RealRate is Hiring Senior Python\xa0Developers! RealRate, the Artificial Intelligence rating agency is growing. We’re looking for a senior Python\xa0developer: More than 8 years of project\xa0experience. Python\xa0senior. Data...'},...]
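
To see both points from this answer without hitting the live site, here is a minimal sketch that parses a trimmed copy of the HTML fragment from the question. The HTML constant and the variable names are only illustrative. It assumes the markup still looks like the fragment above.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question (one job listing, trimmed).
HTML = """
<div class="job">
  <span class="info"><i class="i-globe"></i> New York City or Remote</span>
  <span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
  <span class="info"><i class="i-chair"></i> permanent</span>
  <span class="info"><i class="i-company"></i> Datadog</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
job_element = soup.find(class_="job")

# class_ matches whole class names, so 'calendar' does not match 'i-calendar'.
missing = job_element.find("i", class_="calendar")

# The <i> tag itself is empty; its label is the text node that follows it.
icon = job_element.find("i", class_="i-calendar")
date = icon.next_sibling.strip()

# Alternative: read the text of the enclosing <span class="info">.
date_via_parent = icon.parent.get_text(strip=True)

print(missing)
print(date)
print(date_via_parent)
```

Calling .strip() (or get_text(strip=True)) also removes the leading space that shows up in the output above.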

Answer from a uj5u.com user:

import requests
import json
from bs4 import BeautifulSoup

URL = "https://pythonjobs.github.io/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

job = soup.find(id='container')
job_elements = job.find_all(class_='job')


for job_element in job_elements:
    location = job_element.find('i', class_='i-globe')
    date = job_element.find('i', class_='i-calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')  # use description.text to get the text
    print(location.next.strip())  # the label is the text node right after the empty <i>, inside the same <span>
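
If some listings lack one of the icons or the description paragraph, calling .next or .text on a None result raises AttributeError. Below is one possible sketch that guards against that. The helper names info_text and parse_job and the ICON_CLASSES mapping are hypothetical, not part of BeautifulSoup, and the HTML is a trimmed copy of the fragment from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping from output field name to the icon class in the markup.
ICON_CLASSES = {
    "location": "i-globe",
    "date": "i-calendar",
    "length": "i-chair",
    "company_name": "i-company",
}


def info_text(job_element, icon_class):
    """Return the stripped label next to the given icon, or None if absent."""
    icon = job_element.find("i", class_=icon_class)
    if icon is None:
        return None
    # The label lives in the <span class="info"> that wraps the icon.
    return icon.parent.get_text(strip=True)


def parse_job(job_element):
    """Collect the known fields of one listing into a dict."""
    job = {field: info_text(job_element, cls) for field, cls in ICON_CLASSES.items()}
    detail = job_element.find("p", class_="detail")
    job["description"] = detail.get_text(strip=True) if detail is not None else None
    return job


# Trimmed fragment from the question; it has no <p class="detail">.
HTML = """
<div class="job">
  <span class="info"><i class="i-globe"></i> New York City or Remote</span>
  <span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
  <span class="info"><i class="i-chair"></i> permanent</span>
  <span class="info"><i class="i-company"></i> Datadog</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
jobs = [parse_job(el) for el in soup.find_all(class_="job")]
print(jobs)
```

Against the live page you would pass page.content to BeautifulSoup instead of the HTML constant; missing fields then come back as None rather than stopping the loop.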

Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/512210.html
