Hi, I'm trying to test my knowledge of BeautifulSoup on the website https://pythonjobs.github.io/.
I'd like to print each listing along with its job role, location, company, and so on.
import requests
import json
from bs4 import BeautifulSoup
URL = "https://pythonjobs.github.io/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
job = soup.find(id='container')
job_elements = job.find_all(class_='job')
for job_element in job_elements:
    location = job_element.find(class_='i-globe')
    date = job_element.find('i', class_='calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')
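This is the code I have, but everything comes back as None.
Here is the HTML from the site for one job listing, for reference. I found that job_element.find('info') returns all of the information, but I can't isolate each piece, such as the company or the location. How should I do this? Thanks.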
<section class="job_list">
<div class="job" data-order="0" data-slug="datadog-open-source-software-engineer-python" data-tags="python,django,flask,falcon,celery">
<a class="go_button"
href="/jobs/datadog-open-source-software-engineer-python.html">
Read more <i class="i-right"></i>
</a>
<h1><a href="/jobs/datadog-open-source-software-engineer-python.html">Open Source Software Engineer - Python</a></h1>
<span class="info"><i class="i-globe"></i> New York City or Remote</span>
<span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
<span class="info"><i class="i-chair"></i> permanent</span>
<span class="info"><i class="i-company"></i> Datadog</span>
Answer from a uj5u.com user:
The text is not inside the <i> tag, so use .next or .next_sibling on the <i> element to get it. Also check your selector: the class is class_='i-calendar', not class_='calendar':
jobs = []
for job_element in job_elements:
    jobs.append({
        'location': job_element.find(class_='i-globe').next,
        'date': job_element.find('i', class_='i-calendar').next,
        'length': job_element.find('i', class_='i-chair').next,
        'company_name': job_element.find('i', class_='i-company').next,
        'description': job_element.find('p', class_='detail').text
    })
jobs
Output:
[{'location': ' New York City or Remote',
'date': ' Thu, 03 Jun 2021',
'length': ' permanent',
'company_name': ' Datadog',
'description': ' The\xa0Role In this role on our APM (tracing/profiling/debugging) team you will: Write open source code that instruments thousands of Python applications around the world. Drive our open source Python projects and...'},
{'location': ' remote',
'date': ' Sun, 11 Apr 2021',
'length': ' permanent, part-time possible',
'company_name': ' RealRate GmbH',
'description': ' RealRate is Hiring Senior Python\xa0Developers! RealRate, the Artificial Intelligence rating agency is growing. We’re looking for a senior Python\xa0developer: More than 8 years of project\xa0experience. Python\xa0senior. Data...'},...]
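To see why the original selectors came back empty, here is a minimal offline sketch that parses a trimmed copy of the HTML from the question instead of fetching the live site. The HTML constant and the variable names missing, icon_text, and date are illustrative assumptions, not part of the original answer. It also strips the leading space that shows up in the output above.

```python
from bs4 import BeautifulSoup

# HTML copied from the question (one job listing, trimmed for the example).
HTML = """
<div class="job">
<h1><a href="/jobs/datadog-open-source-software-engineer-python.html">Open Source Software Engineer - Python</a></h1>
<span class="info"><i class="i-globe"></i> New York City or Remote</span>
<span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
<span class="info"><i class="i-chair"></i> permanent</span>
<span class="info"><i class="i-company"></i> Datadog</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
job_element = soup.find(class_="job")

# class_ matches whole class names, so 'calendar' does not match 'i-calendar'.
missing = job_element.find("i", class_="calendar")

# The <i> tag itself is empty; the visible text is the node that follows it.
icon = job_element.find("i", class_="i-calendar")
icon_text = icon.get_text()
date = icon.next_sibling.strip()  # strip() drops the leading space

print(missing)          # None
print(repr(icon_text))  # ''
print(date)             # Thu, 03 Jun 2021
```

Here .next and .next_sibling return the same text node because the <i> tag has no children; .next_sibling is arguably the more explicit of the two.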
Answer from a uj5u.com user:
import requests
import json
from bs4 import BeautifulSoup
URL = "https://pythonjobs.github.io/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
job = soup.find(id='container')
job_elements = job.find_all(class_='job')
for job_element in job_elements:
    location = job_element.find('i', class_='i-globe')
    date = job_element.find('i', class_='i-calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')  # use description.text to get the text
    # The <i> tags are empty; the text of each span is the node right after the <i>.
    print(location.next.strip())
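As an alternative to looking up each icon separately, one could walk every span with class "info" and use the icon's class as the key. This is only a sketch under assumptions: parse_job is a hypothetical helper name, the HTML is a trimmed copy of the snippet from the question, and the live page may contain listings that are structured differently.

```python
from bs4 import BeautifulSoup

# HTML copied from the question (one job listing, trimmed for the example).
HTML = """
<div class="job">
<h1><a href="/jobs/datadog-open-source-software-engineer-python.html">Open Source Software Engineer - Python</a></h1>
<span class="info"><i class="i-globe"></i> New York City or Remote</span>
<span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
<span class="info"><i class="i-chair"></i> permanent</span>
<span class="info"><i class="i-company"></i> Datadog</span>
</div>
"""


def parse_job(job_element):
    """Hypothetical helper: map each icon class (e.g. 'i-globe') to its span text."""
    fields = {}
    for span in job_element.find_all("span", class_="info"):
        icon = span.find("i")
        if icon is None or not icon.get("class"):
            continue  # skip spans that have no icon with a class
        # get_text(strip=True) returns the span's text without surrounding whitespace.
        fields[icon["class"][0]] = span.get_text(strip=True)
    title = job_element.find("h1")
    fields["title"] = title.get_text(strip=True) if title else None
    return fields


soup = BeautifulSoup(HTML, "html.parser")
jobs = [parse_job(job) for job in soup.find_all(class_="job")]
print(jobs[0]["i-company"])  # Datadog
```

Because missing elements are skipped rather than dereferenced, this version should not raise AttributeError when a listing lacks one of the fields.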
