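There are similar topics, but I couldn't find an exact answer, so could you please help me?
I copied the following code from the internet to scrape job postings from Indeed. The problem is that the code fails to scrape the job description.
When I use: sum_div = job.find_elements_by_class_name('summary')
the code does not recognize 'summary', does not find where the job description is located, and cannot close the pop-up that Indeed shows.
I tried other identifiers, for example: sum_div = job.find_element_by_class_name('job_seen_beacon')
With that, it ends up closing the pop-up window, but it still does not locate the job description properly.
Do you know how to solve this?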
for i in range(0,50,10):
    driver.get('https://www.indeed.co.in/jobs?q=artificial intelligence&l=India&start=' + str(i))
    jobs = []
    driver.implicitly_wait(20)
    for job in driver.find_elements_by_class_name('result'):
        #soup = BeautifulSoup(job.get_attribute('innerHTML'),'html.parser')
        result_html = job.get_attribute('innerHTML')
        soup = BeautifulSoup(result_html, 'html.parser')
        try:
            title = soup.find(class_="jobTitle").text
        except:
            title = 'None'
        try:
            location = soup.find(class_="companyLocation").text
        except:
            location = 'None'
        try:
            company = soup.find(class_="companyName").text.replace("\n","").strip()
        except:
            company = 'None'
        sum_div = job.find_elements_by_class_name('summary')
        #sum_div = job.find_element_by_class_name('job_seen_beacon')
        try:
            sum_div.click()
        except:
            close_button = driver.find_elements_by_class_name('popover-x-button-close')
            close_button.click()
            sum_div.click()
        driver.implicitly_wait(2)
        try:
            job_desc = driver.find_element_by_css_selector('div#vjs-desc').text
            print(job_desc)
        except:
            job_desc = 'None'
        df = df.append({'Title':title,'Location':location,"Company":company,
                        "Description":job_desc},ignore_index=True)
Answer from a uj5u.com user:
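The URL isn't dynamic, so there is no need to use Selenium. You can extract the required data with bs4 and requests. An example is given below.
P.S.: You have to use try/except, because each page contains 15 items and, presumably, not every item has every field.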
from bs4 import BeautifulSoup
import requests
import pandas as pd

jobs = []
# Each results page is selected by the start offset: 0, 10, 20, 30, 40.
for i in range(0, 50, 10):
    url = 'https://www.indeed.co.in/jobs?q=artificial intelligence&l=India&start=' + str(i)
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html.parser')
    for job in soup.select('.result'):
        # find() and select_one() return None when the element is missing,
        # so the attribute access raises AttributeError; fall back to 'None'.
        try:
            title = job.find(class_="jobTitle").text
        except AttributeError:
            title = 'None'
        try:
            location = job.find(class_="companyLocation").text
        except AttributeError:
            location = 'None'
        try:
            company = job.find(class_="companyName").text.replace("\n", "").strip()
        except AttributeError:
            company = 'None'
        try:
            job_desc = job.select_one('div.job-snippet ul').get_text(strip=True)
        except AttributeError:
            job_desc = 'None'
        jobs.append({'Title': title, 'Location': location, "Company": company, "Description": job_desc})

df = pd.DataFrame(jobs)
print(df)
# to store data
# df.to_csv('data.csv', index=False)
Output:
    Title                                              ...  Description
0   newData Scientist: Artificial Intelligence         ...  As a Data Scientist at IBM, you will help tran...
1   AI and Machine Learning                            ...  A machine learning engineer (ML engineer) focu...
2   newGraduate Intern - Technical                     ...  DPEA enables that data center which is the und...
3   Artificial Intelligence & Machine Learning Expert  ...  Define and drive projects in AI and Machine Le...
4   newML Data Associate I                             ...  Good familiarity with the Windows desktop envi...
..  ...                                                ...  ...
70  newData Scientist                                  ...  Perform data analysis and modelling on data se...
71  AI, Informatics & ML – Research Scientist          ...  Years of experience 2-4 yrs.Key Responsibiliti...
72  Software Development                               ...  Software Developers at IBM are the backbone of...
73  newB2B/EDI - Map Development Specialist            ...  Software Developers at IBM are the backbone of...
74  Artificial Intelligence / Data Science/ Machin...  ...  TATA ELXSI Ltd. is conducting off campus drive...
[75 rows x 4 columns]
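The page URL in the answer is built by string concatenation and contains a raw space in the query. As a minimal sketch, assuming the same endpoint and the q, l, and start parameters used above, the page URLs could instead be built with the standard library's urllib.parse.urlencode, which encodes the values. The function name build_search_urls and its pages and page_size parameters are illustrative and not part of the original answer.

```python
from urllib.parse import urlencode

# Same endpoint as in the answer above.
BASE_URL = "https://www.indeed.co.in/jobs"


def build_search_urls(query, location, pages=5, page_size=10):
    """Return one search URL per results page.

    `pages` and `page_size` are hypothetical parameter names; the defaults
    mirror range(0, 50, 10) from the answer.
    """
    urls = []
    for start in range(0, pages * page_size, page_size):
        # urlencode escapes the values, so the space in the query
        # is sent as '+' rather than as a raw space.
        params = urlencode({"q": query, "l": location, "start": start})
        urls.append(f"{BASE_URL}?{params}")
    return urls


urls = build_search_urls("artificial intelligence", "India")
for url in urls:
    print(url)
```

Each of these URLs can then be passed to requests.get in place of the hand-built string.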
