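Preface
The text and images in this article come from the internet and are for learning and exchange only, with no commercial purpose. If there is any problem, please contact us promptly so we can deal with it.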
Development tools
- Python 3.6.5
- PyCharm
- requests
- re
- json
requests can be installed with pip; re and json ship with the Python standard library.
Page analysis
https://search.51job.com/list/010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,1.html
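The city segment of this URL looks odd because the commas between the city codes are percent-encoded twice: `%252c` decodes to `%2c`, which decodes to `,`. The sketch below uses only the standard library to decode that segment and to build the URL for a given page. The helper names `decode_city_codes` and `build_search_url` are hypothetical, and reading the last number before `.html` as the page index is an assumption drawn from the URL pattern, not from any 51job documentation.

```python
from urllib.parse import unquote

# City segment copied from the search URL above.
CITY_SEGMENT = "010000%252c020000%252c030200%252c040000"

# URL template; the trailing "{page}" slot is assumed to be the page index.
URL_TEMPLATE = (
    "https://search.51job.com/list/"
    "{cities},000000,0000,00,9,99,{keyword},2,{page}.html"
)


def decode_city_codes(segment):
    """Undo the double percent-encoding and split into individual city codes."""
    return unquote(unquote(segment)).split(",")


def build_search_url(keyword, page, cities=CITY_SEGMENT):
    """Hypothetical helper: fill the keyword and page number into the template."""
    return URL_TEMPLATE.format(cities=cities, keyword=keyword, page=page)


if __name__ == "__main__":
    print(decode_city_codes(CITY_SEGMENT))
    print(build_search_url("python", 1))
```

Calling `build_search_url("python", 1)` reproduces the URL shown above, so changing only the `page` argument is enough to walk through the result pages.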
Requesting the page
import requests

url = 'https://search.51job.com/list/010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,1.html'
# Query-string parameters appended to the URL.
params = {
    'lang': 'c',
    'postchannel': '0000',
    'workyear': '99',
    'cotype': '99',
    'degreefrom': '99',
    'jobterm': '99',
    'companysize': '99',
    'ord_field': '0',
    'dibiaoid': '0',
    'line': '',
    'welfare': '',
}
# Fill in your own cookies as name/value pairs, for example copied from the browser's developer tools.
cookies = {}
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Host': 'search.51job.com',
    'Referer': 'https://search.51job.com/list/190200,000000,0000,00,9,99,python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}
response = requests.get(url=url, params=params, headers=headers, cookies=cookies)
# Let requests guess the page encoding from the content so the text decodes correctly.
response.encoding = response.apparent_encoding
print(response.text)
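The `params` dictionary ends up as the query string of the request URL. To see what that looks like without sending a request, the sketch below encodes the same dictionary with the standard library's `urllib.parse.urlencode`. `requests` builds its query string along the same lines and keeps keys whose value is an empty string. The names `BASE_URL`, `query_string`, and `full_url` are illustrative only.

```python
from urllib.parse import urlencode

BASE_URL = (
    "https://search.51job.com/list/"
    "010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,1.html"
)

# Same parameters as in the request above.
params = {
    "lang": "c",
    "postchannel": "0000",
    "workyear": "99",
    "cotype": "99",
    "degreefrom": "99",
    "jobterm": "99",
    "companysize": "99",
    "ord_field": "0",
    "dibiaoid": "0",
    "line": "",
    "welfare": "",
}

# Encode the dictionary as key=value pairs joined by "&".
query_string = urlencode(params)
full_url = BASE_URL + "?" + query_string

print(query_string)
print(full_url)
```

The resulting query string has the same shape as the one visible in the `Referer` header, which is a quick way to check that no parameter was mistyped.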

The data we need is inside a <script> tag:
<script type="text/javascript">
window.__SEARCH_RESULT__ =
'''
the content you want to extract
'''
</script>
<div class="clear"></div>
A regular expression is enough to pull it out.
Convert the matched text into a dict with json.loads, then read the fields you want by key.
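import json
import pprint
import re
# Capture everything between the assignment and the closing </script> tag.
r = re.findall(r'window\.__SEARCH_RESULT__ = (.*?)</script>', response.text, re.S)
string = ''.join(r)
info_dict = json.loads(string)
pprint.pprint(info_dict)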
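To try the extraction step without touching the site, the sketch below runs the same kind of regular expression against a small hand-written HTML snippet. The snippet and its field values are made up for illustration, and the helper name `extract_search_result` is hypothetical; only the `window.__SEARCH_RESULT__` marker and the `engine_search_result` key come from this article.

```python
import json
import re

# Invented stand-in for response.text; the real page is much larger.
SAMPLE_HTML = """
<script type="text/javascript">
window.__SEARCH_RESULT__ = {"engine_search_result": [{"job_name": "Python Developer", "company_name": "Example Co."}]}</script>
<div class="clear"></div>
"""

# re.S lets "." match newlines; "(.*?)" stops at the first closing tag.
PATTERN = re.compile(r"window\.__SEARCH_RESULT__\s*=\s*(.*?)</script>", re.S)


def extract_search_result(html):
    """Return the embedded JSON as a dict, or None if the marker is absent."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


if __name__ == "__main__":
    result = extract_search_result(SAMPLE_HTML)
    for job in result["engine_search_result"]:
        print(job["job_name"], "|", job["company_name"])
```

Returning `None` when the marker is missing lets the caller notice a page that does not contain the data (for example a blocked or redirected response), instead of failing inside `json.loads` on an empty string.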

Complete code
import csv
import json
import re

import requests

# Query-string parameters, cookies, and headers are the same for every page.
params = {
    'lang': 'c',
    'postchannel': '0000',
    'workyear': '99',
    'cotype': '99',
    'degreefrom': '99',
    'jobterm': '99',
    'companysize': '99',
    'ord_field': '0',
    'dibiaoid': '0',
    'line': '',
    'welfare': '',
}
# Fill in your own cookies as name/value pairs if needed.
cookies = {}
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Host': 'search.51job.com',
    'Referer': 'https://search.51job.com/list/190200,000000,0000,00,9,99,python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}

# Crawl result pages 1 through 10.
for page in range(1, 11):
    url = 'https://search.51job.com/list/010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,{}.html'.format(page)
    response = requests.get(url=url, params=params, headers=headers, cookies=cookies)
    response.encoding = response.apparent_encoding
    # Extract the JSON assigned to window.__SEARCH_RESULT__ and parse it.
    r = re.findall(r'window\.__SEARCH_RESULT__ = (.*?)</script>', response.text, re.S)
    string = ''.join(r)
    info_dict = json.loads(string)
    dit_py = info_dict['engine_search_result']
    for i in dit_py:
        # Join every attribute except the first one into a single string.
        attribute_text = ' '.join(i['attribute_text'][1:])
        print(attribute_text)
        dit = {}
        # dit['job_href'] = i['job_href']
        dit['job_name'] = i['job_name']
        dit['company_name'] = i['company_name']
        dit['money'] = i['providesalary_text']
        dit['workarea'] = i['workarea_text']
        dit['updatedate'] = i['updatedate']
        dit['companytype'] = i['companytype_text']
        dit['jobwelf'] = i['jobwelf']
        dit['attribute'] = attribute_text
        dit['companysize'] = i['companysize_text']
        print(dit)
        # Append one row per job; csv.writer quotes fields that contain commas.
        with open('python招聘資訊.csv', mode='a', encoding='utf-8', newline='') as f:
            csv.writer(f).writerow([dit['job_name'], dit['company_name'], dit['money'], dit['workarea'], dit['companytype'], dit['jobwelf'], dit['attribute'], dit['companysize']])
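A field that contains a comma, as welfare text easily can, has to be quoted to keep the CSV columns aligned. The sketch below shows the row-building step in isolation with the standard `csv` module, writing to an in-memory buffer so it runs without the site. The sample record is invented, and the helper names `job_to_row` and `rows_to_csv_text` are hypothetical; the field names are the ones read in the code above.

```python
import csv
import io

# Source fields in the same order as the columns written above.
FIELDS = [
    "job_name",
    "company_name",
    "providesalary_text",
    "workarea_text",
    "companytype_text",
    "jobwelf",
    "attribute_text",
    "companysize_text",
]

# Invented record with the same shape as one entry of engine_search_result.
SAMPLE_JOB = {
    "job_name": "Python Developer",
    "company_name": "Example Co.",
    "providesalary_text": "10-15k/month",
    "workarea_text": "Shanghai",
    "companytype_text": "Private",
    "jobwelf": "insurance, bonus",
    "attribute_text": ["Shanghai", "3 years", "Bachelor"],
    "companysize_text": "50-150",
}


def job_to_row(job):
    """Build one CSV row; missing fields become empty strings."""
    row = []
    for field in FIELDS:
        value = job.get(field, "")
        if field == "attribute_text":
            # Join every attribute except the first, as the crawler does.
            value = " ".join(value[1:])
        row.append(value)
    return row


def rows_to_csv_text(rows):
    """Serialize rows with csv.writer, which quotes fields containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv_text([job_to_row(SAMPLE_JOB)]))
```

Using `job.get(field, "")` instead of `job[field]` is a design choice for robustness: a record that lacks one key produces an empty cell rather than stopping the whole crawl with a `KeyError`.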
Result
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/173127.html
Tags: Python
