Preface
Hi everyone~ This is Qianqian, who loves looking at beautiful girls
Code provided by: 青燈教育-巳月
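Key points:
- Capturing dynamically loaded data by inspecting network packets
- Sending requests with requests
- Parsing structured and unstructured data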
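Preparation
Try to keep your setup the same as mine below, otherwise you may run into errors.
Development environment:
- python 3.8: runs the code
- pycharm 2021.2: helps with writing the code
- requests: third-party module, installed with pip install module-name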
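How to install a third-party python module:
- Press win + R, type cmd and click OK, then type the install command pip install module-name (for example pip install requests) and press Enter
- Or click Terminal in pycharm and type the install command there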
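After running the install command, it can help to confirm that the module is actually visible to the interpreter you are using. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of pip or requests.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Check the third-party module this article relies on.
for name in ['requests']:
    status = 'found' if is_installed(name) else 'missing, try: pip install ' + name
    print(name, '->', status)
```

If the module shows as missing inside pycharm but installs fine from cmd, the project is probably pointing at a different interpreter.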
How do you configure the python interpreter in pycharm?
- Select File >>> Settings >>> Project >>> Python Interpreter
- Click the gear icon and select Add
- Add the python installation path
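How do you install plugins in pycharm?
- Select File >>> Settings >>> Plugins
- Click Marketplace and type the name of the plugin you want, for example translation for a translation plugin or Chinese for the Chinese language pack
- Select the matching plugin and click Install
- After the installation succeeds, a prompt to restart pycharm appears; click OK and the plugin takes effect after the restart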
Software, answers, source code, and tutorials are available for free in QQ group 832157862~
Code
Collecting the ranking data
import requests
import re
import csv


def replace(str_):
    # Pull the inner text out of the nested <div> wrapper
    str_ = re.findall('<div ><div >(.*?)</div></div>', str_)[0]
    return str_


# Write the header row once before collecting any data
with open('rank.csv', mode='a', encoding='utf-8', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['country', 'rank', 'region', 'score_1', 'score_2', 'score_3', 'score_4', 'score_5', 'score_6', 'stars', 'total_score', 'university', 'year'])

url = 'https://www.qschina.cn/sites/default/files/qs-rankings-data/cn/2057712_indicators.txt'
response = requests.get(url=url)
json_data = response.json()
data = json_data['data']
for i in data:
    country = i['location']  # Country/region
    rank = i['overall_rank']  # Rank
    region = i['region']  # Continent
    score_1 = replace(i['ind_76'])  # Academic reputation
    score_2 = replace(i['ind_77'])  # Employer reputation
    score_3 = replace(i['ind_36'])  # Faculty/student ratio
    score_4 = replace(i['ind_73'])  # Citations per faculty
    score_5 = replace(i['ind_18'])  # International faculty
    score_6 = replace(i['ind_14'])  # International students
    stars = i['stars']  # Star rating
    total_score = replace(i['overall'])  # Total score
    university = i['uni']  # University
    university = re.findall('<div .*?>(.*?)</a></div></div>', university)[0]
    year = "2021"  # Year
    print(country, rank, region, score_1, score_2, score_3, score_4, score_5, score_6, stars, total_score, university, year)
    # Append one row per university
    with open('rank.csv', mode='a', encoding='utf-8', newline='') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow([country, rank, region, score_1, score_2, score_3, score_4, score_5, score_6, stars, total_score, university, year])
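The extraction and CSV-writing steps above can be sketched a little more defensively. This is only a sketch, not the author's code: `extract_text` and `write_rows` are hypothetical helper names, the sample HTML fragment and row are made up, and the output is opened once instead of once per row. The regular expression is the same one used in replace().

```python
import csv
import io
import re

# Same pattern as replace() above, compiled once.
CELL_PATTERN = re.compile(r'<div ><div >(.*?)</div></div>')


def extract_text(fragment, default=''):
    """Return the inner text of the wrapper, or `default` when nothing matches.

    Unlike re.findall(...)[0], this does not raise IndexError on a
    fragment that lacks the expected wrapper.
    """
    match = CELL_PATTERN.search(fragment)
    return match.group(1) if match else default


def write_rows(file_obj, header, rows):
    """Write the header and all rows through a single csv writer."""
    writer = csv.writer(file_obj)
    writer.writerow(header)
    writer.writerows(rows)


# Made-up sample data, for illustration only.
sample_fragment = '<div ><div >98.7</div></div>'
buffer = io.StringIO(newline='')
write_rows(
    buffer,
    ['university', 'total_score'],
    [['Example University', extract_text(sample_fragment)]],
)
print(buffer.getvalue())
```

To write to disk, pass a file opened with open('rank.csv', mode='w', encoding='utf-8', newline='') in place of the in-memory buffer. Using mode='w' also avoids a second header row when the script is run again.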
Data analysis code
from pyecharts.charts import *
from pyecharts import options as opts
from pyecharts.commons.utils import JsCode
from pyecharts.components import Table
import re
import pandas as pd

df = pd.read_csv('rank.csv')
# Hong Kong, Macau, mainland China, etc. are recorded separately in the ranking; group them all as China here
df['loc'] = df['country']
df['country'] = df['country'].replace(['China (Mainland)', 'Hong Kong SAR', 'Taiwan', 'Macau SAR'], 'China')

# HTML template for the tooltip shown on each bar
tool_js = """
<div style="border-bottom: 1px solid rgba(255,255,255,.3); font-size: 18px;padding-bottom: 7px;margin-bottom: 7px"> {} </div>
Rank: {} <br>
Country/region: {} <br>
Weighted total score: {} <br>
International students: {} <br>
International faculty: {} <br>
Faculty/student ratio: {} <br>
Academic reputation: {} <br>
Employer reputation: {} <br>
Citations per faculty: {} <br>
"""

t_data = df[(df.year == 2021) & (df['rank'] <= 100)]
t_data = t_data.sort_values(by="total_score", ascending=True)

university, score = [], []
for idx, row in t_data.iterrows():
    tjs = tool_js.format(row['university'], row['rank'], row['country'], row['total_score'],
                         row['score_6'], row['score_5'], row['score_3'], row['score_1'],
                         row['score_2'], row['score_4'])
    # As published the pattern is '(.*?)', which only matches empty strings and leaves the
    # name unchanged; stripping a parenthesised suffix is the assumed intent.
    name = re.sub(r'[（(].*?[）)]', '', row['university'])
    if row['country'] == 'China':
        # The prefix was garbled to '????' in the source; a China flag emoji is assumed
        university.append('🇨🇳 {}'.format(name))
    else:
        university.append(name)
    score.append(opts.BarItem(name='', value=row['total_score'], tooltip_opts=opts.TooltipOpts(formatter=tjs)))
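The preparation logic above (fold several location labels into one country, keep the top ranks, sort ascending by score) can be sketched without pandas to make each step easier to follow. This is a sketch under assumptions: `normalize_country` and `top_n_ascending` are hypothetical helper names and the sample rows are made up.

```python
# Labels that the article folds into a single 'China' value.
CHINA_LABELS = {'China (Mainland)', 'Hong Kong SAR', 'Taiwan', 'Macau SAR'}


def normalize_country(label):
    """Map the separate location labels onto 'China'; leave others unchanged."""
    return 'China' if label in CHINA_LABELS else label


def top_n_ascending(rows, limit):
    """Keep rows ranked within `limit`, sorted by total_score ascending."""
    kept = [row for row in rows if row['rank'] <= limit]
    return sorted(kept, key=lambda row: row['total_score'])


# Made-up sample rows, for illustration only.
sample_rows = [
    {'university': 'Univ A', 'country': 'Hong Kong SAR', 'rank': 2, 'total_score': 90.0},
    {'university': 'Univ B', 'country': 'United States', 'rank': 1, 'total_score': 100.0},
    {'university': 'Univ C', 'country': 'China (Mainland)', 'rank': 150, 'total_score': 55.5},
]
for row in sample_rows:
    row['loc'] = row['country']  # keep the original label, as df['loc'] does
    row['country'] = normalize_country(row['country'])

top = top_n_ascending(sample_rows, limit=100)
print([(row['university'], row['country']) for row in top])
```

The ascending sort is presumably chosen because a horizontal bar chart draws the first category at the bottom, so the highest score ends up at the top.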
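### TOP 100 universities
Space is limited, so only the TOP 100 universities are shown here; the full ranking can be downloaded from the attachment~
* The top-ranked university is MIT, which scores 100 on every indicator except **international students** and **citations per faculty**;
* The TOP 4 universities are all from the United States; next comes the University of Oxford, ranked fifth;
* **The highest-ranked university in China is Tsinghua University, ranked 15th**, followed by the University of Hong Kong and Peking University;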
bar = (
    Bar()
    .add_xaxis(university)
    .add_yaxis('', score, category_gap='30%')
    .set_global_opts(
        title_opts=opts.TitleOpts(title="QS World University Rankings 2021 TOP 100",
                                  pos_left="center",
                                  title_textstyle_opts=opts.TextStyleOpts(font_size=20)),
        # Show only the upper 30% of the list by default; the rest is reachable by zooming
        datazoom_opts=opts.DataZoomOpts(range_start=70, range_end=100, orient='vertical'),
        visualmap_opts=opts.VisualMapOpts(is_show=False, max_=100, min_=60, dimension=0,
                                          range_color=['#00FFFF', '#FF7F50']),
        legend_opts=opts.LegendOpts(is_show=False),
        xaxis_opts=opts.AxisOpts(is_show=False, is_scale=True),
        yaxis_opts=opts.AxisOpts(axistick_opts=opts.AxisTickOpts(is_show=False),
                                 axisline_opts=opts.AxisLineOpts(is_show=False),
                                 axislabel_opts=opts.LabelOpts(font_size=12)),
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(is_show=True, position='right', font_style='italic'),
        itemstyle_opts={"normal": {
            "barBorderRadius": [30, 30, 30, 30],
            'shadowBlur': 10,
            'shadowColor': 'rgba(120, 36, 50, 0.5)',
            'shadowOffsetY': 5,
        }},
    )
    .reversal_axis()  # Swap the axes to get horizontal bars
)
grid = (
    Grid(init_opts=opts.InitOpts(theme='purple-passion', width='1000px', height='1200px'))
    .add(bar, grid_opts=opts.GridOpts(pos_right='10%', pos_left='20%'))
)
grid.render_notebook()
# HTML template for the tooltip shown on each bar
tool_js = """
<div style="border-bottom: 1px solid rgba(255,255,255,.3); font-size: 18px;padding-bottom: 7px;margin-bottom: 7px"> {} </div>
World rank: {} <br>
Country/region: {} <br>
Weighted total score: {} <br>
International students: {} <br>
International faculty: {} <br>
Faculty/student ratio: {} <br>
Academic reputation: {} <br>
Employer reputation: {} <br>
Citations per faculty: {} <br>
"""

t_data = df[(df.country == 'China') & (df['rank'] <= 500)]
t_data = t_data.sort_values(by="total_score", ascending=True)

university, score = [], []
for idx, row in t_data.iterrows():
    tjs = tool_js.format(row['university'], row['rank'], row['country'], row['total_score'],
                         row['score_6'], row['score_5'], row['score_3'], row['score_1'],
                         row['score_2'], row['score_4'])
    # As published the pattern is '(.*?)', which only matches empty strings and leaves the
    # name unchanged; stripping a parenthesised suffix is the assumed intent.
    name = re.sub(r'[（(].*?[）)]', '', row['university'])
    if row['country'] == 'China':
        # The prefix was garbled to '????' in the source; a China flag emoji is assumed
        university.append('🇨🇳 {}'.format(name))
    else:
        university.append(name)
    score.append(opts.BarItem(name='', value=row['total_score'], tooltip_opts=opts.TooltipOpts(formatter=tjs)))
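### Ranking of Chinese universities
Universities ranked below 500 have no specific score, so only the Chinese universities within the TOP 500 are selected here;
* In the first tier, Hong Kong universities make up a large share: **4 of the TOP 10 are from Hong Kong**;
* Leaving out the Hong Kong universities, **the TOP 5 are Tsinghua, Peking, Fudan, Shanghai Jiao Tong, and Zhejiang**;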
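The TOP 500 filter relies on the rank column being numeric. If the collected ranks arrive as text, which is an assumption here (for example tied ranks written as '=15' or bands written as '501-510'), a small normalizing step is needed before comparing. The sketch below uses a hypothetical helper `parse_rank` and made-up sample rows.

```python
import re


def parse_rank(value):
    """Return the first integer found in a rank label, or None if there is none."""
    match = re.search(r'\d+', str(value))
    return int(match.group()) if match else None


def within_top(rows, limit):
    """Keep rows whose parsed rank exists and is at most `limit`."""
    kept = []
    for row in rows:
        rank = parse_rank(row['rank'])
        if rank is not None and rank <= limit:
            kept.append(row)
    return kept


# Made-up sample rows, for illustration only.
sample_rows = [
    {'university': 'Univ A', 'rank': '=15'},
    {'university': 'Univ B', 'rank': '501-510'},
    {'university': 'Univ C', 'rank': 320},
    {'university': 'Univ D', 'rank': 'N/A'},
]
print([row['university'] for row in within_top(sample_rows, 500)])
```

A band such as '501-510' is reduced to its lower bound, so it falls outside a limit of 500, in line with the note that those entries carry no specific score.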
bar = (
    Bar()
    .add_xaxis(university)
    .add_yaxis('', score, category_gap='30%')
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Chinese universities in the TOP 500",
                                  pos_left="center",
                                  title_textstyle_opts=opts.TextStyleOpts(font_size=20)),
        # Show only the upper half of the list by default; the rest is reachable by zooming
        datazoom_opts=opts.DataZoomOpts(range_start=50, range_end=100, orient='vertical'),
        visualmap_opts=opts.VisualMapOpts(is_show=False, max_=90, min_=20, dimension=0,
                                          range_color=['#00FFFF', '#FF7F50']),
        legend_opts=opts.LegendOpts(is_show=False),
        xaxis_opts=opts.AxisOpts(is_show=False, is_scale=True),
        yaxis_opts=opts.AxisOpts(axistick_opts=opts.AxisTickOpts(is_show=False),
                                 axisline_opts=opts.AxisLineOpts(is_show=False),
                                 axislabel_opts=opts.LabelOpts(font_size=12)),
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(is_show=True, position='right', font_style='italic'),
        itemstyle_opts={"normal": {
            "barBorderRadius": [30, 30, 30, 30],
            'shadowBlur': 10,
            'shadowColor': 'rgba(120, 36, 50, 0.5)',
            'shadowOffsetY': 5,
        }},
    )
    .reversal_axis()  # Swap the axes to get horizontal bars
)
grid = (
    Grid(init_opts=opts.InitOpts(theme='purple-passion', width='1000px', height='1200px'))
    .add(bar, grid_opts=opts.GridOpts(pos_right='10%', pos_left='20%'))
)
grid.render_notebook()
### Distribution by continent
* **Nearly 40% of the TOP 1000 universities come from Europe**;
* Only 11 universities from Africa made the list;
t_data = df[(df.year == 2021) & (df['rank'] <= 1000)]
# Count universities per continent and sort from most to fewest
t_data = t_data.groupby(['region'])['university'].count().reset_index()
t_data.columns = ['region', 'num']
t_data = t_data.sort_values(by="num", ascending=False)

bar = (
    Bar(init_opts=opts.InitOpts(theme='purple-passion', width='1000px', height='600px'))
    .add_xaxis(t_data['region'].tolist())
    .add_yaxis('Count', t_data['num'].tolist(), category_gap='50%')
    .set_global_opts(
        title_opts=opts.TitleOpts(title="TOP 1000 universities by continent",
                                  pos_left="center",
                                  title_textstyle_opts=opts.TextStyleOpts(font_size=20)),
        visualmap_opts=opts.VisualMapOpts(is_show=False, max_=300, min_=0, dimension=1,
                                          range_color=['#00FFFF', '#FF7F50']),
        legend_opts=opts.LegendOpts(is_show=False),
        xaxis_opts=opts.AxisOpts(axistick_opts=opts.AxisTickOpts(is_show=False),
                                 axisline_opts=opts.AxisLineOpts(is_show=False),
                                 axislabel_opts=opts.LabelOpts(font_size=15)),
        yaxis_opts=opts.AxisOpts(is_show=False),
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(is_show=True, position='top', font_size=15, font_style='italic'),
        itemstyle_opts={"normal": {
            "barBorderRadius": [30, 30, 30, 30],
            'shadowBlur': 10,
            'shadowColor': 'rgba(120, 36, 50, 0.5)',
            'shadowOffsetY': 5,
        }},
    )
)
bar.render_notebook()
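The group-and-count step above can also be sketched with the standard library, which makes it easy to check a share such as "nearly 40% from Europe" by hand. The sample rows are made up and `region_shares` is a hypothetical helper name; the real figures come from rank.csv.

```python
from collections import Counter


def region_shares(rows):
    """Return (region, count, percent) tuples sorted by count, largest first."""
    counts = Counter(row['region'] for row in rows)
    total = sum(counts.values())
    return [
        (region, num, round(100 * num / total, 1))
        for region, num in counts.most_common()
    ]


# Made-up sample rows, for illustration only.
sample_rows = [
    {'university': 'Univ A', 'region': 'Europe'},
    {'university': 'Univ B', 'region': 'Europe'},
    {'university': 'Univ C', 'region': 'Asia'},
    {'university': 'Univ D', 'region': 'Europe'},
    {'university': 'Univ E', 'region': 'Africa'},
]
shares = region_shares(sample_rows)
for region, num, percent in shares:
    print(region, num, str(percent) + '%')
```

Counter.most_common() returns the regions from most to fewest, matching the descending sort applied to t_data before plotting.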
Visualization results (partial)
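Closing words
Thank you for reading my article~ This flight ends here.
I hope this article was helpful and that you picked up a little knowledge~
Even the stars in hiding are working hard to shine, so keep going too (let's work hard together).
Finally, I would like to ask for your triple support (like, comment, favorite); it costs nothing, so why not~
If you don't know what to comment, even typing 6666 is encouragement for me. Thank you!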
Please credit the source when reposting. Article link: https://www.uj5u.com/houduan/498588.html
