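The text and images in this article come from the internet and are provided for study and discussion only, with no commercial purpose. Copyright belongs to the original author; if there is any problem, please contact us promptly and we will deal with it.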
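Today I'll walk you through using Python to analyze the massage listings on Meituan and see which services the "old drivers" (Chinese slang for seasoned regulars) like best.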
This article has two parts:
- Part one: scraping the data
- Part two: visualizing the data
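I. Scraping the data
1. Capture network traffic to find the data API
Open Meituan, search for the keyword "按摩" (massage), and capture the network traffic with Firefox's developer tools.
Meituan's anti-scraping behavior is interesting: if you simply refresh the page, the data is incomplete and the "更多優惠" (more deals) section is not shown, as in the figure below:
But once you click a sort option such as "智能排序" (smart sort), it does appear, as in the figure below:
The data API URL is: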
https://apimobile.meituan.com/group/v4/poi/pcsearch/1?uuid=86d6fed675bd4d2eb544.1600266551.1.0.0&userid=-1&limit=32&offset=0&cateId=-1&q=按摩&sort=default
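Paging is controlled by limit and offset: limit is the number of records returned, and offset is the position of the first record returned. From this analysis, there are roughly 1,000 records in total.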
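A minimal sketch of how the two parameters translate into pages: page n starts at offset n × limit. It assumes every other query parameter stays as in the captured URL above; the helper name `page_url` is hypothetical. Note that `urlencode` percent-encodes the keyword 按摩, which the server should treat the same as the raw characters (requests performs the same encoding itself).

```python
from urllib.parse import urlencode

BASE = "https://apimobile.meituan.com/group/v4/poi/pcsearch/1"

def page_url(page: int, limit: int = 32) -> str:
    """Build the search URL for a zero-based page number (hypothetical helper)."""
    params = {
        "uuid": "86d6fed675bd4d2eb544.1600266551.1.0.0",  # copied from the captured URL
        "userid": -1,
        "limit": limit,           # number of records per response
        "offset": page * limit,   # index of the first record on this page
        "cateId": -1,
        "q": "按摩",
        "sort": "default",
    }
    return BASE + "?" + urlencode(params)

print(page_url(2))
```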
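2. Scrape the data and save it as CSV
Testing showed that the server rejects back-to-back requests to this URL, so a delay has to be set between requests. limit can also be changed, up to a maximum of 150, which means at most 150 records can be fetched per request. The code is as follows: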
import csv
import requests

# `headers` is not shown in the original; a browser-like User-Agent is assumed here
headers = {'User-Agent': 'Mozilla/5.0'}

url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/1?uuid=86d6fed675bd4d2eb544.1600266551.1.0.0&userid=92855137&limit=150&offset=1050&cateId=-1&q=按摩&sort=default'
response = requests.get(url, headers=headers)
datas = response.json()['data']['searchResult']
print(len(datas))

# append mode, so repeated runs (one per offset) accumulate in the same file
with open('1.csv', 'a', newline='', encoding='gb18030') as f:
    f_csv = csv.writer(f)
    for data in datas:
        if data['deals'] is None:  # skip shops without any group-buy deals
            continue
        title = data['title']              # shop name
        address = data['address']          # address
        lowestprice = data['lowestprice']  # lowest price
        avgprice = data['avgprice']        # average price
        latitude = data['latitude']
        longitude = data['longitude']
        avgscore = data['avgscore']
        comments = data['comments']
        historyCouponCount = data['historyCouponCount']
        areaname = data['areaname']
        backCateName = data['backCateName']
        for deal in data['deals']:
            title_deal = deal['title']  # deal name
            price_deal = deal['price']  # group-buy price
            value_deal = deal['value']  # original price
            sales_deal = deal['sales']  # number sold
            # one CSV row per deal, with the shop fields repeated
            f_csv.writerow([title, address, lowestprice, avgprice, latitude, longitude,
                            avgscore, comments, historyCouponCount, areaname, backCateName,
                            title_deal, price_deal, value_deal, sales_deal])
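The snippet above fetches a single page at a fixed offset (offset=1050). Looping over all pages with a pause between requests might look like the sketch below. The helper name `fetch_all`, the 3-second default delay, and the default total of 1,000 records (the author's rough estimate) are assumptions, not values the author published. The page-fetching function is passed in, so the loop can be tried without touching the network.

```python
import time
from typing import Callable, List

def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              total: int = 1000, limit: int = 150, delay: float = 3.0) -> List[dict]:
    """Request every page in turn, sleeping between requests (hypothetical helper).

    fetch_page(offset, limit) should return the 'searchResult' list for that page.
    """
    results: List[dict] = []
    for offset in range(0, total, limit):
        page = fetch_page(offset, limit)
        if not page:  # an empty page means we have run past the last record
            break
        results.extend(page)
        if offset + limit < total:
            time.sleep(delay)  # pause so the server does not reject back-to-back requests
    return results
```

In practice `fetch_page` would wrap the `requests.get` call from the snippet above, formatting `offset` and `limit` into the URL and returning `response.json()['data']['searchResult']`.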
That saves essentially all of the useful fields. The data looks like this:
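As an alternative to the nested loops in the scraper, pandas' `json_normalize` can flatten the `searchResult` list into one row per deal. This is a sketch that assumes the JSON keys used in the scraper; the helper name `flatten` and the choice of shop-level columns are illustrative, and it needs pandas 1.0 or later.

```python
import pandas as pd

def flatten(search_result):
    """One row per deal, with selected shop fields repeated on each row."""
    shops = [s for s in search_result if s.get("deals")]  # skip shops with no deals
    return pd.json_normalize(
        shops,
        record_path="deals",
        meta=["title", "address", "avgscore", "areaname"],
        record_prefix="deal_",  # the deal 'title' would otherwise clash with the shop 'title'
    )
```

Writing the result with `df.to_csv('1.csv', index=False, encoding='gb18030')` would also give the file a header row.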
II. Visualizing the data
First, read the data with pandas:
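import pandas as pd

# The scraper above writes no header row, so column names are supplied here
# (assumption: '美團按摩.csv' is the scraper's 1.csv, renamed)
columns = ['title', 'address', 'lowestprice', 'avgprice', 'latitude', 'longitude',
           'avgscore', 'comments', 'historyCouponCount', 'areaname', 'backCateName',
           'title_deal', 'price_deal', 'value_deal', 'sales_deal']
data = pd.read_csv('美團按摩.csv', encoding='gb18030', header=None, names=columns)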
1. Price distribution on a polar chart
First, let's look at how prices are distributed across all massage deals:
from pyecharts import options as opts
from pyecharts.charts import Polar
from pyecharts.globals import ThemeType

# (price, number of deals at that price) pairs
price_counts = data['price_deal'].value_counts()
prices = [(float(p), int(n)) for p, n in price_counts.items()]

c = (
    Polar(init_opts=opts.InitOpts(theme=ThemeType.PURPLE_PASSION))
    .add_schema(angleaxis_opts=opts.AngleAxisOpts(start_angle=0, min_=0, type_="value", is_clockwise=True))
    .add("", prices, type_="scatter", label_opts=opts.LabelOpts(is_show=False))
    .set_global_opts(
        tooltip_opts=opts.TooltipOpts(trigger="axis", axis_pointer_type="cross"),
        title_opts=opts.TitleOpts(title="Massage price distribution (polar)", pos_right='40%'),
    )
)
c.render_notebook()
The pattern isn't very clear, so let's split prices into ranges and look at a pie chart:
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Pie

price_range = ['0-200 yuan', '200-500 yuan', '500-1000 yuan', '1000-2000 yuan', 'over 2000 yuan']
# right-closed bins (0, 200], (200, 500], ...; include_lowest also keeps a price of exactly 0
bins = [0, 200, 500, 1000, 2000, float('inf')]
binned = pd.cut(data['price_deal'], bins=bins, labels=price_range, include_lowest=True)
price_dict = binned.value_counts(sort=False).to_dict()  # deal count per range, in bin order

pie = (
    Pie()
    .add(
        "",
        [(k, int(v)) for k, v in price_dict.items()],
        radius=["30%", "75%"],
        center=["50%", "50%"],
        rosetype="radius",
        label_opts=opts.LabelOpts(is_show=False),
    )
    .set_global_opts(title_opts=opts.TitleOpts(title="Massage price ranges (rose chart)"))
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {d}%"))
)
pie.render_notebook()
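In the author's price distribution chart, roughly 31.58% of deals cost up to 200 yuan, 15.59% cost 200-500 yuan, 35.95% cost 500-1000 yuan, 16.22% cost 1000-2000 yuan, and 0.86% cost over 2000 yuan, so merchants concentrate on the 500-1000 yuan range.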
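The percentages a pie chart labels with `{d}%` can also be computed directly, which is handy for checking figures like those above against your own scrape. A sketch using hypothetical toy prices (not the scraped data) and a hypothetical helper name; the bins are right-closed, so a price of exactly 200 falls in the first band.

```python
import pandas as pd

def bin_shares(prices, edges, labels):
    """Percentage of deals in each (lower, upper] price bin, rounded to 2 decimals."""
    binned = pd.cut(pd.Series(prices), bins=edges, labels=labels, include_lowest=True)
    shares = binned.value_counts(normalize=True, sort=False) * 100  # keeps bin order
    return shares.round(2).to_dict()

edges = [0, 200, 500, 1000, 2000, float("inf")]
labels = ["0-200", "200-500", "500-1000", "1000-2000", "2000+"]
print(bin_shares([50, 150, 300, 600, 900, 1500, 2500, 200], edges, labels))
```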
2. The 20 most expensive deals
from pyecharts import options as opts
from pyecharts.charts import Bar, Grid

price_rank = data.sort_values(by='price_deal', ascending=False).head(20)
bar = (
    Bar()
    .add_xaxis(list(price_rank['title_deal'])[::-1])
    .add_yaxis("", [float(i) for i in list(price_rank['price_deal'])[::-1]], color='#2F4F4F')
    .reversal_axis()  # horizontal bars: deal names on the y-axis
    .set_global_opts(
        title_opts=opts.TitleOpts("Most expensive massage services", pos_right='40%', pos_top='0%'),
        xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True), name="Price"),
        yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True),
                                 axislabel_opts=opts.LabelOpts(color='#FF6347'), name="Deal"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(position="right", color='#006400'))
)
# the Grid's init_opts set the canvas size; width/height are CSS strings such as "1200px"
grid = (
    Grid(init_opts=opts.InitOpts(width="1200px", height="900px"))
    .add(bar, grid_opts=opts.GridOpts(pos_left="25%", pos_right="5%"))
)
grid.render_notebook()
They all look very upscale. Which one catches your eye?
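The `[::-1]` reversals in the chart code are there because, once `reversal_axis()` moves the categories to the y-axis, ECharts draws the first category at the bottom; reversing the lists puts the top-ranked deal at the top. The sketch below packages that preparation step under a hypothetical name, using `DataFrame.nlargest` in place of `sort_values(...).head(...)`, which should select the same rows apart from the ordering of ties. The toy rows are made up.

```python
import pandas as pd

def top_n_for_hbar(df, value_col, label_col, n=20):
    """Top-n rows by value_col, reversed so the largest is drawn at the top
    of a horizontal bar chart."""
    top = df.nlargest(n, value_col)
    labels = top[label_col].tolist()[::-1]
    values = [float(v) for v in top[value_col]][::-1]
    return labels, values

toy = pd.DataFrame({"title_deal": ["a", "b", "c", "d"], "price_deal": [100, 400, 250, 50]})
print(top_n_for_hbar(toy, "price_deal", "title_deal", n=3))
```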
3. Best-selling deals (the old drivers' favorites)
from pyecharts import options as opts
from pyecharts.charts import Bar, Grid
from pyecharts.globals import ThemeType

# rank by number sold, not by price
sales_rank = data.sort_values(by='sales_deal', ascending=False).head(20)
bar = (
    Bar(init_opts=opts.InitOpts(theme=ThemeType.DARK))
    .add_xaxis(list(sales_rank['title_deal'])[::-1])
    .add_yaxis("", [float(i) for i in list(sales_rank['sales_deal'])[::-1]])
    .reversal_axis()
    .set_global_opts(
        title_opts=opts.TitleOpts("Most popular massage services", pos_right='40%', pos_top='0%'),
        xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
        yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True),
                                 axislabel_opts=opts.LabelOpts(color='#FFD700')),
    )
    .set_series_opts(label_opts=opts.LabelOpts(position="right", color='#FF1493'))
)
grid = (
    Grid(init_opts=opts.InitOpts(theme=ThemeType.DARK))
    .add(bar, grid_opts=opts.GridOpts(pos_left="25%", pos_right="0%"))
)
grid.render_notebook()
What on earth is "九陽神功足療" (Jiuyang Shengong foot therapy)? That is clearly beyond my understanding.
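At first glance these look like Meituan chain stores whose sales counts are shared across branches, or else the sales figures are inflated. I don't really know, and I don't dare ask.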
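One way to probe the chain-store suspicion is to count how many distinct shops list each deal name. The sketch below assumes the shop-name column is called `title`, as in the scraper's variable names; the helper name and toy rows are hypothetical. A deal name that appears under many shops with identical sales would be consistent with pooled chain-wide counts, though this alone cannot rule out inflated figures.

```python
import pandas as pd

def deal_name_summary(df):
    """For each deal name: number of distinct shops listing it and its highest sales count."""
    return (
        df.groupby("title_deal")
        .agg(shops=("title", "nunique"), max_sales=("sales_deal", "max"))
        .sort_values("max_sales", ascending=False)
    )

toy = pd.DataFrame({
    "title": ["S1", "S2", "S3", "S1"],
    "title_deal": ["Foot bath", "Foot bath", "Foot bath", "Back"],
    "sales_deal": [5000, 5000, 5000, 20],
})
print(deal_name_summary(toy))
```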
So I decided to deduplicate on the deal-name column and add the price to each label:
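from pyecharts import options as opts
from pyecharts.charts import Bar, Grid
from pyecharts.globals import ThemeType

# data_quchong ("deduplicated") is not defined in the original; dropping repeated deal
# names after sorting by sales, so each name keeps its best-selling row, is assumed here
data_quchong = data.sort_values(by='sales_deal', ascending=False).drop_duplicates(subset='title_deal')
price_rank = data_quchong.sort_values(by='sales_deal', ascending=False).head(10)
bar = (
    Bar(init_opts=opts.InitOpts(theme=ThemeType.DARK))
    # label each bar "deal name: price yuan"
    .add_xaxis(['{0}: {1} yuan'.format(i, j) for i, j in
                zip(list(price_rank['title_deal'])[::-1], list(price_rank['price_deal'])[::-1])])
    .add_yaxis("", [float(i) for i in list(price_rank['sales_deal'])[::-1]])
    .reversal_axis()
    .set_global_opts(
        title_opts=opts.TitleOpts("Most popular massage services", pos_right='40%', pos_top='0%'),
        xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
        yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True),
                                 axislabel_opts=opts.LabelOpts(color='#FFD700')),
    )
    .set_series_opts(label_opts=opts.LabelOpts(position="right", color='#FF1493'))
)
grid = (
    Grid(init_opts=opts.InitOpts(theme=ThemeType.DARK))
    .add(bar, grid_opts=opts.GridOpts(pos_left="30%", pos_right="5%"))
)
grid.render_notebook()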
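It has me rubbing my hands in excitement: foot soaks and foot baths take first place, no surprise there. They are the old drivers' favorite, cheap and effective!
But I really want to know what special effect the second-place "九陽神功足療" (Jiuyang Shengong foot therapy) actually has.
So I looked it up specifically.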
Seems pretty good!
This article originally appeared on "python資料分析之禪" (The Zen of Python Data Analysis), by 小dull鳥
Repost source:
https://blog.csdn.net/fei347795790?t=1
When reposting, please credit the source. Link to this article: https://www.uj5u.com/houduan/139150.html
