
Contents
- Preface
- Brief Requirements Analysis
- Technical Points
- Implementation
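Preface
The previous two installments were embedded in my "我要偷偷學Python" ("I Want to Secretly Learn Python") series; I will split them out once I have a free moment.
Leaving them under another series indefinitely is not ideal, so after some thought I decided to make this its own series.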
Brief Requirements Analysis
Scrape the trending product information from Chanmama (蟬媽媽): https://www.chanmama.com/



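Technical Points
Scraping page information after logging in. The crawl also involves page navigation, plus cleaning and storing the data.
There is not much to say about the latter items: cleaning and storage were covered in Battle 2, and page navigation came up when we discussed XPath; here we simply do it once more.
Scraping after login, however, is something we are doing for the first time.
Once past this hurdle, a lot more becomes possible.
It is a wide world out there, with plenty to accomplish.
The code here was written single-handedly by "垍", a member of our "爬蟲百戰穿山甲" (Hundred-Battle Crawler Pangolins) squad.
Very impressive: I posted the requirements in the group chat at 6 p.m. and he had solved them by 8 p.m., which genuinely stunned all of us.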
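The core of "scrape after login" is carrying the login token forward into every later request. Below is a minimal sketch of that one step, separated from the network calls so it can be tried offline. The function name apply_login_result is hypothetical; the field names errCode, data and token, the Authorization header, and the LOGIN-TOKEN-FORSNS cookie are taken from the code in this article and are assumptions about the service, not verified behaviour.

```python
from typing import Mapping, Optional, Tuple


def apply_login_result(
    login_json: Mapping,
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
) -> Optional[Tuple[dict, dict]]:
    """Return (headers, cookies) with the token added, or None if login failed."""
    # 'errCode', 'data' and 'token' are the fields read by the article's login();
    # they are assumed here, not checked against the live service.
    if login_json.get("errCode") != 0:
        return None
    token = (login_json.get("data") or {}).get("token")
    if not token:
        return None
    # Work on copies so the caller's dictionaries are left untouched.
    new_headers = dict(headers)
    new_cookies = dict(cookies)
    new_headers["Authorization"] = token
    new_cookies["LOGIN-TOKEN-FORSNS"] = token
    return new_headers, new_cookies


# Example with a made-up response; a real one would come from res.json().
fake_response = {"errCode": 0, "data": {"token": "abc123"}}
result = apply_login_result(fake_response, {"User-Agent": "demo"}, {})
```

The returned dictionaries could then be passed as headers= and cookies= on later requests calls, instead of mutating module-level variables.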
Implementation
Let's look at code written in a different style:
import json
import time

import requests


def cook(t):
    # Build the pre-login cookies from the Baidu analytics ID embedded in the login page.
    url = "https://www.chanmama.com/login"
    res = requests.get(url, headers=h)
    marker = 'https://hm.baidu.com/hm.js?'
    w = res.text.find(marker)
    if w == -1:  # find() returns -1 when the marker is missing
        raise RuntimeError('hm.js marker not found in the login page')
    w += len(marker)
    c = res.text[w:w + 32]  # the ID is the 32 characters after the marker
    co = {'Hm_lvt_' + c: t, 'Hm_lpvt_' + c: t}
    return co


def login(user, password):
    # Log in, then manually add the token to the shared cookies and headers.
    url = "https://api-service.chanmama.com/v1/access/token"
    # json.dumps escapes special characters in the credentials; the compact
    # separators keep the body the same as the hand-built string otherwise.
    d = json.dumps({"appId": 10000, "timeStamp": int(t), "username": user, "password": password},
                   separators=(',', ':'))
    res = requests.post(url, data=d, headers=h, cookies=c)
    data = res.json()
    if data['errCode'] == 0:
        c['LOGIN-TOKEN-FORSNS'] = data['data']['token']
        h['Authorization'] = data['data']['token']
        print('Login succeeded~~')
        return True
    else:
        print('Login failed~~')
        return False


def pa(aa, s):
    # Query the product search API and hand the JSON result to save().
    url = "https://api-service.chanmama.com/v1/product/search"
    d = {"keyword": aa, "keyword_type": "", "page": 1, "price": "", "size": s,
         "filter_coupon": 0, "is_aweme_goods": 0, "has_live": 0, "has_video": 0,
         "tb_max_commission_rate": "", "day_pv_count": "", "day_order_count": "",
         "big_category": "", "first_category": "", "second_category": "",
         "platform": "", "sort": "day_order_count", "order_by": "desc"}
    res = requests.post(url, json=d, headers=h, cookies=c)
    save(res.json())


def save(tt):
    # Print each product and write the same line to a text file.
    with open(r'PATH_TO_YOUR_OUTPUT_FILE', 'w+', encoding='utf-8') as fo:
        for i in tt['data']['list']:
            line = "Product: %s  Price: %s  Original price: %s  Views yesterday: %s  Sales yesterday: %s" % (
                i['title'], i['price'], i['market_price'], i['day_pv_count'], i['day_order_count'])
            print(line)
            fo.write(line + "\n")


h = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"}
t = str(int(time.time()))
c = cook(t)
if login('YOUR_USERNAME', 'YOUR_PASSWORD'):
    # Fill in the product keyword and the number of records to fetch;
    # non-VIP accounts can only retrieve the first 50.
    pa('YOUR_PRODUCT_KEYWORD', 50)
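The cookie step above locates the analytics ID by searching for the hm.js URL and slicing a fixed 32 characters after it. A regular expression is one possible alternative that fails cleanly when the page layout changes. The sketch below is only an illustration: the names extract_hm_id and tracking_cookies are hypothetical, the sample HTML is made up, and the assumption that the ID is 32 alphanumeric characters comes from the slice length in the code above.

```python
import re
from typing import Dict, Optional

# Matches the hm.js URL followed by a 32-character ID (assumed to be alphanumeric).
HM_ID_PATTERN = re.compile(r"https://hm\.baidu\.com/hm\.js\?([0-9A-Za-z]{32})")


def extract_hm_id(html: str) -> Optional[str]:
    """Return the 32-character ID after 'hm.js?', or None if it is absent."""
    match = HM_ID_PATTERN.search(html)
    return match.group(1) if match else None


def tracking_cookies(hm_id: str, timestamp: str) -> Dict[str, str]:
    """Build the two cookies the article sets before logging in."""
    return {"Hm_lvt_" + hm_id: timestamp, "Hm_lpvt_" + hm_id: timestamp}


# Made-up page fragment for demonstration only.
sample_html = '<script src="https://hm.baidu.com/hm.js?0123456789abcdef0123456789abcdef"></script>'
hm_id = extract_hm_id(sample_html)
```

Returning None for a missing ID lets the caller decide whether to abort or continue without the tracking cookies.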

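The save step above writes one formatted text line per product. If the data is meant for a spreadsheet or further cleaning, CSV may be more convenient. The sketch below uses only the standard library; the function name products_to_csv is hypothetical, the sample record is made up, and the five field names are the ones the code above reads from each item.

```python
import csv
import io
from typing import Iterable, Mapping

# Field names as read by the article's save(); assumed to be present in each item.
FIELDS = ["title", "price", "market_price", "day_pv_count", "day_order_count"]


def products_to_csv(items: Iterable[Mapping]) -> str:
    """Render the selected fields of each product as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({name: item.get(name, "") for name in FIELDS})
    return buffer.getvalue()


# Made-up record for demonstration only.
sample = [{"title": "Demo, item", "price": 99, "market_price": 199,
           "day_pv_count": 100, "day_order_count": 7, "extra": "ignored"}]
csv_text = products_to_csv(sample)
```

The csv module quotes the title automatically because it contains a comma. To write a file instead, the returned text can be saved with open(path, 'w', encoding='utf-8', newline='').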
Impressive!!!!!
Please credit the source when reposting. Article link: https://www.uj5u.com/houduan/229870.html
