
The Most Comprehensive Python Web Crawler Course Online, from Basics to Advanced (with Source Code): Job-Ready When You Finish

2021-05-04 13:47:07 Backend Development

5.2 (Day 2)

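Chapter 1: Introduction to Web Crawlers

1. Getting to know crawlers

Chapter 2: requests in Practice (Basic Crawlers)

1. Scraping Douban movies
2. KFC restaurant lookup
3. Cracking Baidu Translate
4. Sogou home page
5. Web page collector
6. Scraping data from the National Medical Products Administration (NMPA)

Chapter 3: Parsing Crawled Data (bs4, xpath, regular expressions)

1. bs4 parsing basics
2. bs4 example
3. xpath parsing basics
4. xpath example: scraping 4K images
5. xpath example: 58.com second-hand housing
6. xpath example: free résumé templates from Chinaz (站長素材)
7. xpath example: names of all cities in China
8. Regex parsing
9. Regex parsing: paginated scraping
10. Downloading an image

Chapter 4: Automatic Captcha Recognition

1. Captcha recognition for gushiwen.cn (古詩文網)
fateadm_api.py (the configuration needed for recognition; keep it in the same folder)
Calling the API

Chapter 5: Advanced requests (Simulated Login)

1. Using proxies
2. Simulated login to Renren (人人網)
3. Simulated login to Renren

Chapter 6: High-Performance Asynchronous Crawlers (Thread Pools, Coroutines)

1. Multi-task async crawler with aiohttp
2. Flask service
3. Multi-task coroutines
4. Multi-task async crawler
5. Example
6. Synchronous crawler
7. Basic thread-pool usage
8. Using a thread pool in a crawler
9. Coroutines

Chapter 7: Handling Dynamically Loaded Data (selenium, simulated 12306 login)

1. selenium basics
2. Other automated selenium operations
3. 12306 login sample code
4. Action chains and iframe handling
5. Headless Chrome + anti-detection
6. Simulated 12306 login with selenium
7. Simulated login to QQ Zone

Chapter 8: The Scrapy Framework

1. Hands-on projects and changing various Scrapy settings

2. bossPro example
3. bossPro example
4. Database example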

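Chapter 1: Introduction to Web Crawlers

Level 0: Getting to Know Crawlers
1. A first look at crawlers
In essence, a crawler is a program that fetches data that is valuable to us from the web.
2. Mapping out the path
2-1. How a browser works

(1) Parsing data: after the server sends its response to the browser, the browser does not hand the raw data straight to us, because it is written in a language meant for computers; the browser first translates it into content we can read.
(2) Extracting data: from what we receive, we pick out the data that is useful to us.
(3) Storing data: the useful data we picked out is saved to a file or database.
2-2. How a crawler works

(1) Fetching data: the crawler sends a request to the server for the URL we give it, and the server returns data.
(2) Parsing data: the crawler parses the returned data into a format we can read.
(3) Extracting data: the crawler then extracts the data we need.
(4) Storing data: the crawler saves this useful data so you can use and analyze it later.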
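The four steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the author's code: `urllib.request` stands in for requests, `html.parser` for bs4/lxml, and extracting `<a>` link texts is an arbitrary choice made only for the example.

```python
import json
import urllib.request
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def fetch(url: str, timeout: float = 10.0) -> str:
    """Step 1, fetch: request the URL and return the decoded body."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Step 2, parse: walk the HTML and collect (text, href) for every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only collect text inside an <a>
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract(html: str) -> List[Tuple[str, str]]:
    """Step 3, extract: keep only the links that have visible text."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return [(text, href) for text, href in parser.links if text]


def store(items, path: str) -> None:
    """Step 4, store: save the extracted data as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(items, fp, ensure_ascii=False, indent=2)
```

For a live run, `store(extract(fetch("https://example.com/")), "links.json")` chains the four steps; the parsing and storing parts can be tested offline on an HTML string.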
————————————————
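Copyright notice: this is an original article by CSDN blogger 「yk 坤帝」, licensed under CC 4.0 BY-SA. When reposting, include a link to the original and this notice.
Original link: https://blog.csdn.net/qq_45803923/article/details/116133325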

Chapter 2: requests in Practice (Basic Crawlers)

1. Scraping Douban movies

import requests
import json
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = "https://movie.douban.com/j/chart/top_list"

params = {
    'type': '24',
    'interval_id': '100:90',
    'action': '',
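    'start': '0',  # index of the first movie to return
    'limit': '20'  # number of movies returned per request
}
response = requests.get(url,params = params,headers = headers)
list_data = response.json()
# "with" closes the file automatically when the block ends
with open('douban.json','w',encoding= 'utf-8') as fp:
    json.dump(list_data,fp = fp,ensure_ascii= False)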

print('over!!!!')
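In this request `start` behaves as an offset and `limit` as a page size, so paging through the list only means changing `start`. The sketch below assumes that relationship (start = page × limit) and keeps the other parameter values from the example; `page_params` and `page_url` are hypothetical helper names. It uses `urllib.parse.urlencode` to show the query string that `params=` turns into.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/j/chart/top_list"


def page_params(page: int, limit: int = 20, genre: str = "24") -> dict:
    """Query parameters for one page; page is 0-based (assumed offset semantics)."""
    return {
        "type": genre,
        "interval_id": "100:90",
        "action": "",
        "start": str(page * limit),  # offset of the first movie on this page
        "limit": str(limit),         # movies per page
    }


def page_url(page: int, limit: int = 20) -> str:
    # requests.get(BASE_URL, params=...) encodes the dict into an equivalent query string
    return BASE_URL + "?" + urlencode(page_params(page, limit))
```

In a loop, `requests.get(BASE_URL, params=page_params(n), headers=headers)` would fetch page `n`.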

2. KFC restaurant lookup

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
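word = input('Enter a location: ')
data = {
    'cname': '',
    'pid': '',
    'keyword': word,
    'pageIndex': '1',
    'pageSize': '10'
}
# The browser sends these fields as a form-encoded POST body; data= reproduces that (params= would put them in the URL)
response = requests.post(url,data = data ,headers = headers)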
page_text = response.text
fileName = word + '.txt'
with open(fileName,'w',encoding= 'utf-8') as f:
    f.write(page_text)
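requests sends `params=` as part of the URL's query string and `data=` as a form-encoded request body. The sketch below builds (without sending) the equivalent form POST with the standard library so the body can be inspected; `build_store_query` is a hypothetical helper, and the field names are simply copied from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"


def build_store_query(keyword: str, page: int = 1, size: int = 10) -> Request:
    """Build a form-encoded POST request for one page of results (not sent)."""
    form = {
        "cname": "",
        "pid": "",
        "keyword": keyword,
        "pageIndex": str(page),
        "pageSize": str(size),
    }
    body = urlencode(form).encode("utf-8")  # request body, like requests' data=
    # Supplying data makes urllib use POST instead of GET
    return Request(URL, data=body, headers={"User-Agent": "Mozilla/5.0"})
```

Passing the returned object to `urllib.request.urlopen(req)` would send it.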

3. Cracking Baidu Translate

import requests
import json
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
post_url = 'https://fanyi.baidu.com/sug'
word = input('enter a word:')
data = {
    'kw':word
}
response = requests.post(url = post_url,data = data,headers = headers)
dic_obj = response.json()
fileName = word + '.json'
# ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes
with open(fileName,'w',encoding= 'utf-8') as fp:
    json.dump(dic_obj,fp = fp,ensure_ascii = False)
print('over!')
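The `ensure_ascii=False` comment above is easy to check. The payload below is made up (only a general shape of a `data` list of `k`/`v` pairs is assumed for illustration); it shows what `json.dumps` writes with and without the flag.

```python
import json

# Made-up payload; its shape is an assumption for illustration only.
sample = {"errno": 0, "data": [{"k": "dog", "v": "n. 狗"}]}

first = sample["data"][0]
escaped = json.dumps(first)                      # ensure_ascii defaults to True
readable = json.dumps(first, ensure_ascii=False)

print(escaped)   # {"k": "dog", "v": "n. \u72d7"}
print(readable)  # {"k": "dog", "v": "n. 狗"}

# Either string loads back to the same Python data.
for item in sample["data"]:
    print(item["k"], "->", item["v"])
```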


4. Sogou home page

import requests

url = 'https://www.sogou.com/?pid=sogou-site-d5da28d4865fb927'
response = requests.get(url)
page_text = response.text

print(page_text)
with open('./sougou.html','w',encoding= 'utf-8') as fp:
    fp.write(page_text)
print('Finished scraping!!!')

5. Web page collector

import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

url = 'https://www.sogou.com/sogou'
kw = input('enter a word:')
param = {
    'query':kw
}
response = requests.get(url,params = param,headers = headers)

page_text = response.text
fileName = kw +'.html'

with open(fileName,'w',encoding= 'utf-8') as fp:
    fp.write(page_text)

print(fileName,'saved successfully!!!')

6. Scraping data from the National Medical Products Administration (NMPA)

import requests
import json
url = "http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsList"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4385.0 Safari/537.36'
}
id_list = []  # created once, before the loop, so IDs from every page are kept
for page in range(1,6):
    data = {
        'on': 'true',
        'page': str(page),
        'pageSize': '15',
        'productName':'',
        'conditionType': '1',
        'applyname': '',
        'applysn':''
    }
    json_ids = requests.post(url,data = data,headers = headers).json()
    for dic in json_ids['list']:
        id_list.append(dic['ID'])

post_url = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsById'
all_data_list = []
for id_ in id_list:
    data = {
        'id':id_
    }
    detail_json = requests.post(url = post_url,data = data,headers = headers).json()
    all_data_list.append(detail_json)

# Write the file once, after all detail records have been collected
with open('allData.json','w',encoding='utf-8') as fp:
    json.dump(all_data_list,fp = fp,ensure_ascii= False)
print('over!!!')
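The NMPA example is a two-stage crawl: first collect IDs from the list pages, then request one detail record per ID. The sketch below separates the stages and takes the network calls as plain functions, so the logic can be checked without contacting the site; `collect_ids`, `collect_details` and `save` are hypothetical names, and the `'list'`/`'ID'` keys are taken from the code above.

```python
import json
from typing import Callable, Iterable, List


def collect_ids(fetch_page: Callable[[int], dict], pages: Iterable[int]) -> List[str]:
    """Stage 1: gather the ID of every row on every list page."""
    ids: List[str] = []  # created once, before the loop, so earlier pages are kept
    for page in pages:
        ids.extend(row["ID"] for row in fetch_page(page)["list"])
    return ids


def collect_details(fetch_detail: Callable[[str], dict], ids: Iterable[str]) -> List[dict]:
    """Stage 2: fetch one detail record per ID."""
    return [fetch_detail(id_) for id_ in ids]


def save(records: List[dict], path: str) -> None:
    """Write all records once, after every request has finished."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(records, fp, ensure_ascii=False)
```

With requests, `fetch_page` would wrap the first `requests.post(...).json()` call (with `'page'` set to `str(p)`), and `fetch_detail` the second one.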

Chapter 3: Parsing Crawled Data (bs4, xpath, regular expressions)

1. bs4 parsing basics

from bs4 import BeautifulSoup

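# Parse a local HTML file (the 'lxml' parser requires the lxml package)
with open('第三章 資料分析/text.html','r',encoding='utf-8') as fp:
    soup = BeautifulSoup(fp,'lxml')
#print(soup)
#print(soup.a)                                   # first <a> tag
#print(soup.div)                                 # first <div> tag
#print(soup.find('div'))                         # same as soup.div
#print(soup.find('div',class_="song"))           # first <div class="song">
#print(soup.find_all('a'))                       # every <a> tag, as a list
#print(soup.select('.tang'))                     # CSS selector; always returns a list
#print(soup.select('.tang > ul > li >a')[0].text)
#print(soup.find('div',class_="song").text)      # all text, including descendants
#print(soup.find('div',class_="song").string)    # None when the tag has more than one child
print(soup.select('.tang > ul > li > a')[0]['href'])  # attribute value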

2. bs4 example

from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = "http://sanguo.5000yan.com/"

page_text = requests.get(url ,headers = headers).content
#print(page_text)

soup = BeautifulSoup(page_text,'lxml')

li_list = soup.select('.list > ul > li')

# "with" closes the file once all chapters are written
with open('./sanguo.txt','w',encoding='utf-8') as fp:
    for li in li_list:
        title = li.a.string
        detail_url = 'http://sanguo.5000yan.com/'+li.a['href']
        print(detail_url)
        detail_page_text = requests.get(detail_url,headers = headers).content
        detail_soup = BeautifulSoup(detail_page_text,'lxml')
        div_tag = detail_soup.find('div',class_="grap")
        content = div_tag.text
        fp.write(title+":"+content+'\n')
        print(title,'scraped successfully!!!')

3. xpath parsing basics

from lxml import etree

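# An HTMLParser tolerates markup that is not well-formed XML
tree = etree.parse('第三章 資料分析/text.html', etree.HTMLParser())
# r = tree.xpath('/html/head/title')                       # absolute path from the root
# print(r)
# r = tree.xpath('/html/body/div')
# print(r)
# r = tree.xpath('/html//div')                             # // spans any number of levels
# print(r)
# r = tree.xpath('//div')                                  # every <div> in the document
# print(r)
# r = tree.xpath('//div[@class="song"]')                   # filter by attribute
# print(r)
# r = tree.xpath('//div[@class="song"]/p[3]')              # third <p>; indices start at 1, tag names are case-sensitive
# print(r)
# r = tree.xpath('//div[@class="tang"]//li[5]/a/text()')   # direct text of the <a>
# print(r)
# r = tree.xpath('//li[7]/i/text()')
# print(r)
# r = tree.xpath('//li[7]//text()')                        # all descendant text nodes
# print(r)
# r = tree.xpath('//div[@class="tang"]//text()')
# print(r)
# r = tree.xpath('//div[@class="song"]/img/@src')          # attribute value
# print(r)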
# print(r)



4. xpath example: scraping 4K images

import requests
from lxml import etree
import os

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

url = 'http://pic.netbian.com/4kmeinv/'
response = requests.get(url,headers = headers)
#response.encoding=response.apparent_encoding
#response.encoding = 'utf-8'
page_text = response.text
tree = etree.HTML(page_text)

li_list = tree.xpath('//div[@class="slist"]/ul/li')

# Create the output folder first; open() below fails if it does not exist
os.makedirs('./picLibs', exist_ok=True)
for li in li_list:
    img_src = 'http://pic.netbian.com/'+li.xpath('./a/img/@src')[0]
    img_name = li.xpath('./a/img/@alt')[0]+'.jpg'
    # Undo garbled Chinese: the GBK bytes were decoded as ISO-8859-1, so re-encode them and decode as GBK
    img_name = img_name.encode('iso-8859-1').decode('gbk')

    img_data = requests.get(url = img_src,headers = headers).content
    img_path ='picLibs/'+img_name

    with open(img_path,'wb') as fp:
        fp.write(img_data)
    print(img_name,"downloaded")
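The `encode('iso-8859-1').decode('gbk')` line works because of a specific mismatch: when a `text/*` response has no charset in its `Content-Type` header, requests falls back to ISO-8859-1, which maps each byte to exactly one character. Encoding the garbled string back with ISO-8859-1 therefore recovers the original bytes unchanged, and decoding them as GBK gives the intended text. A self-contained demonstration (the sample word is arbitrary):

```python
def fix_mojibake(text: str, wrong: str = "iso-8859-1", right: str = "gbk") -> str:
    """Re-encode with the codec that was wrongly used, then decode with the right one."""
    return text.encode(wrong).decode(right)


original = "风景"  # arbitrary sample word
# Simulate the bug: GBK bytes decoded as ISO-8859-1 (one character per byte).
garbled = original.encode("gbk").decode("iso-8859-1")
print(repr(garbled))
print(fix_mojibake(garbled))  # 风景
```

Setting `response.encoding = 'gbk'` before reading `response.text` avoids the round trip altogether, which is what the commented-out encoding lines in the example hint at.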

5. xpath example: 58.com second-hand housing

import requests
from lxml import etree

url = 'https://bj.58.com/ershoufang/p2/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

page_text = requests.get(url=url,headers = headers).text

tree = etree.HTML(page_text)

li_list = tree.xpath('//section[@class="list-left"]/section[2]/div')

with open('58.txt','w',encoding='utf-8') as fp:
    for li in li_list:
        title = li.xpath('./a/div[2]/div/div/h3/text()')[0]
        print(title)
        fp.write(title+'\n')

6. xpath example: scraping free résumé templates from Chinaz (站長素材)

import requests
from lxml import etree
import os

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

url = 'https://www.aqistudy.cn/historydata/'
page_text = requests.get(url,headers = headers).text

7. xpath example: names of all cities in China

import requests
from lxml import etree
import os

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

url = 'https://www.aqistudy.cn/historydata/'
page_text = requests.get(url,headers = headers).text

tree = etree.HTML(page_text)
# holt_li_list = tree.xpath('//div[@class="bottom"]/ul/li')

# all_city_name = []
# for li in holt_li_list:
#     host_city_name = li.xpath('./a/text()')[0]
#     all_city_name.append(host_city_name)

# city_name_list = tree.xpath('//div[@class="bottom"]/ul/div[2]/li')
# for li in city_name_list:
#     city_name = li.xpath('./a/text()')[0]
#     all_city_name.append(city_name)

# print(all_city_name,len(all_city_name))

# The XPath union operator | selects the first city list and the full list in a single query
#holt_li_list = tree.xpath('//div[@class="bottom"]/ul//li')
holt_li_list = tree.xpath('//div[@class="bottom"]/ul/li | //div[@class="bottom"]/ul/div[2]/li')
all_city_name = []
for li in holt_li_list:
    host_city_name = li.xpath('./a/text()')[0]
    all_city_name.append(host_city_name)
print(all_city_name,len(all_city_name))


8. Regex parsing

import requests
import re
import os

if not os.path.exists('./qiutuLibs'):
    os.mkdir('./qiutuLibs')

url = 'https://www.qiushibaike.com/imgrank/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4385.0 Safari/537.36'
}

page_text = requests.get(url,headers = headers).text


ex = '<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'
img_src_list = re.findall(ex,page_text,re.S)
print(img_src_list)
for src in img_src_list:
    src = 'https:' + src

    img_data = requests.get(url = src,headers = headers).content
    img_name = src.split('/')[-1]
    imgPath = './qiutuLibs/'+img_name
    with open(imgPath,'wb') as fp:
        fp.write(img_data)
        print(img_name,"downloaded!!!!!")
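The pattern relies on two details: `.*?` is non-greedy, so each match stops at the first `" alt` and the first `</div>`, and `re.S` lets `.` match newlines so a match can span a multi-line `<div>`. The sample HTML below is invented to mimic the structure the pattern expects.

```python
import re

SAMPLE = """
<div class="thumb">
  <a href="/article/1"><img src="//pic.example.com/a/1.jpg" alt="one"></a>
</div>
<div class="thumb">
  <a href="/article/2"><img src="//pic.example.com/b/2.jpg" alt="two"></a>
</div>
"""

PATTERN = r'<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'

# re.S lets "." match "\n", so one match can cover the whole multi-line <div>.
with_dotall = re.findall(PATTERN, SAMPLE, re.S)
# Without re.S, "." stops at the newline right after <div ...>, so nothing matches.
without_dotall = re.findall(PATTERN, SAMPLE)
# A greedy version runs to the last <img> it can reach, so only one URL comes back.
greedy = re.findall(r'<div class="thumb">.*<img src="(.*)" alt.*</div>', SAMPLE, re.S)

print(with_dotall)
print(without_dotall)
print(greedy)
```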

9. Regex parsing: paginated scraping

import requests
import re
import os

if not os.path.exists('./qiutuLibs'):
    os.mkdir('./qiutuLibs')

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4385.0 Safari/537.36'
}

url = 'https://www.qiushibaike.com/imgrank/page/%d/'

for pageNum in range(1,3):
    new_url = url % pageNum  # fill the page number into the %d placeholder

    page_text = requests.get(new_url,headers = headers).text


    ex = '<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'
    img_src_list = re.findall(ex,page_text,re.S)
    print(img_src_list)
    for src in img_src_list:
        src = 'https:' + src

        img_data = requests.get(url = src,headers = headers).content
        img_name = src.split('/')[-1]
        imgPath = './qiutuLibs/'+img_name
        with open(imgPath,'wb') as fp:
            fp.write(img_data)
            print(img_name,"downloaded!!!!!")
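Two small details in the pagination loop: `range(1, 3)` stops before 3, so only pages 1 and 2 are fetched, and the scraped `src` values are protocol-relative (`//...`). The helpers below are a hypothetical sketch that makes the page range inclusive and takes the file name from the URL path only, so a `?query` suffix (if a site adds one) does not end up in the file name.

```python
import os
from typing import List
from urllib.parse import urlsplit

URL_TEMPLATE = "https://www.qiushibaike.com/imgrank/page/%d/"


def page_urls(first: int, last: int) -> List[str]:
    """Page URLs for first..last inclusive (range() itself excludes its stop)."""
    return [URL_TEMPLATE % n for n in range(first, last + 1)]


def file_name_from_src(src: str) -> str:
    """Last segment of the URL path; query string and fragment are dropped."""
    if src.startswith("//"):  # protocol-relative, like the scraped src values
        src = "https:" + src
    return os.path.basename(urlsplit(src).path)
```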

10. Downloading an image

import requests

url = 'https://pic.qiushibaike.com/system/pictures/12404/124047919/medium/R7Y2UOCDRBXF2MIQ.jpg'
img_data = requests.get(url).content

with open('qiutu.jpg','wb') as fp:
    fp.write(img_data)

Chapter 4: Automatic Captcha Recognition

1. Captcha recognition for gushiwen.cn (古詩文網)

You can apply for a developer account and key for the recognition service.

import requests
from lxml import etree
from fateadm_api import FateadmApi

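def TestFunc(imgPath,codyType):
    pd_id           = "xxxxxx"     # pd info can be found on the user center page
    pd_key          = "xxxxxxxx"
    app_id          = "xxxxxxx"     # developer revenue-share account, found in the developer center
    app_key         = "xxxxxxx"
    # Recognition type:
    # see the pricing page on the official site for the available types; ask customer support if unsure
    pred_type       = codyType
    api             = FateadmApi(app_id, app_key, pd_id, pd_key)
    # Query the balance
    balance         = api.QueryBalcExtend()   # returns the balance directly
    # api.QueryBalc()

    # Recognize from a file:
    file_name       = imgPath
    # For multiple site types, add the src_url parameter; see the API docs: http://docs.fateadm.com/web/#/1?page_id=6
    result =  api.PredictFromFileExtend(pred_type,file_name)   # returns the recognition result directly
    #rsp             = api.PredictFromFile(pred_type, file_name)  # returns the detailed recognition result

    '''
    # If you are not recognizing from a file, call the Predict interface:
    # result            = api.PredictExtend(pred_type,data)       # returns the recognition result directly
    rsp             = api.Predict(pred_type,data)               # returns the detailed recognition result
    '''

    # just_flag    = False
    # if just_flag :
    #     if rsp.ret_code == 0:
    #         # If the result does not match expectations, call this interface to refund the order
    #         # Refunds are only for results that were recognized but then failed the site's verification; do not abuse this, or the account may be banned
    #         api.Justice( rsp.request_id)

    #card_id         = "123"
    #card_key        = "123"
    # Top up
    #api.Charge(card_id, card_key)
    #LOG("print in testfunc")
    return result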

# if __name__ == "__main__":
#     TestFunc()


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'

page_text = requests.get(url,headers = headers).text
tree = etree.HTML(page_text)

code_img_src = 'https://so.gushiwen.cn' + tree.xpath('//*[@id="imgCode"]/@src')[0]
img_data = requests.get(code_img_src,headers = headers).content

with open('./code.jpg','wb') as fp:
    fp.write(img_data)

code_text = TestFunc('code.jpg',30400)
# str() avoids a TypeError if recognition returns None
print('Recognition result: ' + str(code_text))
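The script above prints whatever the service returns, which may be empty. Below is a hypothetical retry wrapper, not part of fateadm_api.py; it assumes you want a freshly downloaded image on each attempt, and takes the download and recognition steps as functions. With the code above, `recognize` could be `lambda img: api.PredictExtend(30400, img)` using the `FateadmApi` class.

```python
from typing import Callable, Optional


def recognize_with_retry(
    get_image: Callable[[], bytes],
    recognize: Callable[[bytes], Optional[str]],
    attempts: int = 3,
) -> str:
    """Download a captcha and ask the recognizer until it returns non-empty text."""
    for _ in range(attempts):
        code = recognize(get_image())
        if code:  # None or "" counts as a failed attempt
            return code
    raise RuntimeError("captcha recognition failed after %d attempts" % attempts)
```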

fateadm_api.py (the configuration needed for recognition; keep it in the same folder as your script)
Calling the API

# coding=utf-8
import os,sys
import hashlib
import time
import json
import requests

FATEA_PRED_URL  = "http://pred.fateadm.com"

def LOG(log):
    # Comment out the print below when you no longer need test logging
    print(log)
    log = None

class TmpObj():
    def __init__(self):
        self.value  = None

class Rsp():
    def __init__(self):
        self.ret_code   = -1
        self.cust_val   = 0.0
        self.err_msg    = "succ"
        self.request_id = ""          # defined up front so callers can read it even if parsing stops early
        self.pred_rsp   = TmpObj()

    def ParseJsonRsp(self, rsp_data):
        if rsp_data is None:
            self.err_msg     = "http request failed, get rsp Nil data"
            return
        jrsp                = json.loads( rsp_data)
        self.ret_code       = int(jrsp["RetCode"])
        self.err_msg        = jrsp["ErrMsg"]
        self.request_id     = jrsp["RequestId"]
        if self.ret_code == 0:
            rslt_data   = jrsp["RspData"]
            if rslt_data is not None and rslt_data != "":
                jrsp_ext    = json.loads( rslt_data)
                if "cust_val" in jrsp_ext:
                    data        = jrsp_ext["cust_val"]
                    self.cust_val   = float(data)
                if "result" in jrsp_ext:
                    data        = jrsp_ext["result"]
                    self.pred_rsp.value     = data

def CalcSign(pd_id, passwd, timestamp):
    md5     = hashlib.md5()
    md5.update((timestamp + passwd).encode())
    csign   = md5.hexdigest()

    md5     = hashlib.md5()
    md5.update((pd_id + timestamp + csign).encode())
    csign   = md5.hexdigest()
    return csign

def CalcCardSign(cardid, cardkey, timestamp, passwd):
    md5     = hashlib.md5()
    md5.update((passwd + timestamp + cardid + cardkey).encode())  # hashlib needs bytes in Python 3
    return md5.hexdigest()

def HttpRequest(url, body_data, img_data=""):
    rsp         = Rsp()
    post_data   = body_data
    files       = {
        'img_data':('img_data',img_data)
    }
    header      = {
            'User-Agent': 'Mozilla/5.0',
            }
    rsp_data    = requests.post(url, post_data,files=files ,headers=header)
    rsp.ParseJsonRsp( rsp_data.text)
    return rsp

class FateadmApi():
    # API client class
    # Parameters: (appID, appKey, pdID, pdKey)
    def __init__(self, app_id, app_key, pd_id, pd_key):
        self.app_id     = app_id
        if app_id is None:
            self.app_id = ""
        self.app_key    = app_key
        self.pd_id      = pd_id
        self.pd_key     = pd_key
        self.host       = FATEA_PRED_URL

    def SetHost(self, url):
        self.host       = url

    #
    # Query the balance
    # Parameters: none
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.cust_val: the user's balance
    #   rsp.err_msg: error details on failure
    #
    def QueryBalc(self):
        tm      = str( int(time.time()))
        sign    = CalcSign( self.pd_id, self.pd_key, tm)
        param   = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign
                }
        url     = self.host + "/api/custval"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("query succ ret: {} cust_val: {} rsp: {} pred: {}".format( rsp.ret_code, rsp.cust_val, rsp.err_msg, rsp.pred_rsp.value))
        else:
            LOG("query failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    #
    # Query network latency
    # Parameters: pred_type: recognition type
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.err_msg: error details on failure
    #
    def QueryTTS(self, pred_type):
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        param       = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign,
                "predict_type":pred_type,
                }
        if self.app_id != "":
            #
            asign       = CalcSign(self.app_id, self.app_key, tm)
            param["appid"]     = self.app_id
            param["asign"]      = asign
        url     = self.host + "/api/qcrtt"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("query rtt succ ret: {} request_id: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.err_msg))
        else:
            LOG("predict failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    #
    # Recognize a captcha
    # Parameters: pred_type: recognition type  img_data: image data
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.request_id: unique order ID
    #   rsp.pred_rsp.value: recognition result
    #   rsp.err_msg: error details on failure
    #
    def Predict(self, pred_type, img_data, head_info = ""):
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        param       = {
                "user_id": self.pd_id,
                "timestamp": tm,
                "sign": sign,
                "predict_type": pred_type,
                "up_type": "mt"
                }
        if head_info is not None and head_info != "":  # "or" made this condition always true
            param["head_info"] = head_info
        if self.app_id != "":
            #
            asign       = CalcSign(self.app_id, self.app_key, tm)
            param["appid"]     = self.app_id
            param["asign"]      = asign
        url     = self.host + "/api/capreg"
        files = img_data
        rsp     = HttpRequest(url, param, files)
        if rsp.ret_code == 0:
            LOG("predict succ ret: {} request_id: {} pred: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.pred_rsp.value, rsp.err_msg))
        else:
            LOG("predict failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg))
            if rsp.ret_code == 4003:
                #lack of money
                LOG("cust_val <= 0 lack of money, please charge immediately")
        return rsp

    #
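    # Recognize a captcha from a file
    # Parameters: pred_type: recognition type  file_name: file name
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.request_id: unique order ID
    #   rsp.pred_rsp.value: recognition result
    #   rsp.err_msg: error details on failure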
    #
    def PredictFromFile( self, pred_type, file_name, head_info = ""):
        with open(file_name, "rb") as f:
            data = f.read()
        return self.Predict(pred_type,data,head_info=head_info)

    #
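    # Recognition failed: request a refund
    # Parameters: request_id: order ID to refund
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.err_msg: error details on failure
    #
    # Note:
    #    The Predict interface only charges when ret_code == 0, so a refund request is only needed in that case
    # Note 2:
    #   Refunds are only for results that were recognized but then failed the site's verification; do not abuse this, or the account may be banned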
    #
    def Justice(self, request_id):
        if request_id == "":
            #
            return
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        param       = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign,
                "request_id":request_id
                }
        url     = self.host + "/api/capjust"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("justice succ ret: {} request_id: {} pred: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.pred_rsp.value, rsp.err_msg))
        else:
            LOG("justice failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    #
    # Top-up interface
    # Parameters: cardid: top-up card number  cardkey: top-up card signature string
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.err_msg: error details on failure
    #
    def Charge(self, cardid, cardkey):
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        csign       = CalcCardSign(cardid, cardkey, tm, self.pd_key)
        param       = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign,
                'cardid':cardid,
                'csign':csign
                }
        url     = self.host + "/api/charge"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("charge succ ret: {} request_id: {} pred: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.pred_rsp.value, rsp.err_msg))
        else:
            LOG("charge failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    ##
    # Top up, returning only the status code
    # Parameters: cardid: top-up card number  cardkey: top-up card signature string
    # Returns: 0 on success
    ##
    def ExtendCharge(self, cardid, cardkey):
        return self.Charge(cardid,cardkey).ret_code

    ##
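    # Request a refund, returning only the status code
    # Parameters: request_id: order ID to refund
    # Returns: 0 on success
    #
    # Note:
    #    The Predict interface only charges when ret_code == 0, so a refund request is only needed in that case
    # Note 2:
    #   Refunds are only for results that were recognized but then failed the site's verification; do not abuse this, or the account may be banned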
    ##
    def JusticeExtend(self, request_id):
        return self.Justice(request_id).ret_code

    ##
    # Query the balance, returning only the balance
    # Parameters: none
    # Returns: rsp.cust_val: the balance
    ##
    def QueryBalcExtend(self):
        rsp = self.QueryBalc()
        return rsp.cust_val

    ##
    # Recognize a captcha from a file, returning only the result
    # Parameters: pred_type: recognition type  file_name: file name
    # Returns: rsp.pred_rsp.value: the recognition result
    ##
    def PredictFromFileExtend( self, pred_type, file_name, head_info = ""):
        rsp = self.PredictFromFile(pred_type,file_name,head_info)
        return rsp.pred_rsp.value

    ##
    # Recognition interface, returning only the result
    # Parameters: pred_type: recognition type  img_data: image data
    # Returns: rsp.pred_rsp.value: the recognition result
    ##
    def PredictExtend(self,pred_type, img_data, head_info = ""):
        rsp = self.Predict(pred_type,img_data,head_info)
        return rsp.pred_rsp.value



def TestFunc():
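    pd_id           = "128292"     # pd info can be found on the user center page
    pd_key          = "bASHdc/12ISJOX7pV3qhPr2ntQ6QcEkV"
    app_id          = "100001"     # developer revenue-share account, found in the developer center
    app_key         = "123456"
    # Recognition type:
    # see the pricing page on the official site for the available types; ask customer support if unsure
    pred_type       = "30400"
    api             = FateadmApi(app_id, app_key, pd_id, pd_key)
    # Query the balance
    balance         = api.QueryBalcExtend()   # returns the balance directly
    # api.QueryBalc()

    # Recognize from a file:
    file_name       = 'img.gif'
    # For multiple site types, add the src_url parameter; see the API docs: http://docs.fateadm.com/web/#/1?page_id=6
    # result =  api.PredictFromFileExtend(pred_type,file_name)   # returns the recognition result directly
    rsp             = api.PredictFromFile(pred_type, file_name)  # returns the detailed recognition result

    '''
    # If you are not recognizing from a file, call the Predict interface:
    # result            = api.PredictExtend(pred_type,data)       # returns the recognition result directly
    rsp             = api.Predict(pred_type,data)               # returns the detailed recognition result
    '''

    just_flag    = False
    if just_flag :
        if rsp.ret_code == 0:
            # If the result does not match expectations, call this interface to refund the order
            # Refunds are only for results that were recognized but then failed the site's verification; do not abuse this, or the account may be banned
            api.Justice( rsp.request_id)

    #card_id         = "123"
    #card_key        = "123"
    # Top up
    #api.Charge(card_id, card_key)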
    LOG("print in testfunc")

if __name__ == "__main__":
    TestFunc()



Chapter 5: Advanced requests (Simulated Login)

1. Using proxies

import requests


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = 'https://www.sogou.com/sie?query=ip'

# requests uses the proxy whose key matches the target URL's scheme ("https" here)
page_text = requests.get(url,headers = headers,proxies = {"https":"http://183.166.103.86:9999"}).text

with open('ip.html','w',encoding='utf-8') as fp:

    fp.write(page_text)
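Free proxies such as the one above often stop working, so crawlers commonly keep a pool and fall back to another proxy when a request fails. The sketch below is a hypothetical helper, not part of requests: it builds the same `{'http': ..., 'https': ...}` mapping that `proxies=` accepts and tries each proxy once in random order. It catches `OSError` because requests' exceptions derive from `IOError`, which is an alias of `OSError` in Python 3.

```python
import random
from typing import Callable, List, Optional, Sequence


def fetch_via_proxies(
    fetch: Callable[[dict], str],
    pool: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Try each proxy in the pool once, in random order, until a request succeeds."""
    rng = rng or random.Random()
    order: List[str] = list(pool)
    rng.shuffle(order)
    last_error: Optional[Exception] = None
    for proxy in order:
        proxies = {"http": proxy, "https": proxy}  # same shape as requests' proxies=
        try:
            return fetch(proxies)
        except OSError as exc:  # includes requests.exceptions.RequestException
            last_error = exc
    raise RuntimeError("all proxies failed") from last_error
```

With requests, `fetch` could be `lambda p: requests.get(url, headers=headers, proxies=p, timeout=5).text`.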

2. Simulated login to Renren (人人網)

import requests
from lxml import etree
from fateadm_api import FateadmApi


def TestFunc(imgPath,codyType):
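    pd_id           = "xxxxx"     # pd info can be found on the user center page
    pd_key          = "xxxxxxxxxxxxxxxxxx"
    app_id          = "xxxxxxxx"     # developer revenue-share account, found in the developer center
    app_key         = "xxxxxx"
    # Recognition type:
    # see the pricing page on the official site for the available types; ask customer support if unsure
    pred_type       = codyType
    api             = FateadmApi(app_id, app_key, pd_id, pd_key)
    # Query the balance
    balance         = api.QueryBalcExtend()   # returns the balance directly
    # api.QueryBalc()

    # Recognize from a file:
    file_name       = imgPath
    # For multiple site types, add the src_url parameter; see the API docs: http://docs.fateadm.com/web/#/1?page_id=6
    result =  api.PredictFromFileExtend(pred_type,file_name)   # returns the recognition result directly
    #rsp             = api.PredictFromFile(pred_type, file_name)  # returns the detailed recognition result

    '''
    # If you are not recognizing from a file, call the Predict interface:
    # result            = api.PredictExtend(pred_type,data)       # returns the recognition result directly
    rsp             = api.Predict(pred_type,data)               # returns the detailed recognition result
    '''

    # just_flag    = False
    # if just_flag :
    #     if rsp.ret_code == 0:
    #         # If the result does not match expectations, call this interface to refund the order
    #         # Refunds are only for results that were recognized but then failed the site's verification; do not abuse this, or the account may be banned
    #         api.Justice( rsp.request_id)

    #card_id         = "123"
    #card_key        = "123"
    # Top up
    #api.Charge(card_id, card_key)
    #LOG("print in testfunc")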
    return result

# if __name__ == "__main__":
#     TestFunc()



headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = 'http://www.renren.com/'
# A Session keeps the cookies set while fetching the captcha, so the login request is sent with them
session = requests.Session()
page_text = session.get(url,headers = headers).text

tree = etree.HTML(page_text)
code_img_src = tree.xpath('//*[@id="verifyPic_login"]/@src')[0]

code_img_data = session.get(code_img_src,headers = headers).content

with open('./code.jpg','wb') as fp:
    fp.write(code_img_data)

result = TestFunc('code.jpg',30600)
# str() avoids a TypeError if recognition returns None
print('Recognition result: ' + str(result))

login_url = 'http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2021121720536'
data = {
    'email':'xxxxxxxx',
    'icode': result,
    'origURL': 'http://www.renren.com/home',
    'domain': 'renren.com',
    'key_id': '1',
    'captcha_type':'web_login',
    'password': '47e27dd5ef32b31041ebf56ec85a9b1e4233875e36396241c88245b188c56cdb',
    'rkey': 'c655ef0c57a72755f1240d6c0efac67d',
    'f': ''
}

response = session.post(login_url,headers = headers, data = data)
print(response.status_code)

with open('renren.html','w',encoding= 'utf-8') as fp:
    fp.write(response.text)

fateadm_api.py

# coding=utf-8
import os,sys
import hashlib
import time
import json
import requests

FATEA_PRED_URL  = "http://pred.fateadm.com"

def LOG(log):
    # Comment out the print below when you no longer need test logging
    print(log)
    log = None

class TmpObj():
    def __init__(self):
        self.value  = None

class Rsp():
    def __init__(self):
        self.ret_code   = -1
        self.cust_val   = 0.0
        self.err_msg    = "succ"
        self.request_id = ""          # defined up front so callers can read it even if parsing stops early
        self.pred_rsp   = TmpObj()

    def ParseJsonRsp(self, rsp_data):
        if rsp_data is None:
            self.err_msg     = "http request failed, get rsp Nil data"
            return
        jrsp                = json.loads( rsp_data)
        self.ret_code       = int(jrsp["RetCode"])
        self.err_msg        = jrsp["ErrMsg"]
        self.request_id     = jrsp["RequestId"]
        if self.ret_code == 0:
            rslt_data   = jrsp["RspData"]
            if rslt_data is not None and rslt_data != "":
                jrsp_ext    = json.loads( rslt_data)
                if "cust_val" in jrsp_ext:
                    data        = jrsp_ext["cust_val"]
                    self.cust_val   = float(data)
                if "result" in jrsp_ext:
                    data        = jrsp_ext["result"]
                    self.pred_rsp.value     = data

def CalcSign(pd_id, passwd, timestamp):
    md5     = hashlib.md5()
    md5.update((timestamp + passwd).encode())
    csign   = md5.hexdigest()

    md5     = hashlib.md5()
    md5.update((pd_id + timestamp + csign).encode())
    csign   = md5.hexdigest()
    return csign

def CalcCardSign(cardid, cardkey, timestamp, passwd):
    md5     = hashlib.md5()
    md5.update((passwd + timestamp + cardid + cardkey).encode())  # hashlib needs bytes in Python 3
    return md5.hexdigest()

def HttpRequest(url, body_data, img_data=""):
    rsp         = Rsp()
    post_data   = body_data
    files       = {
        'img_data':('img_data',img_data)
    }
    header      = {
            'User-Agent': 'Mozilla/5.0',
            }
    rsp_data    = requests.post(url, post_data,files=files ,headers=header)
    rsp.ParseJsonRsp( rsp_data.text)
    return rsp

class FateadmApi():
    # API client class
    # Parameters: (appID, appKey, pdID, pdKey)
    def __init__(self, app_id, app_key, pd_id, pd_key):
        self.app_id     = app_id
        if app_id is None:
            self.app_id = ""
        self.app_key    = app_key
        self.pd_id      = pd_id
        self.pd_key     = pd_key
        self.host       = FATEA_PRED_URL

    def SetHost(self, url):
        self.host       = url

    #
    # Query the balance
    # Parameters: none
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.cust_val: the user's balance
    #   rsp.err_msg: error details on failure
    #
    def QueryBalc(self):
        tm      = str( int(time.time()))
        sign    = CalcSign( self.pd_id, self.pd_key, tm)
        param   = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign
                }
        url     = self.host + "/api/custval"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("query succ ret: {} cust_val: {} rsp: {} pred: {}".format( rsp.ret_code, rsp.cust_val, rsp.err_msg, rsp.pred_rsp.value))
        else:
            LOG("query failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    #
    # Query network latency
    # Parameters: pred_type: recognition type
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.err_msg: error details on failure
    #
    def QueryTTS(self, pred_type):
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        param       = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign,
                "predict_type":pred_type,
                }
        if self.app_id != "":
            #
            asign       = CalcSign(self.app_id, self.app_key, tm)
            param["appid"]     = self.app_id
            param["asign"]      = asign
        url     = self.host + "/api/qcrtt"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("query rtt succ ret: {} request_id: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.err_msg))
        else:
            LOG("predict failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    #
    # Recognize a captcha
    # Parameters: pred_type: recognition type  img_data: image data
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.request_id: unique order ID
    #   rsp.pred_rsp.value: recognition result
    #   rsp.err_msg: error details on failure
    #
    def Predict(self, pred_type, img_data, head_info = ""):
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        param       = {
                "user_id": self.pd_id,
                "timestamp": tm,
                "sign": sign,
                "predict_type": pred_type,
                "up_type": "mt"
                }
        # Only send head_info when it is set (with "or" this test was always true)
        if head_info is not None and head_info != "":
            param["head_info"] = head_info
        if self.app_id != "":
            #
            asign       = CalcSign(self.app_id, self.app_key, tm)
            param["appid"]     = self.app_id
            param["asign"]      = asign
        url     = self.host + "/api/capreg"
        files = img_data
        rsp     = HttpRequest(url, param, files)
        if rsp.ret_code == 0:
            LOG("predict succ ret: {} request_id: {} pred: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.pred_rsp.value, rsp.err_msg))
        else:
            LOG("predict failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg))
            if rsp.ret_code == 4003:
                #lack of money
                LOG("cust_val <= 0 lack of money, please charge immediately")
        return rsp

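    #
    # Recognize a captcha from a file
    # Parameters: pred_type: recognition type  file_name: file name
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.request_id: unique order ID
    #   rsp.pred_rsp.value: recognition result
    #   rsp.err_msg: error details on failure
    #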
    def PredictFromFile( self, pred_type, file_name, head_info = ""):
        with open(file_name, "rb") as f:
            data = f.read()
        return self.Predict(pred_type,data,head_info=head_info)

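    #
    # Request a refund when recognition fails
    # Parameters: request_id: ID of the order to refund
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.err_msg: error details on failure
    #
    # Note:
    #    The Predict API only charges when ret_code == 0, so only those orders need a refund request; otherwise no refund is needed
    # Note 2:
    #   Refunds are only for results that were recognized normally but did not pass the website's check; do not misuse them, or the account may be banned
    #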
    def Justice(self, request_id):
        if request_id == "":
            #
            return
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        param       = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign,
                "request_id":request_id
                }
        url     = self.host + "/api/capjust"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("justice succ ret: {} request_id: {} pred: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.pred_rsp.value, rsp.err_msg))
        else:
            LOG("justice failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    #
    # Top up
    # Parameters: cardid: top-up card number  cardkey: card signature string
    # Returns:
    #   rsp.ret_code: 0 on success
    #   rsp.err_msg: error details on failure
    #
    def Charge(self, cardid, cardkey):
        tm          = str( int(time.time()))
        sign        = CalcSign( self.pd_id, self.pd_key, tm)
        csign       = CalcCardSign(cardid, cardkey, tm, self.pd_key)
        param       = {
                "user_id": self.pd_id,
                "timestamp":tm,
                "sign":sign,
                'cardid':cardid,
                'csign':csign
                }
        url     = self.host + "/api/charge"
        rsp     = HttpRequest(url, param)
        if rsp.ret_code == 0:
            LOG("charge succ ret: {} request_id: {} pred: {} err: {}".format( rsp.ret_code, rsp.request_id, rsp.pred_rsp.value, rsp.err_msg))
        else:
            LOG("charge failed ret: {} err: {}".format( rsp.ret_code, rsp.err_msg.encode('utf-8')))
        return rsp

    ##
    # Top up, returning only whether it succeeded
    # Parameters: cardid: top-up card number  cardkey: card signature string
    # Returns: 0 if the top-up succeeded
    ##
    def ExtendCharge(self, cardid, cardkey):
        return self.Charge(cardid,cardkey).ret_code

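    ##
    # Request a refund, returning only whether it succeeded
    # Parameters: request_id: ID of the order to refund
    # Returns: 0 if the refund succeeded
    #
    # Note:
    #    The Predict API only charges when ret_code == 0, so only those orders need a refund request; otherwise no refund is needed
    # Note 2:
    #   Refunds are only for results that were recognized normally but did not pass the website's check; do not misuse them, or the account may be banned
    ##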
    def JusticeExtend(self, request_id):
        return self.Justice(request_id).ret_code

    ##
    # Query balance, returning only the balance
    # Parameters: none
    # Returns: rsp.cust_val: balance
    ##
    def QueryBalcExtend(self):
        rsp = self.QueryBalc()
        return rsp.cust_val

    ##
    # Recognize a captcha from a file, returning only the result
    # Parameters: pred_type: recognition type  file_name: file name
    # Returns: rsp.pred_rsp.value: recognition result
    ##
    def PredictFromFileExtend( self, pred_type, file_name, head_info = ""):
        rsp = self.PredictFromFile(pred_type,file_name,head_info)
        return rsp.pred_rsp.value

    ##
    # Recognition API, returning only the result
    # Parameters: pred_type: recognition type  img_data: image data
    # Returns: rsp.pred_rsp.value: recognition result
    ##
    def PredictExtend(self,pred_type, img_data, head_info = ""):
        rsp = self.Predict(pred_type,img_data,head_info)
        return rsp.pred_rsp.value



def TestFunc():
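    pd_id           = "128292"     # pd info can be found on the user-center page
    pd_key          = "bASHdc/12ISJOX7pV3qhPr2ntQ6QcEkV"
    app_id          = "100001"     # developer revenue-share account, found in the developer center
    app_key         = "123456"
    # Recognition type:
    # see the pricing page on the official site for the specific types; ask customer support if unsure
    pred_type       = "30400"
    api             = FateadmApi(app_id, app_key, pd_id, pd_key)
    # Query balance
    balance         = api.QueryBalcExtend()   # returns the balance directly
    # api.QueryBalc()

    # Recognize from a file:
    file_name       = 'img.gif'
    # For multi-site types, add the src_url parameter; see the API docs: http://docs.fateadm.com/web/#/1?page_id=6
    # result =  api.PredictFromFileExtend(pred_type,file_name)   # returns the recognition result directly
    rsp             = api.PredictFromFile(pred_type, file_name)  # returns the detailed recognition result

    '''
    # If you are not recognizing from a file, call Predict instead:
    # result          = api.PredictExtend(pred_type,data)       # returns the recognition result directly
    rsp             = api.Predict(pred_type,data)               # returns the detailed recognition result
    '''

    just_flag    = False
    if just_flag :
        if rsp.ret_code == 0:
            # If the result does not match what was expected, this call refunds that order
            # Refunds are only for results that were recognized normally but did not pass the website's check; do not misuse them, or the account may be banned
            api.Justice( rsp.request_id)

    #card_id         = "123"
    #card_key        = "123"
    # Top up
    #api.Charge(card_id, card_key)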
    LOG("print in testfunc")

if __name__ == "__main__":
    TestFunc()



3. Scraping the current Renren user's profile page

import requests
from lxml import etree
from fateadm_api import FateadmApi


def TestFunc(imgPath, codeType):
    pd_id           = "xxxxxxx"     # pd info can be found on the user-center page
    pd_key          = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    app_id          = "xxxxxxxx"     # developer revenue-share account, found in the developer center
    app_key         = "xxxxxxxxx"
    # Recognition type:
    # see the pricing page on the official site for the specific types; ask customer support if unsure
    pred_type       = codeType
    api             = FateadmApi(app_id, app_key, pd_id, pd_key)
    # Query balance
    balance         = api.QueryBalcExtend()   # returns the balance directly

    # Recognize from a file
    file_name       = imgPath
    # For multi-site types, add the src_url parameter; see the API docs: http://docs.fateadm.com/web/#/1?page_id=6
    result          = api.PredictFromFileExtend(pred_type, file_name)   # returns the recognition result directly
    return result

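# Use one Session for every request, so the cookies set while loading the
# login page and the captcha are sent back with the login POST
session = requests.Session()

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
url = 'http://www.renren.com/'
page_text = session.get(url,headers = headers).text

tree = etree.HTML(page_text)
code_img_src = tree.xpath('//*[@id="verifyPic_login"]/@src')[0]

# Fetch the captcha image with the same session as the login request
code_img_data = session.get(code_img_src,headers = headers).content

with open('./code.jpg','wb') as fp:
    fp.write(code_img_data)

# 30600 is the fateadm recognition type used for this captcha
result = TestFunc('code.jpg',30600)
print('Recognition result:', result)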

login_url = 'http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2021121720536'
data = {
    'email':'15893301681',
    'icode': result,
    'origURL': 'http://www.renren.com/home',
    'domain': 'renren.com',
    'key_id': '1',
    'captcha_type':'web_login',
    'password': '47e27dd5ef32b31041ebf56ec85a9b1e4233875e36396241c88245b188c56cdb',
    'rkey': 'c655ef0c57a72755f1240d6c0efac67d',
    'f': '',
}

response = session.post(login_url,headers = headers, data = data)
print(response.status_code)
with open('renren.html','w',encoding= 'utf-8') as fp:
    fp.write(response.text)

# headers = {
#     'Cookies'
# }
detail_url = 'http://www.renren.com/975996803/profile'
detail_page_text = session.get(detail_url,headers = headers).text

with open('bobo.html','w',encoding= 'utf-8') as fp:
    fp.write(detail_page_text)

Chapter 6: High-performance asynchronous crawlers (thread pools, coroutines)

1. Multi-task asynchronous crawler with aiohttp

import asyncio
import time

import aiohttp

urls = [
    'http://127.0.0.1:5000/bobo','http://127.0.0.1:5000/jay','http://127.0.0.1:5000/tom'
]

async def get_page(session, url):
    # Unlike requests.get, session.get does not block: while this request
    # waits for the server, the event loop runs the other downloads
    async with session.get(url) as response:
        page_text = await response.text()
        print(page_text)

async def main():
    # One ClientSession is shared by all requests
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.ensure_future(get_page(session, url)) for url in urls]
        await asyncio.wait(tasks)

start = time.time()
# asyncio.run creates the event loop, runs main() and closes the loop
asyncio.run(main())
end = time.time()

print('Total time:', end - start)

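2. Flask server

from flask import Flask
import time

app = Flask(__name__)

# Each view sleeps for 2 s to simulate a slow server, so the difference
# between the synchronous and asynchronous crawlers is easy to measure
@app.route('/bobo')
def index_bobo():
    time.sleep(2)
    return 'Hello bobo'

@app.route('/jay')
def index_jay():
    time.sleep(2)
    return 'Hello jay'

@app.route('/tom')
def index_tom():
    time.sleep(2)
    return 'Hello tom'

if __name__ == '__main__':
    # threaded=True lets the server handle several requests at the same time
    app.run(threaded = True)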
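The Flask app above only gives the async examples a slow local server to hit. If Flask is not installed, you can sketch a roughly equivalent server with the standard library's `http.server` module. The names `SlowHandler`, `start_server` and `DELAY` below are illustrative and do not come from the original post.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DELAY = 2  # seconds each response is held back, like time.sleep(2) in the Flask views
PATHS = {'/bobo', '/jay', '/tom'}

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in PATHS:
            self.send_error(404)
            return
        time.sleep(DELAY)  # simulate a slow server
        body = ('Hello ' + self.path.lstrip('/')).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet

def start_server(port=5000):
    # ThreadingHTTPServer handles each request in its own thread,
    # the counterpart of app.run(threaded=True)
    server = ThreadingHTTPServer(('127.0.0.1', port), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_server()` in a separate script or an interactive session. While it runs, the crawlers in this chapter can hit `http://127.0.0.1:5000/bobo` and the other URLs. Call `server.shutdown()` to stop it.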

3. Multi-task coroutines

import asyncio
import time

async def request(url):
    print('Downloading', url)
    # time.sleep(2) would block the event loop and run the tasks one by one;
    # await asyncio.sleep(2) suspends only this coroutine
    await asyncio.sleep(2)

    print('Finished', url)

async def main():
    urls = ['www.baidu.com',
            'www.sogou.com',
            'www.goubanjia.com'
    ]
    # Wrap each coroutine in a Task and wait until all of them finish
    tasks = [asyncio.ensure_future(request(url)) for url in urls]
    await asyncio.wait(tasks)

start = time.time()
asyncio.run(main())
print(time.time() - start)  # about 2 s, because the three sleeps overlap


4. Multi-task asynchronous crawler

import asyncio
import time

import requests

urls = [
    'http://127.0.0.1:5000/bobo','http://127.0.0.1:5000/jay','http://127.0.0.1:5000/tom'
]

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

async def get_page(url):
    print('Downloading', url)
    # requests.get is synchronous: it blocks the whole event loop, so these
    # tasks still run one after another (about 6 s in total).
    # Section 1 swaps in aiohttp to get real concurrency.
    response = requests.get(url, headers=headers)
    print('Downloaded', response.text)

async def main():
    tasks = [asyncio.ensure_future(get_page(url)) for url in urls]
    await asyncio.wait(tasks)

start = time.time()
asyncio.run(main())
end = time.time()

print('Total time:', end - start)
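In the version above, `requests.get` blocks the event loop, so the three tasks run back to back. If you must keep a blocking client, one option is to hand each call to a worker thread with `asyncio.to_thread` (Python 3.9+). aiohttp, as in section 1, is still the native asynchronous route. A minimal sketch, where `blocking_fetch` is a hypothetical placeholder that sleeps instead of making a real request:

```python
import asyncio
import time

def blocking_fetch(url):
    # Hypothetical stand-in for requests.get(url).text: blocks its thread for 1 s
    time.sleep(1)
    return 'content of ' + url

async def get_page(url):
    # Run the blocking call in a worker thread, leaving the event loop
    # free to start the other downloads in the meantime
    return await asyncio.to_thread(blocking_fetch, url)

async def main(urls):
    # gather() returns the results in the same order as urls
    return await asyncio.gather(*(get_page(u) for u in urls))

urls = ['http://127.0.0.1:5000/bobo', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom']
start = time.time()
pages = asyncio.run(main(urls))
elapsed = time.time() - start
print(pages)
print('Total time:', elapsed)  # roughly 1 s rather than 3 s
```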

5. Example

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

# Request the video page makes to fetch the video address for contId=1719770
url = 'https://www.pearvideo.com/videoStatus.jsp?contId=1719770&mrd=0.559512982919081'

response = requests.get(url,headers = headers)
print(response.text)
# Sample video URL taken from the response:
# https://video.pearvideo.com/mp4/short/20210209/1613307944808-15603370-hd.mp4

6. Synchronous crawler

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

urls = [
    'https://www.cnblogs.com/shaozheng/p/12795953.html',
    'https://www.cnblogs.com/hanfe1/p/12661505.html',
    'https://www.cnblogs.com/tiger666/articles/11070427.html']

def get_content(url):
    print('Crawling:', url)
    response = requests.get(url,headers = headers)
    if response.status_code == 200:
        return response.content
    return None  # non-200 responses yield no content

def parse_content(content):
    print('Response length:', len(content))

# Each request waits for the previous one to finish
for url in urls:
    content = get_content(url)
    if content is not None:
        parse_content(content)

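7. Thread pool basics

# Serial version for comparison (about 8 s for four names):
# import time
#
# def get_page(name):
#     print('Downloading:', name)
#     time.sleep(2)
#     print('Downloaded:', name)
#
# name_list = ['xiaozi','aa','bb','cc']
# start_time = time.time()
# for name in name_list:
#     get_page(name)
# end_time = time.time()
# print('%d second' % (end_time - start_time))

import time
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing.Pool API

start_time = time.time()
def get_page(name):
    print('Downloading:', name)
    time.sleep(2)  # simulate a blocking 2-second download
    print('Downloaded:', name)

name_list = ['xiaozi','aa','bb','cc']
pool = Pool(4)                 # 4 worker threads
pool.map(get_page, name_list)  # blocks until every name is processed (about 2 s in total)
pool.close()
pool.join()

end_time = time.time()

print(end_time-start_time)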
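`multiprocessing.dummy.Pool` offers the `multiprocessing.Pool` interface backed by threads. The standard library's `concurrent.futures.ThreadPoolExecutor` is a common alternative. Here is a sketch of the same four 2-second tasks with it. The upper-cased return value is only there so the collected results are visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_page(name):
    print('Downloading:', name)
    time.sleep(2)  # simulate a 2-second download
    print('Downloaded:', name)
    return name.upper()

name_list = ['xiaozi', 'aa', 'bb', 'cc']
start_time = time.time()
# Leaving the with-block waits for all work and shuts the pool down
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in the same order as name_list
    results = list(executor.map(get_page, name_list))
elapsed = time.time() - start_time
print(results, elapsed)  # about 2 s, since the four calls overlap
```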

8. Applying a thread pool to a crawler

import requests
from lxml import etree
from multiprocessing.dummy import Pool

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}

url = 'https://www.pearvideo.com/'

page_text = requests.get(url,headers = headers).text

tree = etree.HTML(page_text)
li_list = tree.xpath('//div[@class="vervideo-tlist-bd recommend-btbg clearfix"]/ul/li')
urls = []  # one {'name': ..., 'url': ...} dict per video
for li in li_list:
    detail_url = 'https://www.pearvideo.com/' + li.xpath('./div/a/@href')[0]
    name = li.xpath('./div/a/div[2]/div[2]/text()')[0] + '.mp4'
    detail_page_text = requests.get(detail_url,headers = headers).text
    # The video address does not appear to be in the static detail page
    # (section 5 fetches it from videoStatus.jsp), so this regex is left disabled:
    # ex = 'srcUrl="(.*?)",vdoUrl'
    # video_url = re.findall(ex,detail_page_text)[0]
    print(detail_page_text)
    # Once video_url has been obtained, record it:
    # urls.append({'name': name, 'url': video_url})

def get_video_data(dic):
    # Download one video and save it in binary mode
    print(dic['name'], 'downloading......')
    data = requests.get(dic['url'], headers=headers).content
    with open(dic['name'], 'wb') as fp:
        fp.write(data)
    print(dic['name'], 'downloaded!')

# Download up to 4 videos at the same time
pool = Pool(4)
pool.map(get_video_data, urls)
pool.close()
pool.join()
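The loop above still needs `video_url`. Section 5 fetches it from `videoStatus.jsp`, but tutorials from around 2021 report that the `srcUrl` there is not playable as-is. According to them, its leading number (1613307944808 in the sample) is the response's `systemTime` timestamp and has to be replaced by `cont-` plus the `contId`. The sketch below assumes that JSON layout (`systemTime`, and `videoInfo` → `videos` → `srcUrl`), which the original post does not confirm and the site may since have changed. `real_video_url` is a hypothetical helper name, and the sample dict is built from the URL shown in section 5.

```python
def real_video_url(status_json, cont_id):
    """Rebuild a playable URL from a parsed videoStatus.jsp response.

    Assumed layout: status_json['systemTime'] is a timestamp string that also
    appears in status_json['videoInfo']['videos']['srcUrl'] where
    'cont-<contId>' should be.
    """
    src_url = status_json['videoInfo']['videos']['srcUrl']
    system_time = status_json['systemTime']
    # Swap the timestamp segment for the content id
    return src_url.replace(system_time, 'cont-' + str(cont_id))

sample = {
    'systemTime': '1613307944808',
    'videoInfo': {'videos': {
        'srcUrl': 'https://video.pearvideo.com/mp4/short/20210209/1613307944808-15603370-hd.mp4'}},
}
print(real_video_url(sample, 1719770))
# https://video.pearvideo.com/mp4/short/20210209/cont-1719770-15603370-hd.mp4
```

With a live response this would be called as `real_video_url(response.json(), 1719770)`, and the result could go into `urls` as `{'name': name, 'url': ...}`.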



9. Coroutines

import asyncio

async def request(url):
    print('Requesting url:', url)
    print('Request succeeded:', url)
    return url

# Runs when the task finishes; task.result() is the coroutine's return value
def callback_func(task):
    print(task.result())

async def main():
    # Calling an async function does not run it; it only creates a coroutine object
    c = request('www.baidu.com')
    # Wrap the coroutine in a Task scheduled on the running event loop.
    # Equivalent: task = asyncio.get_running_loop().create_task(c)
    task = asyncio.ensure_future(c)
    task.add_done_callback(callback_func)
    await task

# asyncio.run creates the event loop, runs main() and closes the loop
asyncio.run(main())
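A done-callback is one way to collect a coroutine's return value. When several coroutines run together, `asyncio.gather` is often simpler: it runs them concurrently and returns their results in argument order. A small sketch, with `asyncio.sleep(0.1)` standing in for a real request:

```python
import asyncio

async def request(url):
    print('Requesting url:', url)
    await asyncio.sleep(0.1)  # stand-in for network I/O
    return url

async def main():
    # gather() schedules both coroutines concurrently and returns their
    # results in the order they were passed, so no done-callback is needed
    return await asyncio.gather(request('www.baidu.com'), request('www.sogou.com'))

results = asyncio.run(main())
print(results)  # ['www.baidu.com', 'www.sogou.com']
```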

Chapter 7: Handling dynamically loaded data (selenium module, simulated 12306 login)


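1. Selenium basics

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from lxml import etree
from time import sleep

# Selenium 4 takes the driver path through a Service object
bro = webdriver.Chrome(service=Service('chromedriver.exe'))

bro.get('http://scxk.nmpa.gov.cn:81/xk/')
sleep(2)  # give the dynamically loaded list time to render

# page_source is the HTML after JavaScript has run,
# so it includes the dynamically loaded list
page_text = bro.page_source

tree = etree.HTML(page_text)
li_list = tree.xpath('//ul[@id="gzlist"]/li')

for li in li_list:
    name = li.xpath('./dl/@title')[0]
    print(name)

sleep(5)
bro.quit()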

2. Other Selenium automation

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

bro = webdriver.Chrome()

bro.get('https://www.taobao.com/')
sleep(2)

# Locate the search box and type a keyword
search_input = bro.find_element(By.XPATH, '//*[@id="q"]')
search_input.send_keys('Iphone')
sleep(2)
# Scroll to the bottom of the page with JavaScript:
# bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
# sleep(5)

# Click the search button
btn = bro.find_element(By.XPATH, '//*[@id="J_TSearchForm"]/div[1]/button')
print(type(btn))
btn.click()

# Open another page, then go back and forward in the history
bro.get('https://www.baidu.com')
sleep(2)
bro.back()
sleep(2)
bro.forward()

sleep(5)

bro.quit()

3. 12306 login sample code

# Sophomore year
# 18 February 2021
# Winter break ends 7 March

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from PIL import Image
import time

# Headless mode (section 5) kept the captcha coordinates aligned; to try it:
# from selenium.webdriver.chrome.options import Options
# chrome_options = Options()
# chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-gpu')
# chrome_options.add_argument('--window-size=1920,1050')
# chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
# bro = webdriver.Chrome(options=chrome_options)
bro = webdriver.Chrome()

bro.maximize_window()
time.sleep(5)
bro.get('https://kyfw.12306.cn/otn/resources/login.html')

time.sleep(3)

# Click the second tab of the login box (account login at the time)
bro.find_element(By.XPATH, '/html/body/div[2]/div[2]/ul/li[2]/a').click()

# Screenshot the whole page; the captcha is cropped out of it below
bro.save_screenshot('aa.png')
time.sleep(2)

code_img_ele = bro.find_element(By.XPATH, '//*[@id="J-loginImg"]')
time.sleep(2)
location = code_img_ele.location   # top-left corner of the captcha: {'x': ..., 'y': ...}
print('location:', location)
size = code_img_ele.size           # {'width': ..., 'height': ...}
print('size', size)

# Crop box (left, top, right, bottom), the order PIL's Image.crop expects
rangle = (
    int(location['x']), int(location['y']),
    int(location['x'] + size['width']), int(location['y'] + size['height'])
)
print(rangle)

i = Image.open('./aa.png')
code_img_name = './code.png'

frame = i.crop(rangle)
frame.save(code_img_name)

#bro.quit()


# Sophomore year
# 19 February 2021
# Winter break ends 7 March
# The captcha coordinates could not be matched accurately (the clicks were offset);
# with a headless browser they were recognized correctly.
# The steps below are kept in a string (not executed); they need the
# Chaojiying_Client class from section 6 and an account.
'''
chaojiying = Chaojiying_Client('xxxxxxxxxx', 'xxxxxxxxxx', 'xxxxxxx')
im = open(code_img_name, 'rb').read()
# pic_str holds one "x,y" pair, or several pairs separated by '|'
result = chaojiying.PostPic(im, 9004)['pic_str']
all_list = []
for pair in result.split('|'):
    x, y = pair.split(',')
    all_list.append([int(x), int(y)])

print(all_list)

for x, y in all_list:
    # Selenium 4 measures the offset from the element's centre, while the
    # returned coordinates start at the image's top-left corner
    ActionChains(bro).move_to_element_with_offset(
        code_img_ele, int(x - size['width'] / 2), int(y - size['height'] / 2)
    ).click().perform()
    time.sleep(0.5)

bro.find_element(By.ID, 'J-userName').send_keys('')
time.sleep(2)
bro.find_element(By.ID, 'J-password').send_keys('')
time.sleep(2)
bro.find_element(By.ID, 'J-login').click()
bro.quit()
'''
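The note above says the captcha coordinates came out misaligned in a normal window but matched in headless mode. One plausible cause is display scaling, though this is an assumption not verified against 12306. `element.location` and `element.size` are in CSS pixels, while `save_screenshot` captures device pixels. On a screen whose `window.devicePixelRatio` is above 1, the crop box therefore lands in the wrong place. Headless Chrome usually reports a ratio of 1, which would fit the observation. Here is a sketch of scaling the box, with `crop_box` as a hypothetical helper:

```python
def crop_box(location, size, ratio=1.0):
    """Return a PIL crop box (left, top, right, bottom) for an element.

    location and size are the dicts from element.location / element.size
    (CSS pixels); ratio is window.devicePixelRatio. The screenshot is in
    device pixels, so every coordinate is multiplied by ratio.
    """
    left = location['x'] * ratio
    top = location['y'] * ratio
    right = (location['x'] + size['width']) * ratio
    bottom = (location['y'] + size['height']) * ratio
    return tuple(round(v) for v in (left, top, right, bottom))

# With Selenium the ratio can be read from the page, e.g.
# ratio = bro.execute_script('return window.devicePixelRatio')
# frame = Image.open('./aa.png').crop(crop_box(location, size, ratio))
print(crop_box({'x': 10, 'y': 20}, {'width': 100, 'height': 50}, 1.5))  # (15, 30, 165, 105)
```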

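4. Action chains and iframes

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from time import sleep

bro = webdriver.Chrome()

bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')

# The draggable element lives inside an iframe, so switch into it first
# (replace 'id' with the iframe's id)
bro.switch_to.frame('id')
# Fill in the id of the draggable element; find_element (singular) returns one element
div = bro.find_element(By.ID, '')

action = ActionChains(bro)

# Press and hold the element, then drag it 17 px to the right five times
action.click_and_hold(div)
for i in range(5):
    # perform() sends the queued actions to the browser right away,
    # so the drag happens step by step
    action.move_by_offset(17,0).perform()
    sleep(0.3)

action.release().perform()
print(div)
bro.quit()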



5. Headless Chrome + anti-detection

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep

options = Options()
# Headless: run Chrome without a visible window
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# Anti-detection: drop the enable-automation switch, which marks the
# browser as being controlled by automated software
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Pass a single options object (the older chrome_options= argument is deprecated)
bro = webdriver.Chrome(options=options)

bro.get('https://www.baidu.com')
print(bro.page_source)
sleep(2)
bro.quit()

6. Simulated 12306 login with Selenium

# 18 February 2021

import requests
from hashlib import md5

class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        password =  password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: captcha type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: image ID of the captcha to report as wrongly recognized
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()


# Usage example:
# if __name__ == '__main__':
#     chaojiying = Chaojiying_Client('Chaojiying username', 'Chaojiying password', '96001')
#     im = open('a.jpg', 'rb').read()
#     print(chaojiying.PostPic(im, 1902))

# chaojiying = Chaojiying_Client('xxxxxxxxxx', 'xxxxxxxxxx', 'xxxxxxx')
# im = open('第七章:動態加載資料處理/12306.jpg', 'rb').read()
# print(chaojiying.PostPic(im, 9004)['pic_str'])

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

bro = webdriver.Chrome()
bro.get('https://kyfw.12306.cn/otn/resources/login.html')

time.sleep(3)

bro.find_element(By.XPATH, '/html/body/div[2]/div[2]/ul/li[2]/a').click()
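For the 9004-type click captcha, the parsing code in section 3 treats Chaojiying's `pic_str` as `x,y` pairs joined by `|`. Assuming that format, the helpers below parse it and convert each point into an offset for `ActionChains.move_to_element_with_offset`. The helper names are hypothetical. In Selenium 4 that offset is measured from the centre of the element rather than its top-left corner.

```python
def parse_pic_str(pic_str):
    """Turn a click-captcha answer like '120,45|30,60' into [(120, 45), (30, 60)].

    The '|'-separated 'x,y' format is assumed; a single point has no '|'.
    """
    points = []
    for pair in pic_str.split('|'):
        x, y = pair.split(',')
        points.append((int(x), int(y)))
    return points

def to_center_offsets(points, size):
    """Shift top-left-based points to offsets from the element's centre.

    size is the dict from element.size; results are truncated to int.
    """
    cx, cy = size['width'] / 2, size['height'] / 2
    return [(int(x - cx), int(y - cy)) for x, y in points]

points = parse_pic_str('120,45|30,60')
print(points)  # [(120, 45), (30, 60)]
print(to_center_offsets(points, {'width': 300, 'height': 190}))  # [(-30, -50), (-120, -35)]
```

Each `(dx, dy)` can then be passed as `ActionChains(bro).move_to_element_with_offset(code_img_ele, dx, dy).click().perform()`.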




7. Simulated QQ Zone login

from selenium import webdriver
from selenium.webdriver.common.by import By

bro = webdriver.Chrome()
bro.get('https://qzone.qq.com/')

# The login form sits inside the iframe with id "login_frame"
bro.switch_to.frame('login_frame')

# Switch to account/password login
bro.find_element(By.ID, 'switcher_plogin').click()

#account = input('Enter account: ')
bro.find_element(By.ID, 'u').send_keys('')
#password = input('Enter password: ')

bro.find_element(By.ID, 'p').send_keys('')
bro.find_element(By.ID, 'login_button').click()


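Chapter 8: The Scrapy framework

1. Hands-on projects and various Scrapy configuration changes
2. bossPro example

# Sophomore year
# Tuesday, 23 February 2021
# Winter break ends 7 March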
import requests
from lxml import etree

#url = 'https://www.zhipin.com/c101010100/?query=python&ka=sel-city-101010100'
url = 'https://www.zhipin.com/c101120100/b_%E9%95%BF%E6%B8%85%E5%8C%BA/?ka=sel-business-5'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

page_text = requests.get(url,headers = headers).text

tree = etree.HTML(page_text)
print(tree)
li_list = tree.xpath('//*[@id="main"]/div/div[2]/ul/li')
print(li_list)
for li in li_list:
    # The <a> is a child of the span, so a '/' is needed between them
    job_name = li.xpath('.//span[@class="job-name"]/a/text()')
    print(job_name)

3. qiubaiPro example

# -*- coding: utf-8 -*-
# Sophomore year
# Sunday, 21 February 2021
# Winter break ends 7 March

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

url = 'https://www.qiushibaike.com/text/'
page_text = requests.get(url,headers = headers).text

tree = etree.HTML(page_text)
div_list = tree.xpath('//div[@id="content"]/div[1]/div[2]/div')

print(div_list)
# print(tree.xpath('//*[@id="qiushi_tag_124072337"]/a[1]/div/span//text()'))

for div in div_list:
    author = div.xpath('./div[1]/a[2]/h2/text()')[0]
    # print(author)
    # Join the text nodes of the post body into one string
    content = div.xpath('./a[1]/div/span//text()')
    content = ''.join(content)
    # content = div.xpath('//*[@id="qiushi_tag_124072337"]/a[1]/div/span')
    # print(content)
    print(author,content)

# print(tree.xpath('//*[@id="qiushi_tag_124072337"]/div[1]/a[2]/h2/text()'))

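4. Database example

# Sophomore year
# Sunday, 21 February 2021
# Winter break ends 7 March

import pymysql

# Connect to the database:
#   host:     IP of the host running the MySQL server
#   user:     user name
#   password: password
#   database: name of the database to connect to
# PyMySQL 1.0+ accepts these only as keyword arguments
# db = pymysql.connect(host="localhost", user="root", password="200829", database="wj")
db = pymysql.connect(host="192.168.31.19", user="root", password="200829", database="wj")

# Create a cursor object
cursor = db.cursor()

sql = "select version()"

# Execute the SQL statement
cursor.execute(sql)

# Fetch the returned row
data = cursor.fetchone()
print(data)

# Disconnect
cursor.close()
db.close()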
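pymysql implements Python's DB-API 2.0, so the connect → cursor → execute → fetch → close cycle above also works with the standard library's `sqlite3`. That makes it possible to practise storing scraped rows, such as the (author, content) pairs from the qiubaiPro example, without a MySQL server. The sketch below uses an in-memory database and two dummy rows. With pymysql the placeholders would be `%s` instead of `?`.

```python
import sqlite3

# In-memory database standing in for MySQL (same DB-API 2.0 calls)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE jokes (author TEXT, content TEXT)')

rows = [('user_a', 'first joke'), ('user_b', 'second joke')]  # dummy rows
# Parameterised insert: the driver escapes the values, no string formatting
cursor.executemany('INSERT INTO jokes (author, content) VALUES (?, ?)', rows)
conn.commit()  # make the inserts permanent

cursor.execute('SELECT COUNT(*) FROM jokes')
count = cursor.fetchone()[0]
print(count)  # 2

cursor.close()
conn.close()
```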

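Scrapy projects are hard to upload here.
If you need the Scrapy material, you can download it from my resources page,
or get it from my WeChat official account (yk 坤帝, the same as my blog nickname).
Replies through the official account may be a little slow, since I only just set it up and am still finding my way around.
yk坤帝
If you have questions or would like to discuss anything, you can also leave a message on the official account.

When reposting, please credit the source. Link to this article: https://www.uj5u.com/houduan/282705.html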
