Request libraries: the requests library

I. Introduction
II. GET requests
III. POST requests
IV. The Response object
V. Advanced usage

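I. Introduction

# Overview: requests can simulate the requests a browser sends. Compared with urllib, which we used earlier, its API is much more convenient (under the hood it wraps urllib3).

# Note: requests downloads the page content but does not execute JavaScript. For data rendered by JS you have to analyze the target site yourself and issue the follow-up requests.

# Install: pip3 install requests

# Request methods: requests.get() and requests.post() are the most common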
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
>>> r = requests.put('http://httpbin.org/put', data={'key': 'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')

II. GET requests

1. Basic request

The response is a Python object that holds the response headers, the response body, and more.

header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'referer': 'https://www.mzitu.com/225078/2'
}

response = requests.get('https://www.mzitu.com/', headers=header)
print(response.text)  # response body as text; parse it to extract the image URLs

result = requests.get('https://i3.mmzztt.com/2020/03/14a02.jpg', headers=header)
print(result.content)  # response body as raw bytes

# Download and save the image
with open('a.jpg', 'wb') as f:
    for chunk in result.iter_content(chunk_size=8192):
        f.write(chunk)

2. GET requests with parameters

header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}

Option 1: append the query string to the URL
res = requests.get('https://www.baidu.com/s?wd=美女', headers=header)
# If the keyword contains Chinese or other special characters, URL-encode it first.
# urlencode() expects a dict or a sequence of pairs; use quote() for a single string.
from urllib.parse import quote, unquote
# Encode
quote('美女', encoding='utf-8')
# Decode
unquote('%2Fs%3Fwd%3D%25E7%')

Option 2: pass params, which requests URL-encodes automatically
res = requests.get('http://www.baidu.com/s', headers=header, params={'wd': '美女'})
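To see what params does without sending anything, the request can be prepared locally. This is a minimal sketch that assumes only that requests is installed; prepare() builds the request but does not touch the network.

```python
from urllib.parse import quote, unquote, urlencode

import requests

# Encode a single value, and a whole query dict, by hand.
encoded_value = quote('美女', encoding='utf-8')
encoded_query = urlencode({'wd': '美女'}, encoding='utf-8')

# Let requests build the same URL from params; prepare() does not send it.
prepared = requests.Request(
    'GET', 'http://www.baidu.com/s', params={'wd': '美女'}
).prepare()

print(encoded_value)
print(encoded_query)
print(prepared.url)
print(unquote(encoded_value))
```

The hand-encoded query and the URL that requests produces should carry the same percent-escaped bytes, which is why Option 2 is usually the less error-prone choice.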

3. Sending cookies with a request

Option 1: put them in the headers
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'cookie': 'key=asdfasdfasdfsdfsaasdf; key2=asdfasdf; key3=asdfasdf'
}
res = requests.get(url, headers=header)


Option 2 (recommended): pass them as the cookies argument
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}

# cookies is a dict or a CookieJar object. On the first visit, read response.cookies to get a CookieJar, store it in a variable, and pass it along when requesting other pages.
res = requests.get(url, headers=header, cookies={'key': 'asdfasdf'})
print(res.text)
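The dict and CookieJar forms are interchangeable, and requests.utils has helpers to convert between them. The sketch below prepares a request locally to show the Cookie header that would go out; http://example.com/ is a placeholder URL, not one from this article.

```python
import requests
from requests.utils import cookiejar_from_dict, dict_from_cookiejar

# A plain dict and a CookieJar are interchangeable as the cookies argument.
jar = cookiejar_from_dict({'key': 'asdfasdf'})
as_dict = dict_from_cookiejar(jar)

# Prepare (but do not send) a request to inspect the outgoing header.
# http://example.com/ is a placeholder URL.
prepared = requests.Request('GET', 'http://example.com/', cookies=jar).prepare()
cookie_header = prepared.headers.get('Cookie')

print(as_dict)
print(cookie_header)
```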

III. POST requests

1. Basic usage
# requests.post() is called the same way as requests.get(); the difference is that it normally carries a request body.
# data sends the body form-encoded (application/x-www-form-urlencoded); json sends it as JSON.

res = requests.post(url, data={'name': 'lqz'})

res = requests.post(url, json={"age": "18"})
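The difference between data and json shows up in the Content-Type header and the body bytes. A small sketch, prepared locally rather than sent, using the httpbin endpoint from the introduction as the target:

```python
import json

import requests

URL = 'http://httpbin.org/post'  # same test endpoint used in the introduction

# prepare() builds the request without sending it.
form_req = requests.Request('POST', URL, data={'name': 'lqz'}).prepare()
json_req = requests.Request('POST', URL, json={'age': '18'}).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)
```

Which one to use depends on what the backend parses: a form handler expects the first shape, a JSON API expects the second.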

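2. Simulating a browser login with POST

2.1 Analyzing the target site
    Open https://github.com/login in a browser
    Enter a wrong username and password, and capture the traffic
    The login is a POST to https://github.com/session
    The request headers include a cookie
    The request body contains:
        commit:Sign in
        utf8:✓
        authenticity_token:lbI8IJCwGslZS8qJPnof5e7ZkCoSoMn6jmDTsL1r/m06NLyIbw7vCrpwrFAPzHMep3Tmf/TSJVoXWrvDZaVwxQ==
        login:egonlin
        password:123


2.2 Analyzing the flow
    First GET https://github.com/login to obtain the initial cookie and the authenticity_token
    Then POST to https://github.com/session with the initial cookie and the request body (authenticity_token, username, password, and so on)
    Finally, read the login cookie from the response


Note: if the site sends the password in encrypted form, enter a wrong username with the correct password and copy the encrypted password from the request captured in the browser. GitHub sends the password in plaintext.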
The code below simulates the login and obtains the cookie:

import requests
import re

# First request
r1 = requests.get('https://github.com/login')
r1_cookie = r1.cookies.get_dict()  # initial cookie (not yet authorized)
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]  # extract the CSRF token from the page

# Second request: POST the initial cookie, the token, and the credentials to the session endpoint
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '[email protected]',
    'password': 'alex3714'
}
r2 = requests.post('https://github.com/session',
                   data=data,
                   cookies=r1_cookie
                   )

login_cookie = r2.cookies.get_dict()  # cookie after logging in

# Third request: from now on login_cookie is enough, for example to open a personal settings page
r3 = requests.get('https://github.com/settings/emails',
                  cookies=login_cookie)

print('[email protected]' in r3.text)  # look for the email address; True means the cookie is logged in
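Indexing the re.findall() result with [0] raises IndexError when the page contains no token, for example after the site changes its markup. One way to make that failure explicit is a small helper, sketched here; extract_token is a hypothetical name, and the sample HTML is made up for illustration rather than copied from GitHub.

```python
import re

TOKEN_RE = re.compile(r'name="authenticity_token".*?value="(.*?)"', re.S)


def extract_token(html):
    """Return the first authenticity_token value in html, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


# Hypothetical HTML fragment, not GitHub's actual markup.
sample = '<input type="hidden" name="authenticity_token" value="abc123==" />'
print(extract_token(sample))
print(extract_token('<html></html>'))
```

The caller can then check for None and stop before posting a login form with a missing token.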

3. Carrying cookies automatically
session = requests.session()  # create a requests.Session object
res1 = session.post('http://127.0.0.1:8000/index/')  # suppose this request logs in
res2 = session.get('http://127.0.0.1:8000/order/')   # no need to pass cookies by hand; the session handles them

The code below uses a session to carry cookies automatically, which simplifies the login example above:

import requests
import re

session=requests.session()
# First request
r1 = session.get('https://github.com/login')
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]  # extract the CSRF token from the page

# Second request
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '[email protected]',
    'password': 'alex3714'
}
r2 = session.post('https://github.com/session',
                  data=data,
                  )

# Third request
r3 = session.get('https://github.com/settings/emails')

print('[email protected]' in r3.text)  # True
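What the session does behind the scenes can be inspected offline with Session.prepare_request(), which merges the session's cookies and headers into a request without sending it. In this sketch the cookie name and value and the user-agent string are made up for illustration.

```python
import requests

session = requests.Session()

# Pretend an earlier login response set this cookie (the name and value
# are made up for illustration).
session.cookies.set('sessionid', 'abc123')
session.headers.update({'user-agent': 'demo-client'})

# prepare_request() merges session state into the request without sending it.
request = requests.Request('GET', 'http://127.0.0.1:8000/order/')
prepared = session.prepare_request(request)

print(prepared.headers['Cookie'])
print(prepared.headers['user-agent'])
```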

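IV. The Response object

1. Response attributes

response = requests.post(url, data={'name': 'lqz'})
print(response.text)         # response body as text
print(response.content)      # response body as raw bytes
print(response.status_code)  # HTTP status code
print(response.headers)      # response headers
print(response.cookies)      # CookieJar object; if the site sets cookies on the home page, read them here and send them when visiting other pages
print(response.cookies.get_dict())  # the CookieJar converted to a dict
print(response.cookies.items())     # the cookies as a list of (name, value) tuples
print(response.url)          # final URL of the response
print(response.history)      # list of the redirect responses that led to this one
print(response.encoding)     # encoding used to decode response.text

# iter_content() yields the body in binary chunks: use it for images, videos, and large files
# (pass stream=True to the request so the body is not loaded into memory all at once)
with open('a.jpg', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)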
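The chunked write can be wrapped in a small helper. This is a sketch only: save_body is a hypothetical name, and the Response built by hand below is an offline stand-in for what requests.get(url, stream=True) would return.

```python
import io
import os
import tempfile

import requests


def save_body(response, path, chunk_size=8192):
    """Write the response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


# Offline stand-in for a real response: in practice this would come from
# requests.get(url, stream=True).
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b'\xff\xd8' + b'\x00' * 20000)

target = os.path.join(tempfile.mkdtemp(), 'a.jpg')
size = save_body(fake, target)
print(size)
```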

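2. Encoding problems

res = requests.get('http://www.autohome.com/news')
# If the printed text comes out garbled:
# Option 1: decode the response with the encoding the site declares
res.encoding = 'gb2312'

# Option 2: a general approach, let requests detect the encoding from the content
res.encoding = res.apparent_encoding
print(res.text)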
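The effect of res.encoding is easy to reproduce offline. In this sketch a Response is built by hand as a stand-in for a GB2312 page; the sample text '中文' is an assumption chosen for illustration.

```python
import io

import requests

# Offline stand-in for a GB2312 page served without a usable charset header.
res = requests.Response()
res.status_code = 200
res.raw = io.BytesIO('中文'.encode('gb2312'))

res.encoding = 'ISO-8859-1'   # wrong encoding: the text comes out garbled
garbled = res.text

res.encoding = 'gb2312'       # correct encoding: the text decodes cleanly
fixed = res.text

print(repr(garbled))
print(fixed)
```

Because res.text is decoded from the raw bytes on each access, changing res.encoding after the download is enough; no second request is needed.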

3. Parsing JSON

import json
import requests

response = requests.get('http://httpbin.org/get')

res1 = json.loads(response.text)  # works, but verbose

res2 = response.json()  # parse the JSON body directly
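response.json() raises an exception when the body is not valid JSON, for example when a server returns an HTML error page. A possible guard is sketched below; parse_json and fake_response are hypothetical helper names, and the hand-built Response is an offline stand-in for a real one.

```python
import io

import requests


def parse_json(response, default=None):
    """Return the decoded JSON body, or default if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # requests' JSON decode errors subclass ValueError
        return default


def fake_response(body):
    """Build an offline stand-in for a real response (illustration only)."""
    res = requests.Response()
    res.status_code = 200
    res.encoding = 'utf-8'
    res.raw = io.BytesIO(body)
    return res


print(parse_json(fake_response(b'{"origin": "127.0.0.1"}')))
print(parse_json(fake_response(b'<html>not json</html>'), default={}))
```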

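V. Advanced usage

1. SSL certificate verification (for reference)

# Certificate verification (most sites use HTTPS)
import requests
response = requests.get('https://www.12306.cn')  # for an HTTPS request the certificate is checked first; if it is invalid, an SSLError is raised and the program stops


# Improvement 1: suppress the error; a warning is still printed
import requests
response = requests.get('https://www.12306.cn', verify=False)  # skip verification; prints a warning, returns 200
print(response.status_code)

# Improvement 2: suppress both the error and the warning
import requests
import urllib3
urllib3.disable_warnings()  # turn off the warning
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

# Improvement 3: supply a certificate (from a local path)
# Many sites use HTTPS yet can be accessed without a client certificate; in most cases it is optional
# Zhihu, Baidu, and similar sites work either way
# Where it is mandatory you must supply one, for example when only selected users who have been issued a certificate may access a particular site
import requests
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)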

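2. Using proxies (important)

proxies = {
    'http': 'http://localhost:9743',             # proxy host and port
    # 'http': 'http://egon:123@localhost:9743',  # proxy with credentials: username and password go before the @
    'https': 'https://localhost:9743',
}
response = requests.get('https://www.12306.cn', proxies=proxies)

# Proxy pool: keep a list of proxy IPs and pick one at random for each request, which makes an IP ban less likely
# Elite (high-anonymity) vs. transparent proxies: with an elite proxy the backend cannot see your IP; with a transparent proxy it can
# How does the backend find the client IP behind a transparent proxy? It reads the X-Forwarded-For field from the request META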
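The proxy pool idea can be sketched in a few lines. The addresses in PROXY_POOL and the helper name pick_proxies are hypothetical; a real pool would hold proxies you actually have access to.

```python
import random

# Hypothetical proxy addresses; replace with proxies you actually control.
PROXY_POOL = [
    'http://localhost:9743',
    'http://localhost:9744',
    'http://localhost:9745',
]


def pick_proxies(pool, rng=random):
    """Pick one proxy at random and return it in the shape requests expects."""
    proxy = rng.choice(pool)
    return {'http': proxy, 'https': proxy}


proxies = pick_proxies(PROXY_POOL)
print(proxies)
# Usage (not executed here): requests.get(url, proxies=pick_proxies(PROXY_POOL))
```

Calling pick_proxies() once per request spreads traffic across the pool; it does nothing about dead proxies, which a real crawler would need to detect and drop.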

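3. Timeout settings
# Two forms: float or tuple
timeout = 0.1         # a single value applies to both the connect timeout and the read timeout
timeout = (0.1, 0.2)  # 0.1 is the connect timeout, 0.2 is the read timeout

import requests
response = requests.get('https://www.baidu.com', timeout=0.0001)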

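4. Authentication (for reference)

# Some older sites pop up a dialog asking for a username and password (it looks much like alert); this is HTTP Basic authentication, and the HTML is not returned until you authenticate
r = requests.get(url, auth=('user', 'password'))
print(r.status_code)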
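Passing a (user, password) tuple as auth makes requests add an HTTP Basic Authorization header. The sketch below prepares a request locally to show that header; http://example.com/ is a placeholder URL and nothing is sent.

```python
import base64

import requests

# http://example.com/ is a placeholder URL; the request is prepared, not sent.
prepared = requests.Request(
    'GET', 'http://example.com/', auth=('user', 'password')
).prepare()

auth_header = prepared.headers['Authorization']
expected = 'Basic ' + base64.b64encode(b'user:password').decode('ascii')

print(auth_header)
print(auth_header == expected)
```

Base64 is an encoding, not encryption, so Basic authentication should only be used over HTTPS.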

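5. Exception handling
# See requests.exceptions for the exception types
import requests
from requests.exceptions import RequestException

# Catching the base class RequestException is enough
try:
    res = requests.get('http://www.baidu.com', timeout=0.00001)
except RequestException as e:
    print(e)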
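When the caller needs to tell failures apart, the more specific exception classes can be caught before the base class. This is one possible shape, not the only one; fetch is a hypothetical helper name and the default timeout values are arbitrary.

```python
import requests


def fetch(url, timeout=(3.05, 10)):
    """Return (response, None) on success or (None, reason) on failure."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()   # turn 4xx/5xx statuses into HTTPError
    except requests.exceptions.Timeout:
        return None, 'timeout'
    except requests.exceptions.ConnectionError:
        return None, 'connection'
    except requests.exceptions.RequestException as exc:
        # Base class: everything else requests can raise.
        return None, type(exc).__name__
    return response, None


# A URL without a scheme fails before any network traffic happens.
print(fetch('not-a-url'))
```

Timeout is listed before ConnectionError because a connect timeout belongs to both families, and the first matching except clause wins.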

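6. Uploading files
with open('a.jpg', 'rb') as f:
    res = requests.post(url, files={'myfile': f})
print(res.text)

# On the backend, request.FILES.get('myfile') returns the uploaded file object
# requests is also handy for talking to backend services: SDK wrappers for SMS and payment interfaces are often built on it, and when no third-party SDK package exists you can write the integration against the API with requests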
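The files argument produces a multipart/form-data body. The sketch below prepares an upload locally to show what the backend would receive; the URL is a placeholder and the file content is dummy bytes, not a real image.

```python
import requests

# The tuple form (filename, content, content_type) avoids needing a real
# file on disk; the bytes below are placeholder content, not a valid image.
files = {'myfile': ('a.jpg', b'fake image bytes', 'image/jpeg')}

# http://example.com/upload is a placeholder URL; nothing is sent.
prepared = requests.Request('POST', 'http://example.com/upload', files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type)
print(b'name="myfile"' in prepared.body, b'filename="a.jpg"' in prepared.body)
```

The field name in the dict key ('myfile') is what the backend uses to look up the upload.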
