Scraping Honor of Kings hero skin images with Python

Preface

Hi everyone, this is 魔王.

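Environment:

Python 3.8
PyCharm

Modules:

requests ---> HTTP request module; third-party, install it with pip install requests
re ---> regular expression module; built in, no installation needed
os ---> file operations module; built in, no installation needed --> used to create a folder for each hero automatically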

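Basic approach

I. Analyzing the data source

Define the requirement and decide what to scrape.
Capture traffic with the browser's developer tools and work out which URL the data we want comes from.

Press F12, or right-click and choose Inspect, then select the Network tab and refresh the page.
To find out what the image URLs look like ---> select the Img filter, which lists the image requests.

In the image URL, 505 is the hero ID.

2 is the skin's position in the list ---> each skin name maps to its skin link by position.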
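
The way these two numbers combine into an image URL can be sketched as follows. This is a minimal illustration that assumes the URL pattern used in the code later in this article; `build_skin_url` and `SKIN_URL_TEMPLATE` are hypothetical names introduced here, not part of the site or of any library.

```python
# Assumption: this template matches the image requests seen in the Network panel.
SKIN_URL_TEMPLATE = (
    'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'
    '{hero_id}/{hero_id}-bigskin-{skin_index}.jpg'
)


def build_skin_url(hero_id, skin_index):
    """Return the image URL for one skin; skin_index counts from 1."""
    return SKIN_URL_TEMPLATE.format(hero_id=hero_id, skin_index=skin_index)


# Hero 505, second skin
print(build_skin_url(505, 2))
```

Keeping the template in one place means only one line changes if the site ever moves its images.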

To get the skin data for yao:

Send a request to the page URL
Get the response data
Extract the skin names
Build the skin image URLs
Save the data
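
The "extract the skin names" step can be tried offline before touching the network. The sketch below assumes the names sit in a data-imgname attribute, separated by | and each followed by & and a number, as in the sample value quoted in the code comments of this article. `sample_html` is a made-up fragment and `parse_skin_names` is a hypothetical helper; the real page may be shaped differently.

```python
import re

# Hypothetical fragment shaped like the hero detail page; only the attribute matters here.
sample_html = '<ul data-imgname="鹿靈守心&0|森&0|遇見神鹿&71|時之祈愿&94|時之愿境&42">'


def parse_skin_names(html):
    """Return the skin names found in the first data-imgname attribute, or [] if none."""
    match = re.search(r'data-imgname="(.*?)">', html)
    if match is None:
        return []
    # Drop the "&<number>" suffix after each name, then split on the | separator.
    return re.sub(r'&\d+', '', match.group(1)).split('|')


print(parse_skin_names(sample_html))
```

Returning an empty list when nothing matches lets the caller skip a hero instead of crashing on an index error.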

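II. Implementation steps

Send the request: imitate a browser and request the URL.
Get the data: read the response the server returns.
Parse the data: extract what we want, the skin names.
Save the data: write it to local disk.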
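
The last step, saving binary data into a per-hero folder, might look like this in isolation. It is a minimal sketch: `save_bytes` is a hypothetical helper, and the demo writes three placeholder bytes into a temporary directory rather than a real image.

```python
import os
import tempfile


def save_bytes(folder, filename, data):
    """Write binary data to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, mode='wb') as f:
        f.write(data)
    return path


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_bytes(os.path.join(tmp, 'img', 'demo_hero'), 'demo_skin.jpg', b'\xff\xd8\xff')
    print(os.path.basename(saved), os.path.getsize(saved))
```

Passing exist_ok=True to os.makedirs removes the need for a separate os.path.exists check.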

Code

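# Import the HTTP request module ---> third-party; install it from the command line with pip install requests
import requests
# Import the regular expression module ---> built in, no installation needed
import re
# Import the file operations module ---> built in, no installation needed
import os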

# URL of the hero list
link = 'https://pvp.qq.com/web201605/js/herolist.json'
# Imitate a browser ---> request headers
headers = {
    # user-agent identifies the browser to the server
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}
# Send the request and decode the JSON response
json_data = requests.get(url=link, headers=headers).json()
# Loop over every hero in the list
for index in json_data:
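    # Dictionary lookup: use the key (left of the colon) to get the value (right of the colon)
    hero_id = index['ename']
    hero_name = index['cname']
    # Folder for this hero, as a relative path; the trailing '' keeps a path separator at the end
    file = os.path.join('img', hero_name, '')
    if not os.path.exists(file):
        os.makedirs(file)
    """
    1. Send the request: imitate a browser and request the URL
        - headers is a dict; write out complete key-value pairs
        - header values can be copied directly from the developer tools
        - use whichever request method the developer tools show
    """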
    # URL of this hero's detail page
    url = f'https://pvp.qq.com/web201605/herodetail/{hero_id}.shtml'
    # The headers dict defined before the loop is reused for this request
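    # Send the request ---> <Response [200]> is a response object; status code 200 means the request succeeded
    response = requests.get(url=url, headers=headers)
    # Garbled text? ---> set the encoding to match the page, e.g. response.encoding = 'gbk'
    # Here the encoding is detected automatically
    response.encoding = response.apparent_encoding
    # Get the data: the response body as text is response.text, e.g. print(response.text)
    """
    Parse the data with the re module
        re.findall() searches a string for everything that matches a pattern
        Search response.text for data-imgname="(.*?)">, where (.*?) captures the data we want
    """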
    title_list = re.findall('data-imgname="(.*?)">', response.text)[0]
    # Example value: 鹿靈守心&0|森&0|遇見神鹿&71|時之祈愿&94|時之愿境&42
    # Raw string so that \d is passed to the regex engine unchanged
    title_list = re.sub(r'&\d+', '', title_list).split('|')
    print(title_list)
    # Loop over the skins; len() counts the list items, so for five skins this is range(1, 6)
    for num in range(1, len(title_list) + 1):
        # List indexes start at 0, so skin number num is at index num - 1
        img_name = title_list[num - 1]
        # Build the image URL
        img_url = f'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/{hero_id}/{hero_id}-bigskin-{num}.jpg'
        print(img_name, img_url)
        # Save the data ---> request the image and take the binary content
        img_content = requests.get(url=img_url, headers=headers).content
        with open(os.path.join(file, img_name + '.jpg'), mode='wb') as f:
            f.write(img_content)
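
One thing the script does not guard against: skin names come straight from the page and are used as file names. If a name ever contained a character that Windows does not allow in file names, saving could fail or behave unexpectedly. The sketch below shows one possible guard; `safe_filename` is a hypothetical helper, and the underscore replacement is an arbitrary choice.

```python
import re


def safe_filename(name, replacement='_'):
    """Replace characters that Windows forbids in file names: \\ / : * ? " < > |"""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or 'unnamed'


print(safe_filename('a/b:c'))
print(safe_filename('森'))
```

In the script, this would be applied to img_name just before the file is opened.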

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/500372.html