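These days, young people can hardly call themselves young if they chat without memes. Memes have become an indispensable part of everyday conversation.

Send a few to someone you just met and you are friends within minutes. If your girlfriend is sulking, a couple of memes can cheer her up. They also defuse awkward moments, and when there is no time to type, a meme or two is a polite way to reply.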

1. Preparation

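Preparation matters. First work out what we want to do, which tools to use, and how to do it, then carry it out step by step at a steady pace.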

Development environment

Python 3.6
PyCharm

Open your browser and search for the name of the software you want to install.

Python

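The result labeled as official is the official website. If the word "Ad" appears under a result's name, don't click it. Trust your instincts: it is an ad.

Click the Python 3.10.2 button underneath to download the latest version (at the time of writing). There is no need to click Download itself.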

PyCharm

Click any of the Download buttons. Either the Professional or the Community edition is fine.

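# Writing out the installation steps one by one would take too long, so feel free to join a group instead
# Python learning group 1: 924040232
# Python learning group 2: 815624229
# I have also prepared plenty of Python learning material, which you can get for free in the groups.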

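Module installation

requests
parsel
re (part of the standard library, no installation needed)

Press Win+R, type cmd, and press Enter. Then type pip install followed by the name of the module to install (for example, pip install requests) and press Enter.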
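If you want to confirm the installation worked before moving on, a quick check like the one below can help. This is only a sketch: the helper name missing_modules is made up for this example, and it only reports whether Python can find each module, not which version is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # The three modules used in this article
    missing = missing_modules(['requests', 'parsel', 're'])
    if missing:
        print('Still need to install:', ', '.join(missing))
    else:
        print('All modules are available')
```

Anything reported as missing can be installed with pip install as described above.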

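2. Code

Target: fabiaoqing
Fill in the beginning and end of the address yourselves, here and in the code below. That shouldn't be a problem for anyone.

Import modules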

import requests 
import parsel 
import re
import time

Request URL

url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
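The {page} placeholder in the address is what lets us walk through the list pages one after another. Here is a small sketch of that idea. The function name build_page_urls and the base argument are made up for this example, and 'BASE' stands for the completed site address that you fill in yourself; it is assumed here that the path follows the address after a slash.

```python
def build_page_urls(base, first_page, last_page):
    """Build the list-page URLs for first_page..last_page (inclusive)."""
    return [
        f'{base}/biaoqing/lists/page/{page}.html'
        for page in range(first_page, last_page + 1)
    ]


# 'BASE' is a placeholder, not a real address: replace it with the full site address
urls = build_page_urls('BASE', 1, 3)
for url in urls:
    print(url)
```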

Request headers

headers = {
       'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }

Get the page source

response = requests.get(url=url, headers=headers)

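Parse the data

selector = parsel.Selector(response.text)  # convert response.text into a Selector object

First extraction: get all the matching div tags

divs = selector.css('#container div.tagbqppdiv')  # css() extracts content with a CSS selector

For each div, extract the URL of its image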

img_url = div.css('img::attr(data-original)').get()

Extract the title

title = div.css('img::attr(title)').get()

Get the image's file extension

name = img_url.split('.')[-1]
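Splitting on the last dot works for plain image addresses, but it can go wrong when the URL carries a query string or has no extension at all. If that ever bites you, something like the sketch below is a possible alternative. The function name extension_from_url and the 'jpg' fallback are assumptions made for this example, not part of the original code.

```python
import os
from urllib.parse import urlparse


def extension_from_url(img_url, default='jpg'):
    """Return the extension of the URL's path without the dot, in lower case."""
    path = urlparse(img_url).path       # drops the query string and the host
    ext = os.path.splitext(path)[1]     # for example '.gif', or '' if there is none
    return ext[1:].lower() if ext else default


print(extension_from_url('https://example.invalid/img/a.GIF?size=large'))
print(extension_from_url('https://example.invalid/img/noext'))
```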

Prepare the file name for saving

new_title = change_title(title)

Send a request for the meme image and get its binary data

img_content = requests.get(url=img_url, headers=headers).content

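Save the data

def save(title, img_url, name):
    # Download the image's binary data
    img_content = get_response(img_url).content
    try:
        # The img folder must already exist next to the script
        with open('img\\' + title + '.' + name, mode='wb') as f:
            # Write the image's binary data
            f.write(img_content)
            print('Saving:', title)
    except OSError as e:
        # Report the failure instead of silently ignoring it
        print('Failed to save:', title, e)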
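One thing that is easy to trip over: open() does not create the img folder for you. A minimal sketch of a variant that creates the folder first is shown below. The function name save_bytes and its parameters are made up for this example, and it takes the bytes directly rather than downloading them.

```python
from pathlib import Path


def save_bytes(folder, title, extension, data):
    """Write data to folder/title.extension, creating the folder if needed."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = target_dir / f'{title}.{extension}'
    target.write_bytes(data)
    return target
```

Using pathlib also avoids hard-coding the Windows backslash, so the same code works on other systems.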

Replace special characters in the title

File names cannot contain certain special characters, so we use a regular expression to replace them.

def change_title(title):
    mode = re.compile(r'[\\\/\:\*\?\"\<\>\|]')
    new_title = re.sub(mode, "_", title)
    return new_title
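To get a feel for what the replacement does, here is a quick self-contained sketch. It uses the same set of characters as above, written without the extra backslashes (inside a character class most of them are not needed); the sample title is made up.

```python
import re


def change_title(title):
    # Characters that Windows does not allow in file names: \ / : * ? " < > |
    mode = re.compile(r'[\\/:*?"<>|]')
    return mode.sub('_', title)


# Every forbidden character becomes an underscore
print(change_title('what?/why: "ok"'))
```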

Record the elapsed time

# time_1 is recorded with time.time() before the crawl starts
time_2 = time.time()

use_time = int(time_2) - int(time_1)
print(f'Total time: {use_time} s')
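If you want finer timing than whole seconds, time.perf_counter() is a possible alternative, since it is meant for measuring intervals. The helper name timed is made up for this sketch, and sum(range(1000)) just stands in for the real crawl.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# sum(range(1000)) is only a placeholder for the real work
result, elapsed = timed(sum, range(1000))
print(f'Total time: {elapsed:.3f} s')
```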

Folks, that was the single-threaded version. The multithreaded version follows, so I'll go straight to the code.

import requests  
import parsel 
import re
import time
import concurrent.futures 



def change_title(title):

    mode = re.compile(r'[\\\/\:\*\?\"\<\>\|]')
    new_title = re.sub(mode, "_", title)
    return new_title


def get_response(html_url):

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    return response


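def save(title, img_url, name):
    # Download the image's binary data
    img_content = get_response(img_url).content
    try:
        # The img folder must already exist next to the script
        with open('img\\' + title + '.' + name, mode='wb') as f:
            # Write the image's binary data
            f.write(img_content)
            print('Saving:', title)
    except OSError as e:
        # Report the failure instead of silently ignoring it
        print('Failed to save:', title, e)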


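def main(html_url):
    # Download the list page and parse it
    html_data = get_response(html_url).text
    selector = parsel.Selector(html_data)
    divs = selector.css('#container div.tagbqppdiv')
    for div in divs:
        # Image address and title of each meme
        img_url = div.css('img::attr(data-original)').get()
        title = div.css('img::attr(title)').get()
        if not img_url or not title:
            # Skip entries without an image address or a title
            continue
        # File extension, for example jpg or gif
        name = img_url.split('.')[-1]
        new_title = change_title(title)
        save(new_title, img_url, name)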


if __name__ == '__main__':
    time_1 = time.time()
    exe = concurrent.futures.ThreadPoolExecutor(max_workers=10)
    for page in range(1, 201):
        url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
        exe.submit(main, url)
    exe.shutdown()
    time_2 = time.time()
    use_time = int(time_2) - int(time_1)
    print(f'Total time: {use_time} s')
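One caveat with exe.submit: if a task raises an exception, nothing is printed unless you ask the returned future for its result. A rough sketch of how the failures could be collected is below. crawl_page is a made-up stand-in for main(url) that fails on purpose for page 3, and run_all is likewise a name invented for this example.

```python
import concurrent.futures


def crawl_page(page):
    """Hypothetical stand-in for main(url); page 3 fails on purpose."""
    if page == 3:
        raise ValueError(f'page {page} failed')
    return page * 10


def run_all(pages, max_workers=10):
    """Run crawl_page for every page and separate results from errors."""
    results, errors = {}, {}
    # The with block waits for all tasks, like calling exe.shutdown()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as exe:
        futures = {exe.submit(crawl_page, page): page for page in pages}
        for future in concurrent.futures.as_completed(futures):
            page = futures[future]
            try:
                results[page] = future.result()  # re-raises the task's exception
            except Exception as exc:
                errors[page] = exc
    return results, errors


results, errors = run_all(range(1, 6))
print('finished pages:', sorted(results))
print('failed pages:', sorted(errors))
```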

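Folks, that's over a thousand images in 18 seconds. It was over a bit too quickly.

If you found this useful, please give it a like and a bookmark. Love you all, mwah.

The code runs fast, only 18 seconds, but I hope none of you are this quick in everyday life, hehe, that wouldn't be great~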

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/440485.html

