Python Scraper in Practice: Collecting Sexy GIF Images

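Preface

I recently came across a rather entertaining site (just kidding, please don't hurt me): it takes the steamier moments from films and TV shows and turns them into GIFs, full of nothing but love. As a self-respecting little crawler, how could I not file them all away in my 'homework' folder?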

Scraping target

Site: GIF出處 (http://gifcc.com/)

Tools

IDE: PyCharm
Environment: Python 3.7, Windows 10
Packages: requests, lxml

Key points covered

Using requests
Parsing data with XPath
Fetching the GIF data

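Project approach

First, pin down the URL of the data you want to collect
Send the network request with the requests package
Paginate by changing the page number in the URL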

http://gifcc.com/forum-38-{}.html
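
As a small sketch of the pagination idea, the page number can be dropped into the URL template with str.format. The helper name page_urls is made up for this illustration and is not part of the original script.

```python
# URL template taken from the article; {} is replaced by the page number.
BASE_LIST_URL = 'http://gifcc.com/forum-38-{}.html'


def page_urls(first_page, last_page):
    """Return the listing-page URLs for first_page..last_page, inclusive."""
    return [BASE_LIST_URL.format(page) for page in range(first_page, last_page + 1)]


for url in page_urls(1, 3):
    print(url)
# http://gifcc.com/forum-38-1.html
# http://gifcc.com/forum-38-2.html
# http://gifcc.com/forum-38-3.html
```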

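Convert the page source into a tree that can be queried
Extract data from the page with XPath
The values extracted are the href attributes of the a tags
What we want are the animated images
The GIFs themselves are on the detail pages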

    url = 'http://gifcc.com/forum-38-{}.html'.format(page)
    response = RequestTools(url).text
    html = etree.HTML(response)
    atarget = html.xpath('//div[@class="c cl"]/a/@href')
    for i in atarget:
        urls = 'http://gifcc.com/' + i
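
Plain string concatenation works as long as every href is relative and has no leading slash. If that assumption does not hold, the standard library's urljoin is a safer way to build the detail-page URL. This is only a sketch: the helper name absolute_url is invented here, and the sample path is the one that appears in the commented-out line of the full source.

```python
from urllib.parse import urljoin

SITE_ROOT = 'http://gifcc.com/'


def absolute_url(href):
    """Resolve an href from the listing page against the site root."""
    return urljoin(SITE_ROOT, href)


# All three forms resolve to http://gifcc.com/thread-5859-1-1.html
print(absolute_url('thread-5859-1-1.html'))                    # relative
print(absolute_url('/thread-5859-1-1.html'))                   # leading slash
print(absolute_url('http://gifcc.com/thread-5859-1-1.html'))   # already absolute
```

With simple concatenation, the second form would produce a double slash and the third would produce a broken URL.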

Send another request, this time to the detail page
On the detail page, use XPath to extract the title and the corresponding GIF image URL

The image file name can also be chosen freely
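
One possible way to pick a name, sketched with the standard library only: take the last path segment of the image URL, ignoring any query string. The function name filename_from_url, the default name, and the example.com addresses are assumptions made for this illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(image_url, default='unnamed.gif'):
    """Return the last path segment of image_url, or default if there is none."""
    path = urlsplit(image_url).path          # drops the query string and fragment
    name = posixpath.basename(unquote(path))  # decode %xx escapes, keep the last segment
    return name or default


print(filename_from_url('http://example.com/a/b/demo.gif?size=large'))  # demo.gif
print(filename_from_url('http://example.com/'))                         # unnamed.gif
```

Splitting on '/' alone would keep the "?size=large" part in the file name, which Windows does not accept.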

    response = RequestTools(url).text
    html = etree.HTML(response)  # build an lxml element tree from the page source
    try:
        gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]  # extract the GIF image URL
        gifcontent = RequestTools(gifurl)  # request the image URL
        title = gifurl.split('/')[-1]  # file name used when saving
        Save(gifcontent, title)
    except Exception as e:
        print(e)
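
The long chain of div steps in that XPath depends on the exact nesting of the page, so a small layout change makes the [0] lookup raise IndexError. As a hedged alternative, the descendant axis (//) can reach the image regardless of depth. The markup below is invented for illustration and is not copied from the real site, and first_image_src is a hypothetical helper.

```python
from lxml import etree

# Hypothetical detail-page markup, made up for this example.
SAMPLE_DETAIL_HTML = """
<html><body><table><tr>
  <td class="t_f">
    <div><div><img src="http://example.com/images/demo.gif"></div></div>
  </td>
</tr></table></body></html>
"""


def first_image_src(page_source):
    """Return the src of the first <img> inside td.t_f, or None if nothing matches."""
    tree = etree.HTML(page_source)
    matches = tree.xpath('//td[@class="t_f"]//img/@src')
    return matches[0] if matches else None


print(first_image_src(SAMPLE_DETAIL_HTML))
print(first_image_src('<html><body><p>no image here</p></body></html>'))
```

Returning None instead of indexing blindly lets the caller decide how to treat a page without an image.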

Request the image URL
Receive the GIF image data
Save the image to disk

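def Save(gifcontent, title):
    # open(path, mode): 'w' creates the file or overwrites an existing one;
    # 'b' selects binary mode, which raw image bytes require.
    # The ./GIF/ directory must already exist.
    with open('./GIF/' + title, 'wb') as f:
        f.write(gifcontent.content)
    print('{} downloaded...'.format(title))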
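
A server sometimes answers an image request with an HTML error page, which would then be saved under a .gif name. A minimal sketch of a guard, assuming the raw bytes are passed in directly (for example response.content): every GIF file starts with the signature GIF87a or GIF89a, so anything else can be skipped. The name save_gif and its out_dir parameter are assumptions for this example.

```python
from pathlib import Path

# Every GIF file begins with one of these two six-byte signatures.
GIF_SIGNATURES = (b'GIF87a', b'GIF89a')


def save_gif(data, title, out_dir='./GIF'):
    """Write data to out_dir/title if it looks like a GIF.

    Returns the path written, or None if the data was rejected.
    """
    if not data.startswith(GIF_SIGNATURES):
        return None
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    target = folder / title
    target.write_bytes(data)
    return target
```

In the script above it could be called as save_gif(gifcontent.content, title).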

Simple source code

import os

import requests
from lxml import etree  # XPath-based data extraction


def RequestTools(url):
    # Request headers: imitate a browser to get past basic anti-scraping checks
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
    }
    # Send the request
    response = requests.get(url, headers=headers)
    return response


def Save(gifcontent, title):
    os.makedirs('./GIF', exist_ok=True)  # make sure the output directory exists
    # 'wb': binary write mode; creates the file or overwrites an existing one
    with open('./GIF/' + title, 'wb') as f:
        f.write(gifcontent.content)
    print('{} downloaded...'.format(title))


def DateilsPage(url):
    # url = 'http://gifcc.com/thread-5859-1-1.html'  # example request URL
    response = RequestTools(url).text
    html = etree.HTML(response)  # build an lxml element tree from the page source
    try:
        gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]  # extract the GIF image URL
        gifcontent = RequestTools(gifurl)  # request the image URL
        title = gifurl.split('/')[-1]  # file name used when saving
        Save(gifcontent, title)
    except Exception as e:
        print(e)


def main(page):
    url = 'http://gifcc.com/forum-38-{}.html'.format(page)
    response = RequestTools(url).text
    html = etree.HTML(response)
    atarget = html.xpath('//div[@class="c cl"]/a/@href')
    for i in atarget:
        urls = 'http://gifcc.com/' + i
        DateilsPage(urls)


# Program entry point
if __name__ == '__main__':
    for page in range(1, 11):
        main(page)
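
In the loop above, a failed request for a listing page raises out of main and stops the whole run, and the pages are requested back to back. A possible refinement, sketched under the assumption that a short pause between pages is acceptable: wrap the per-page call, record failures, and sleep between pages. The name crawl_pages and its parameters are invented for this example.

```python
import time


def crawl_pages(handle_page, pages, delay_seconds=1.0):
    """Call handle_page(page) for each page, pausing between calls.

    Returns the pages whose handler raised, so they can be retried later.
    """
    failed = []
    for index, page in enumerate(pages):
        if index > 0:
            time.sleep(delay_seconds)  # pause so the server is not hammered
        try:
            handle_page(page)
        except Exception as error:
            print('page {} failed: {}'.format(page, error))
            failed.append(page)
    return failed
```

With the script above, the entry point could become failed = crawl_pages(main, range(1, 11)).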

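I'm 白又白i, a programmer who enjoys sharing what she knows.

If you have never done any programming and found parts of this post confusing, or you would like to learn Python, feel free to leave a comment or message me. [Thank you very much for your likes, favorites, follows, and comments!]

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/293737.html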

Tags: python
