Scraping Honor of Kings hero background stories with Python

Contents

- Preface
- Analyzing the target data source
- Implementation
    - 1. Code structure
    - 2. Getting hero numbers and names
    - 3. Getting hero story data
- Complete code

With just four functions, Honor of Kings handed me the background story of every hero.

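Preface

Before following along, you should:

- have a working Python environment (I use Python 3.9)
- be comfortable with an IDE (I use PyCharm)
- be able to install third-party libraries such as requests
- know basic Python syntax
- be in the habit of thinking a problem through and searching for answers (for example on Baidu) when you get stuck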

Analyzing the target data source

Target URLs:

Target URL 1: https://pvp.qq.com/web201605/herolist.shtml
Target URL 2: https://pvp.qq.com/web201605/herodetail/{hero number}.shtml

Goal:

The background story of every Honor of Kings hero.

Libraries used:

import os
import re
import bs4
import requests

import chardet  # optional
import logging  # optional

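Implementation

1. Code structure

Here is the overall structure of the code:
[Screenshot: overall code structure]
I defined three global variables here; moving them into the main function would make the structure cleaner.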
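
As a rough sketch of what moving the shared settings out of module scope could look like, the snippet below builds them in one helper and hands them back to the caller. The names `Settings` and `make_settings` are made up for illustration, and the sketch assumes the three module-level items are the logging setup, the output folder, and the request headers, as in the complete code at the end of this post.

```python
import logging
import os
from typing import NamedTuple


class Settings(NamedTuple):
    """Hypothetical container for the values the post keeps at module level."""
    path: str
    headers: dict


def make_settings(path='./王者故事'):
    # Configure logging once, when the program starts.
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(levelname)s: %(message)s')
    # Create the output folder if it does not exist yet.
    os.makedirs(path, exist_ok=True)
    # A shortened user-agent, used here only as a placeholder.
    headers = {'user-agent': 'Mozilla/5.0'}
    return Settings(path=path, headers=headers)
```

With this in place, main() could call `settings = make_settings()` and pass `settings.path` and `settings.headers` to the other functions as arguments.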

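2. Getting hero numbers and names

First, open the Honor of Kings official site: https://pvp.qq.com/
Follow the steps shown to open a new page, which gives the first target URL.
[Screenshot: steps for opening the hero list page]
Next comes the first scraping task: the heroes' names and numbers.

So first I need to know where that data lives, right?

As shown here (I originally recorded a GIF, but it could not be embedded):

[Screenshot: where the hero data is located]
One more click reveals the URL we want.

This step uses:

the requests library, the re module, and regular expressions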

import re
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/'
                  '537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'
}


# Get the hero names and their corresponding numbers
def get_hero_num(url):
    response = requests.get(url=url, headers=headers).text
    hero_list = re.findall(r'"ename": (.+?),', response, re.S)  # list of hero numbers
    hero_name = re.findall(r'"cname": "(.+?)"', response, re.S)  # list of hero names
    return hero_name, hero_list


def main():
    url = 'https://pvp.qq.com/web201605/js/herolist.json'
    hero_name, hero_list = get_hero_num(url)
    print('Hero names:\n', hero_name)
    print('Hero numbers:\n', hero_list)


if __name__ == '__main__':
    main()

Both the names and the numbers are retrieved successfully.
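
The regular expressions above depend on the exact spacing of the response text. If the response is valid JSON made up of a list of objects with `ename` and `cname` keys (an assumption based on the patterns above, not something verified here), the standard `json` module could do the parsing instead. A minimal sketch on a hand-written sample string; the helper name `parse_hero_list` and the second sample entry are made up:

```python
import json


def parse_hero_list(text):
    """Map each hero name (cname) to its number (ename)."""
    heroes = json.loads(text)
    # Skip entries that lack either key rather than failing on them.
    return {h['cname']: h['ename'] for h in heroes
            if 'cname' in h and 'ename' in h}


# Hand-written sample, not real site data (only 云纓 / 538 comes from this post).
sample = '[{"ename": 538, "cname": "云纓"}, {"ename": 1, "cname": "示例"}]'
print(parse_hero_list(sample))
```

A dictionary also makes the later lookup by name a single step, instead of finding an index in one list and reading the same position in another.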

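3. Getting hero story data

Fill the hero number into the placeholder in target URL 2:
https://pvp.qq.com/web201605/herodetail/{hero number}.shtml

Then request that page (let's try the new hero 云纓 first, whose number is 538).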
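
Filling in the number is a one-line string format. A small sketch (the helper name `hero_detail_url` is made up for illustration):

```python
DETAIL_URL = 'https://pvp.qq.com/web201605/herodetail/{}.shtml'


def hero_detail_url(num):
    """Return the detail-page URL for a hero number given as int or str."""
    # strip() guards against stray whitespace left over from text extraction.
    return DETAIL_URL.format(str(num).strip())


print(hero_detail_url(538))
```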

This step uses:

the requests and bs4 libraries, plus chardet (optional, but worth learning)

    url = 'https://pvp.qq.com/web201605/herodetail/538.shtml'  # link to the hero's detail page
    res = requests.get(url=url, headers=headers)
    res.encoding = chardet.detect(res.content)['encoding']  # detect the page encoding to avoid garbled text
    res = res.text
    print(res)
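
If chardet is not installed, one fallback is to try a short list of likely encodings in order. The sketch below assumes the page is either UTF-8 or GBK, which is a guess about this site rather than something the post confirms; `decode_html` is a hypothetical helper name.

```python
def decode_html(raw, encodings=('utf-8', 'gbk')):
    """Decode bytes with the first encoding in the list that works."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, replacing undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors='replace')


print(decode_html('英雄故事'.encode('gbk')))
```

It would be called as `decode_html(res.content)` instead of setting `res.encoding`. UTF-8 goes first because it is the stricter of the two: bytes that are not UTF-8 usually fail to decode, whereas GBK will often turn UTF-8 bytes into the wrong characters without raising an error.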

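And there it is: the page source.
The next step is to clean up this data.

That is easy too. With BeautifulSoup (the "beautiful soup" library), add these three lines to the code above: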

    soup = bs4.BeautifulSoup(res, 'html.parser')
    story = soup.select('.pop-bd')[0].text 
    print(story)

Woohoo, that gets us the story.
The formatting of the story is a little off, but it is a minor issue and we will tidy it up shortly.
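
Two things are worth noting about those three lines: `soup.select('.pop-bd')[0]` raises an IndexError when nothing matches, and the raw `.text` keeps the page's original whitespace. Here is a hedged sketch of one way to handle both, run on hand-written sample markup rather than the real page. `extract_story` is a hypothetical helper, and its cleanup rule (one trimmed line per text fragment) is an illustrative choice, not the one used in the complete code.

```python
import bs4


def extract_story(html, selector='.pop-bd'):
    """Return the text of the first element matching selector, or None."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    nodes = soup.select(selector)
    if not nodes:
        return None  # the page layout differs from what we expected
    # Split on line breaks, trim each line, and drop the empty ones.
    lines = (line.strip() for line in nodes[0].get_text('\n').splitlines())
    return '\n'.join(line for line in lines if line)


# Hand-written sample markup, not the real page.
sample = '<div class="pop-bd"><p> First paragraph. </p><p>Second paragraph.</p></div>'
print(extract_story(sample))
```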

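Complete code

I made a few small changes to the code above: instead of scraping every hero's story in one go, the script scrapes the story of whichever hero the user asks for.

Here is the result first:

[Screenshot: output of a sample run]

The code: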

# -*- coding: UTF-8 -*-
# @Time: 2021/7/18 18:08
# @Author: 遠方的星
# @CSDN: https://blog.csdn.net/qq_44921056

import os
import re
import bs4
import requests
import chardet
import logging

# Basic configuration for log output
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s: %(message)s')

# Create the output folder if it does not exist yet
path = './王者故事'
os.makedirs(path, exist_ok=True)

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/'
                  '537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'
}


# Get the number of the requested hero
def get_hero_num(url, hero_dream):
    response = requests.get(url=url, headers=headers).text
    hero_list = re.findall(r'"ename": (.+?),', response, re.S)  # list of hero numbers
    hero_name = re.findall(r'"cname": "(.+?)"', response, re.S)  # list of hero names
    hero_num = hero_name.index(hero_dream)  # raises ValueError if the name is not in the list
    num = hero_list[hero_num]  # the hero's number
    return num


# Get a hero's background story from its number
def get_story(num):
    url = 'https://pvp.qq.com/web201605/herodetail/{}.shtml'.format(num)  # link to the hero's detail page
    res = requests.get(url=url, headers=headers)
    res.encoding = chardet.detect(res.content)['encoding']  # detect the page encoding to avoid garbled text
    res = res.text
    soup = bs4.BeautifulSoup(res, 'html.parser')
    story = soup.select('.pop-bd')[0].text  # the story text block
    # Replace spaces and closing quotation marks with line breaks
    story = story.replace(' ', '\n').replace('”', '\n').replace(' ', '')
    story = story.encode(encoding='utf-8')
    return story


# Save the story to a text file
def download(hero_dream, story):
    file_name = hero_dream + '.txt'
    file_path = path + '/' + file_name
    with open(file_path, 'wb') as f:  # the with block closes the file automatically
        f.write(story)
    logging.info('The story of {} has been downloaded. Thanks for using this script!'.format(hero_dream))


def main():
    hero_dream = input("Enter the name of the hero whose story you want to read: ")
    url = 'https://pvp.qq.com/web201605/js/herolist.json'
    num = get_hero_num(url, hero_dream)
    story = get_story(num)
    download(hero_dream, story)


if __name__ == '__main__':
    main()
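
One rough edge in the script: `hero_name.index(hero_dream)` raises a ValueError when the typed name is not in the list. Here is a possible way to turn that into a friendlier message, sketched on plain lists so it runs without network access. `find_hero_num` is a hypothetical helper, and the second sample entry is made up.

```python
def find_hero_num(hero_name, hero_list, hero_dream):
    """Return the number paired with hero_dream, or None if the name is unknown."""
    # zip pairs each name with the number at the same position.
    lookup = dict(zip(hero_name, hero_list))
    # strip() forgives a stray space typed before or after the name.
    return lookup.get(hero_dream.strip())


names = ['云纓', '示例']   # sample data; only 云纓 / 538 comes from this post
numbers = ['538', '1']

num = find_hero_num(names, numbers, '云纓 ')
print(num if num is not None else 'No hero with that name was found.')
```

In main(), a `None` result could be used to prompt the user again instead of letting the script stop with a traceback.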

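Hope this helps!

So, I have the stories. Do you have the wine?

Author: 遠方的星
CSDN: https://blog.csdn.net/qq_44921056
This article is for learning and discussion only. Do not repost it or use it for any other purpose without the author's permission; violations will be pursued.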

