I. Analyzing the site content

The site scraped here is OP.GG, at http://www.op.gg/champion/statistics

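As the page shows, detailed champion information is listed on the right-hand side. Taking Garen as an example, the win rate is 53.84%, the pick rate is 16.99%, and the usual position is top lane.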

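Next, inspect the page source (right-click the page and choose the view-source entry from the context menu). Searching for "53.84%" quickly locates Garen's entry.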

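The source shows that the champion name, win rate, and pick rate each sit in a td tag, and that each champion's data sits in one tr tag. The parent of each td is a tr, and the parent of each tr is a tbody.

Now search the source for tbody tags.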

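The source contains 5 tbody tags (each tbody has "tbody" in both its opening and closing tag, so a text search finds 10 occurrences of "tbody"). Judging by their contents, they hold the data for top, jungle, mid, ADC, and support, in that order.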
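The count above can be checked mechanically. The following is a minimal sketch on a hypothetical, heavily simplified stand-in for the page source (the real OP.GG markup is far larger and may have changed since this was written); `sample_html`, `positions`, and the cell contents are assumptions made up for illustration.

```python
import re

# Hypothetical stand-in for the page source: five tbody blocks, one per position.
positions = ["TOP", "JUNGLE", "MID", "ADC", "SUPPORT"]
sample_html = "".join(
    '<tbody class="tabItem champion-trend-tier-{}"><tr><td>...</td></tr></tbody>'.format(p)
    for p in positions
)

# A plain text search hits both the opening and the closing tag of each tbody.
occurrences = sample_html.count("tbody")

# Counting only opening tags gives the number of tbody elements.
opening_tags = len(re.findall(r"<tbody\b", sample_html))

print(occurrences, opening_tags)  # prints: 10 5
```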

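Taking the top-lane champions as an example, we first find the tbody tag, then find the tr tags inside it (each tr holds one champion's data), and finally read the champion's details from the child td tags.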
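The tbody, tr, td path can be tried offline on a small hand-written fragment. This is only a sketch, assuming the `bs4` package is installed. `sample_html` is a made-up miniature of the table with three cells per row rather than the real layout; the Garen figures are the ones quoted above, and `PlaceholderHero` with its numbers is invented for illustration.

```python
from bs4 import BeautifulSoup

# Made-up miniature of the table; only the Garen figures come from the text above.
sample_html = """
<table>
  <tbody class="tabItem champion-trend-tier-TOP">
    <tr><td>Garen</td><td>53.84%</td><td>16.99%</td></tr>
    <tr><td>PlaceholderHero</td><td>50.00%</td><td>1.00%</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Step 1: locate the tbody for one position.
tbody = soup.find("tbody")

rows = []
# Step 2: each tr inside the tbody is one champion.
for tr in tbody.find_all("tr"):
    # Step 3: the td children carry the individual fields.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(cells)

for name, win_rate, pick_rate in rows:
    print(name, win_rate, pick_rate)
```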

II. Scraping steps

Fetch the page content -> extract the required information -> print the champion data

getHTMLText(url)->fillHeroInformation(hlist,html)->printHeroInformation(hlist)

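getHTMLText(url) returns the HTML content found at url.

fillHeroInformation(hlist,html) extracts the required information from html and stores it in the list hlist.

printHeroInformation(hlist) prints the champion information held in hlist.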

III. Implementation

1. The getHTMLText(url) function

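import requests

def getHTMLText(url):  # return the HTML text of the page at url
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise an exception for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # use the encoding guessed from the body
        return r.text  # return the HTML content
    except requests.RequestException:
        return ""  # on any request failure, return an empty string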

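2. The fillHeroInformation(hlist,html) function

Take one tr tag as an example. It contains 7 td tags: inside the 4th td, the div whose class attribute value is "champion-index-table__name" holds the champion name; the 5th td holds the win rate; the 6th td holds the pick rate. These values are stored in the list hlist.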

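import bs4
from bs4 import BeautifulSoup

def fillHeroInformation(hlist, html):  # store champion information in hlist
    soup = BeautifulSoup(html, "html.parser")
    # a string passed as attrs is matched against the class attribute
    tbody = soup.find(name="tbody", attrs="tabItem champion-trend-tier-TOP")
    if tbody is None:  # empty page (failed request) or changed layout: nothing to extract
        return
    for tr in tbody.children:  # iterate over the direct children of the top-lane tbody
        if isinstance(tr, bs4.element.Tag):  # keep tags only; skips whitespace text nodes
            tds = tr("td")  # shorthand for tr.find_all("td")
            heroName = tds[3].find(attrs="champion-index-table__name").string  # champion name
            winRate = tds[4].string  # win rate
            pickRate = tds[5].string  # pick rate
            hlist.append([heroName, winRate, pickRate])  # add this champion to hlist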

3. The printHeroInformation(hlist) function

def printHeroInformation(hlist):  # print the contents of hlist
    print("{:^20}\t{:^20}\t{:^20}\t{:^20}".format("Champion", "Win rate", "Pick rate", "Position"))
    for hero in hlist:  # each entry is [name, win rate, pick rate]
        print("{:^20}\t{:^20}\t{:^20}\t{:^20}".format(hero[0], hero[1], hero[2], "Top"))
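The `{:^20}` fields in the format strings above center each value in a 20-character column. Here is a minimal sketch of that behaviour using only the standard library; the sample values are the Garen figures quoted earlier, and the variable names are illustrative.

```python
# One row formatted with the same field layout as printHeroInformation.
row_format = "{:^20}\t{:^20}\t{:^20}\t{:^20}"
line = row_format.format("Garen", "53.84%", "16.99%", "Top")

# Each tab-separated field is padded to exactly 20 characters,
# with the value centered inside it.
fields = line.split("\t")
for field in fields:
    print(repr(field), len(field))
```

Padding is counted in characters, not display columns, so rows containing full-width CJK text may not line up visually in a terminal.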

4. The main() function

Assign the site address to url, create a new list hlist, call getHTMLText(url) to get the HTML, use fillHeroInformation(hlist,html) to store the champion information in hlist, then use printHeroInformation(hlist) to print it.

def main():
    url = "http://www.op.gg/champion/statistics"
    hlist = []
    html = getHTMLText(url)  # fetch the HTML
    fillHeroInformation(hlist, html)  # extract champion information into hlist
    printHeroInformation(hlist)  # print the result

IV. Results

1. Information shown on the site

2. Scraped output

V. Complete code

import requests  # HTTP client
import bs4  # needed for the bs4.element.Tag type check
from bs4 import BeautifulSoup  # HTML parser

def getHTMLText(url):  # return the HTML text of the page at url
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise an exception for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # use the encoding guessed from the body
        return r.text  # return the HTML content
    except requests.RequestException:
        return ""  # on any request failure, return an empty string

def fillHeroInformation(hlist, html):  # store champion information in hlist
    soup = BeautifulSoup(html, "html.parser")
    # a string passed as attrs is matched against the class attribute
    tbody = soup.find(name="tbody", attrs="tabItem champion-trend-tier-TOP")
    if tbody is None:  # empty page (failed request) or changed layout: nothing to extract
        return
    for tr in tbody.children:  # iterate over the direct children of the top-lane tbody
        if isinstance(tr, bs4.element.Tag):  # keep tags only; skips whitespace text nodes
            tds = tr("td")  # shorthand for tr.find_all("td")
            heroName = tds[3].find(attrs="champion-index-table__name").string  # champion name
            winRate = tds[4].string  # win rate
            pickRate = tds[5].string  # pick rate
            hlist.append([heroName, winRate, pickRate])  # add this champion to hlist

def printHeroInformation(hlist):  # print the contents of hlist
    print("{:^20}\t{:^20}\t{:^20}\t{:^20}".format("Champion", "Win rate", "Pick rate", "Position"))
    for hero in hlist:  # each entry is [name, win rate, pick rate]
        print("{:^20}\t{:^20}\t{:^20}\t{:^20}".format(hero[0], hero[1], hero[2], "Top"))

def main():
    url = "http://www.op.gg/champion/statistics"
    hlist = []
    html = getHTMLText(url)  # fetch the HTML
    fillHeroInformation(hlist, html)  # extract champion information into hlist
    printHeroInformation(hlist)  # print the result

if __name__ == "__main__":
    main()

To scrape the jungle, mid, ADC, or support data instead, only the soup.find call for the tbody in fillHeroInformation(hlist,html) needs to change: replace the attrs value "tabItem champion-trend-tier-TOP" with "tabItem champion-trend-tier-JUNGLE", "tabItem champion-trend-tier-MID", "tabItem champion-trend-tier-ADC", or "tabItem champion-trend-tier-SUPPORT". The position label hard-coded in printHeroInformation(hlist) should be changed to match.
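Rather than editing the class string by hand for each position, the position can be passed in as a parameter. The following is a sketch under stated assumptions, not the author's code, and it assumes the `bs4` package is installed. `fill_hero_information`, `POSITIONS`, `make_row`, and `sample_html` are hypothetical names; the sample page is a hand-built stand-in shaped like the 7-cell rows described earlier; `PlaceholderHero` and its numbers are invented. It uses the `class_` keyword, which is Beautiful Soup's documented way to match an exact class string.

```python
import bs4
from bs4 import BeautifulSoup

# Class-name suffixes from the article, mapped to position labels.
POSITIONS = {
    "TOP": "Top",
    "JUNGLE": "Jungle",
    "MID": "Mid",
    "ADC": "ADC",
    "SUPPORT": "Support",
}


def fill_hero_information(hlist, html, position):
    """Append [name, win rate, pick rate, position label] rows for one position."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", class_="tabItem champion-trend-tier-" + position)
    if tbody is None:  # this position is missing from the page
        return
    for tr in tbody.children:
        if isinstance(tr, bs4.element.Tag):
            tds = tr("td")
            name = tds[3].find(class_="champion-index-table__name").string
            hlist.append([name, tds[4].string, tds[5].string, POSITIONS[position]])


def make_row(name, win, pick):
    """Build a hypothetical 7-cell row shaped like the one described earlier."""
    return (
        "<tr><td>1</td><td></td><td></td>"
        '<td><div class="champion-index-table__name">{}</div></td>'
        "<td>{}</td><td>{}</td><td></td></tr>"
    ).format(name, win, pick)


# Hypothetical two-position page; only the Garen figures come from the article.
sample_html = (
    '<table><tbody class="tabItem champion-trend-tier-TOP">'
    + make_row("Garen", "53.84%", "16.99%")
    + '</tbody><tbody class="tabItem champion-trend-tier-JUNGLE">'
    + make_row("PlaceholderHero", "50.00%", "1.00%")
    + "</tbody></table>"
)

hlist = []
for position in POSITIONS:
    fill_hero_information(hlist, sample_html, position)

for row in hlist:
    print(row)
```

Carrying the position label in each row means the print step no longer needs a hard-coded position.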


Scraping League of Legends champion win rates and pick rates from OP.GG with Python