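I'm new to Python and I'm trying to build a web scraper that collects the names and IPs of Minecraft servers.
The problem is that I can get some of the values, but the server's IP, for example, sits in a div inside the td. I'm using pandas and lxml.html.
Example: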
<tr>
<td class="server-rank visible-sm visible-md visible-lg">
<p><a href="#1.akumamc.net"><span class="badge">#1</span></a></p>
</td>
<td class="server-name" align="center">
<div class="server-ip input-group">
<p>this is the IP of the server</p> -I WANT TO GET THIS-
</div>
</td>
</tr>
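I don't know how to reach the div inside the td. I have this script, taken from another page, which works very well for other content but does not dig into nested elements.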
from numpy import tile
import requests
import lxml.html as lh
import pandas as pd
import re
#https://www.servidoresminecraft.info/1.8/
url='https://topminecraftservers.org/version/1.8.8'
#Create a handle, page, to handle the contents of the website
page = requests.get(url)
#Store the contents of the website under doc
doc = lh.fromstring(page.content)
#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')
#Check the length of the first 5 rows
[len(T) for T in tr_elements[:5]]
#Create empty list
col=[]
i=0
#For each cell of the first row, store its text (header) and an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print('%d:"%s"'%(i,name))
    col.append((name,[]))
#Since our first row is the header, data is stored on the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    #If row is not of size 3, the //tr data is not from our table
    if len(T)!=3:
        break
    #i is the index of our column
    i=0
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content()
        #Skip the first column when converting
        if i>0:
            #Convert any numerical value to integers
            try:
                if i==2 and j == 1:
                    print(2)
                data=int(data)
            except ValueError:
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1
[len(C) for (title,C) in col]
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)
print(df.head())
I just want to get and output a table with the server name and IP:
Name     ip
server1  xxx.xxx.x.x
server2  xxx.xxx.x.x
Any help?
A uj5u.com user replied:
If I understand you correctly, this should give you what you need:
# doc is the parsed page from the script in the question
servers = []
cols = ["Name", "ip"]
for s in doc.xpath("//td[@class='server-name']"):
    # The leading "." makes each XPath relative to the current td
    s_ip = s.xpath(".//div[@class='server-ip input-group']//span[@class='form-control text-justify']/text()")[0]
    s_name = s.xpath('.//h4/a/span/text()')[0]
    servers.append([s_name,s_ip])
pd.DataFrame(servers, columns = cols)
Output:
   Name                         ip
0  AkumaMC                      akumamc.net
1  BattleAsya 1.8-1.16          play.battleasya.com
2  Caraotacraft network PRISON  caraotacraft.top
3  FlameSquad                   87.121.54.214:25568
4  LunixCraft                   lunixcraft.dk
and so on.
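The `[0]` lookups in the answer raise an IndexError if a row has no matching span. Here is a rough offline sketch of the same relative-XPath idea that returns None in that case. The `SAMPLE_HTML` markup and the helper names `first_text` and `extract_servers` are made up for illustration; the markup is only modeled on the XPath expressions in the answer, so the real page may be structured differently.

```python
import lxml.html as lh

# Hypothetical markup, modeled on the XPath expressions used in the answer;
# this is an assumption, not a copy of the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="server-rank"><p><a href="#1"><span class="badge">#1</span></a></p></td>
    <td class="server-name" align="center">
      <h4><a href="/server/1"><span>AkumaMC</span></a></h4>
      <div class="server-ip input-group">
        <span class="form-control text-justify">akumamc.net</span>
      </div>
    </td>
  </tr>
  <tr>
    <td class="server-rank"><p><a href="#2"><span class="badge">#2</span></a></p></td>
    <td class="server-name" align="center">
      <h4><a href="/server/2"><span>NoIpServer</span></a></h4>
    </td>
  </tr>
</table>
"""


def first_text(node, xpath_expr):
    """Return the first stripped text match of a relative XPath, or None."""
    matches = node.xpath(xpath_expr)
    return matches[0].strip() if matches else None


def extract_servers(html):
    """Collect (name, ip) pairs, one per td with class 'server-name'."""
    doc = lh.fromstring(html)
    rows = []
    for cell in doc.xpath("//td[@class='server-name']"):
        # "./" keeps the search inside the current cell
        name = first_text(cell, ".//h4/a/span/text()")
        ip = first_text(cell, ".//div[contains(@class, 'server-ip')]//span/text()")
        rows.append((name, ip))
    return rows


servers = extract_servers(SAMPLE_HTML)
for name, ip in servers:
    print(name, ip)
```

The resulting list can be passed to `pd.DataFrame(servers, columns=["Name", "ip"])` just as in the answer; rows without an IP will then show a missing value instead of stopping the script.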
