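I am trying to scrape the "Biggest Gainers" list of coins from https://coinmarketcap.com/.
How can I access the nth child (Biggest Gainers) inside the div with class_ = 'sc-1rmt1nr-0 sc-1rmt1nr-2 iMyvIy'?
I managed to get the data from the "Trending" section, but I am having trouble targeting the first 3 text items under "Biggest Gainers".
I get an AttributeError: 'NoneType' object has no attribute 'p'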
from bs4 import BeautifulSoup
import requests
source = requests.get('https://coinmarketcap.com/').text
soup = BeautifulSoup(source, 'lxml')
section = soup.find(class_='sc-1rmt1nr-0 sc-1rmt1nr-2 iMyvIy')
# List the top 3 Gainers
for top_gainers in section.find_all(class_='sc-16r8icm-0 sc-1uagfi2-0 bdEGog sc-1rmt1nr-1 eCWTbV')[1]:
    top_gainers = top_gainers.find(class_='sc-1eb5slv-0 iworPT')
    top_coins = top_gainers.p.text
    print(top_coins)
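A uj5u.com user replied:
I would avoid those dynamic classes. Instead, use :-soup-contains together with combinators: first locate the desired block by its text, then use combinators to describe how the elements you want to extract relate to that block.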
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
soup = bs(requests.get("https://coinmarketcap.com/").text, "lxml")
biggest_gainers = []
for i in soup.select(
    'div[color=text]:has(span:-soup-contains("Biggest Gainers")) > div ~ div'
):
    biggest_gainers.append(
        {
            "rank": int(i.select_one(".rank").text),
            "currency": i.select_one(".alias").text,
            "% change": f"{i.select_one('.icon-Caret-up').next_sibling}",
        }
    )
gainers = pd.DataFrame(biggest_gainers)
gainers
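To see how the selector in this answer fits together without hitting the live site, here is a minimal offline sketch. The SAMPLE_HTML markup and the select_rows helper are hypothetical and only imitate the layout the answer assumes (a card div with color="text", a heading div, then one div per row); the real page's structure may differ and can change at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only imitates the assumed card layout.
SAMPLE_HTML = """
<div class="cards">
  <div color="text">
    <div><span>Trending</span></div>
    <div><span class="rank">1</span><span class="alias">AAA</span></div>
  </div>
  <div color="text">
    <div><span>Biggest Gainers</span></div>
    <div><span class="rank">1</span><span class="alias">BBB</span></div>
    <div><span class="rank">2</span><span class="alias">CCC</span></div>
  </div>
</div>
"""


def select_rows(html, heading):
    """Return (rank, alias) pairs for the card whose heading text matches."""
    soup = BeautifulSoup(html, "html.parser")
    # :has(...) keeps only the card containing a span with the heading text.
    # "> div ~ div" then picks every child div that follows another child div,
    # which skips the heading div and keeps the row divs.
    selector = (
        f'div[color=text]:has(span:-soup-contains("{heading}")) > div ~ div'
    )
    return [
        (int(row.select_one(".rank").text), row.select_one(".alias").text)
        for row in soup.select(selector)
    ]


if __name__ == "__main__":
    print(select_rows(SAMPLE_HTML, "Biggest Gainers"))
```

Because the card is found by its visible heading rather than by generated class names, the same helper works for the "Trending" card by changing only the heading argument.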
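A uj5u.com user replied:
As @QHarr mentioned, you should avoid the dynamic identifiers and instead do something similar to his selection, using :-soup-contains() with the known text of the element: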
soup.select('div:has(>div>span:-soup-contains("Biggest Gainers")) ~ div')
To extract the text I used stripped_strings and zipped it with the keys for a dict:
dict(zip(['rank','name','alias','change'],e.stripped_strings))
Example
from bs4 import BeautifulSoup
import requests
url = 'https://coinmarketcap.com/'
soup = BeautifulSoup(requests.get(url).content)
data = []
for e in soup.select('div:has(>div>span:-soup-contains("Biggest Gainers")) ~ div'):
    data.append(dict(zip(['rank','name','alias','change'],e.stripped_strings)))
Output
[{'rank': '1', 'name': 'Tenset', 'alias': '10SET', 'change': '1406.99'},
{'rank': '2', 'name': 'Burn To Earn', 'alias': 'BTE', 'change': '348.89'},
{'rank': '3', 'name': 'MetaCars', 'alias': 'MTC', 'change': '332.05'}]
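The stripped_strings plus zip step can be tried on a single row in isolation. The ROW_HTML markup and the row_to_dict helper below are hypothetical, not taken from the site. The sketch also shows one side effect of zip: it stops at the shorter input, so any text node beyond the four keys is silently dropped. That may be why the change values in the output above carry no percent sign, though this is an assumption about the page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical row markup; the percent sign sits in its own element,
# so it becomes a fifth, separate string.
ROW_HTML = (
    "<div>"
    '<span class="rank">1</span>'
    '<a href="#"><p>Example Coin</p><span>EXC</span></a>'
    "<span>12.34<b>%</b></span>"
    "</div>"
)

KEYS = ["rank", "name", "alias", "change"]


def row_to_dict(html):
    """Pair the row's non-empty text nodes, in document order, with KEYS."""
    row = BeautifulSoup(html, "html.parser").div
    # stripped_strings yields: "1", "Example Coin", "EXC", "12.34", "%".
    # zip stops after the fourth key, so the trailing "%" is discarded.
    return dict(zip(KEYS, row.stripped_strings))


if __name__ == "__main__":
    print(row_to_dict(ROW_HTML))
```

This approach is compact but positional: if a row gains or loses a text node, values shift to the wrong keys, so it is worth checking the number of strings when the layout is not under your control.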
A uj5u.com user replied:
You can use :nth-of-type to target the parent div of "Biggest Gainers":
import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://coinmarketcap.com/').text, 'html.parser')
bg = d.select_one('div:nth-of-type(2).sc-16r8icm-0.sc-1uagfi2-0.bdEGog.sc-1rmt1nr-1.eCWTbV')
data = [{'rank':i.select_one('span.rank').text,
         'name':i.select_one('p.sc-1eb5slv-0.iworPT').text,
         'change':i.select_one('span.sc-27sy12-0.gLZJFn').text}
        for i in bg.select('div.sc-1rmt1nr-0.sc-1rmt1nr-4.eQRTPY')]
Output:
[{'rank': '1', 'name': 'Tenset', 'change': '1308.72%'}, {'rank': '2', 'name': 'Burn To Earn', 'change': '421.82%'}, {'rank': '3', 'name': 'Aigang', 'change': '329.63%'}]
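As a small offline illustration of what :nth-of-type does in this answer, the sketch below uses hypothetical markup (CARDS_HTML, the card class, and the second_card_title helper are made up for the example). The pseudo-class counts siblings of the same tag name, not siblings sharing a class, so the selector depends on the card being the second div under its parent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling cards under one parent.
CARDS_HTML = """
<section>
  <div class="card"><p>Trending</p></div>
  <div class="card"><p>Biggest Gainers</p></div>
  <div class="card"><p>Recently Added</p></div>
</section>
"""


def second_card_title(html):
    """Return the heading text of the second div.card among its siblings."""
    soup = BeautifulSoup(html, "html.parser")
    # Matches a div that is the 2nd div child of its parent AND has class "card".
    card = soup.select_one("div:nth-of-type(2).card")
    # select_one returns None when nothing matches, so guard before using it.
    return card.p.text if card is not None else None


if __name__ == "__main__":
    print(second_card_title(CARDS_HTML))
```

Checking for None before touching card.p avoids the "'NoneType' object has no attribute 'p'" error from the question when the page layout or class names change.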
