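I'm new here and want to scrape the "Historical Data" table from this URL: https://coinmarketcap.com/currencies/bitcoin/historical-data/
I tried using bs4, but it doesn't seem to work for me: it just returns an empty list. As far as I can tell, what I need to do is find all the "tr" elements in the container, or something like that? I don't have much code, but I think it makes sense to show it so there's something to work with:
My code: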
import requests
from bs4 import BeautifulSoup

page = requests.get("https://coinmarketcap.com/currencies/bitcoin/historical-data/")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.find_all('tr'))  # returns an empty list for me
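Answer:
The data you're looking for is added to the page by an XHR/Fetch call after the initial HTML loads, so it isn't in the HTML that requests downloads and bs4 has no rows to find. You can fetch the data directly like this: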
import requests

r = requests.get('https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1633910400&timeEnd=1639180800')
if r.status_code == 200:
    print(r.json())
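The query string above hard-codes timeStart=1633910400 and timeEnd=1639180800, which are Unix timestamps for 2021-10-11 and 2021-12-11 at 00:00 UTC (matching the date range in the output further down). To request a different range, a minimal sketch like the following can build the URL from plain dates. `build_historical_url` and `to_unix` are hypothetical helper names, the endpoint is the undocumented one from this answer and may change, and reading id=1 / convertId=2781 as "Bitcoin quoted in USD" is an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint taken from the answer above; it is undocumented and may change.
BASE_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"


def to_unix(day):
    """Convert a 'YYYY-MM-DD' string to a Unix timestamp at 00:00 UTC."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_historical_url(start_day, end_day, coin_id=1, convert_id=2781):
    """Hypothetical helper: build the historical-data URL for a date range.

    coin_id=1 and convert_id=2781 are the values used in the answer above
    (assumed to mean Bitcoin quoted in USD).
    """
    params = {
        "id": coin_id,
        "convertId": convert_id,
        "timeStart": to_unix(start_day),
        "timeEnd": to_unix(end_day),
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_historical_url("2021-10-11", "2021-12-11")
print(url)  # same URL as the one hard-coded in the answer
```

The resulting string can be passed straight to requests.get() in place of the hard-coded URL.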
Answer:
Expanding on @balderman's answer, you can try this approach to convert the result properly into a pandas DataFrame:
import pandas as pd
import requests

output = pd.DataFrame(requests.get('https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1633910400&timeEnd=1639180800').json()['data']['quotes'])
which returns:
timeOpen ... quote
0 2021-10-11T00:00:00.000Z ... {'open': 54734.124840616, 'high': 57793.039249...
1 2021-10-12T00:00:00.000Z ... {'open': 57526.8320114193, 'high': 57627.87860...
2 2021-10-13T00:00:00.000Z ... {'open': 56038.2567881108, 'high': 57688.66010...
3 2021-10-14T00:00:00.000Z ... {'open': 57372.8320788954, 'high': 58478.73549...
4 2021-10-15T00:00:00.000Z ... {'open': 57345.9019791856, 'high': 62757.12970...
.. ... ... ...
56 2021-12-06T00:00:00.000Z ... {'open': 49413.4790992129, 'high': 50929.51909...
57 2021-12-07T00:00:00.000Z ... {'open': 50581.8300495181, 'high': 51934.78189...
58 2021-12-08T00:00:00.000Z ... {'open': 50667.6476830609, 'high': 51171.37531...
59 2021-12-09T00:00:00.000Z ... {'open': 50450.0820524109, 'high': 50797.16544...
60 2021-12-10T00:00:00.000Z ... {'open': 47642.1435531841, 'high': 50015.25298...
Finally, with a join() operation we can unpack the quote column, which holds a dictionary of values for each row:
output = output.join(pd.concat([pd.DataFrame([x]) for x in output['quote']]).reset_index(drop=True)).drop(columns='quote')
This gives a clean, flat table:
timeOpen timeClose timeHigh timeLow open high low close volume marketCap timestamp
0 2021-10-11T00:00:00.000Z 2021-10-11T23:59:59.999Z 2021-10-11T19:47:02.000Z 2021-10-11T00:04:02.000Z 54734.124841 57793.039249 54519.765520 57484.789465 4.263733e+10 1.083079e+12 2021-10-11T23:59:59.999Z
1 2021-10-12T00:00:00.000Z 2021-10-12T23:59:59.999Z 2021-10-12T06:14:02.000Z 2021-10-12T20:09:02.000Z 57526.832011 57627.878602 54477.974468 56041.056838 4.108376e+10 1.055926e+12 2021-10-12T23:59:59.999Z
2 2021-10-13T00:00:00.000Z 2021-10-13T23:59:59.999Z 2021-10-13T21:43:02.000Z 2021-10-13T09:10:02.000Z 56038.256788 57688.660104 54370.973228 57401.097527 4.168425e+10 1.081612e+12 2021-10-13T23:59:59.999Z
3 2021-10-14T00:00:00.000Z 2021-10-14T23:59:59.999Z 2021-10-14T02:27:02.000Z 2021-10-14T18:30:02.000Z 57372.832079 58478.735499 56957.076136 57321.525280 3.661579e+10 1.080160e+12 2021-10-14T23:59:59.999Z
4 2021-10-15T00:00:00.000Z 2021-10-15T23:59:59.999Z 2021-10-15T20:28:02.000Z 2021-10-15T01:20:02.000Z 57345.901979 62757.129703 56868.142693 61593.950061 5.178008e+10 1.160726e+12 2021-10-15T23:59:59.999Z
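Building one single-row DataFrame per quote and concatenating them works, but it gets slow on long date ranges. A minimal alternative sketch expands the quote column in one step via .tolist() and also parses the ISO-8601 time strings into UTC datetimes. `quotes_to_frame` is a hypothetical helper name, and it assumes each record has the shape shown above (top-level time* fields plus a quote dict).

```python
import pandas as pd


def quotes_to_frame(quotes):
    """Hypothetical helper: flatten quote records into one row per day.

    Assumes each record looks like
    {"timeOpen": ..., "timeClose": ..., "quote": {"open": ..., ...}},
    i.e. the shape of r.json()['data']['quotes'] from the answers above.
    """
    df = pd.DataFrame(quotes)
    # Expand the dict column in one step instead of one DataFrame per row.
    expanded = pd.DataFrame(df["quote"].tolist(), index=df.index)
    out = df.drop(columns="quote").join(expanded)
    # Parse strings like "2021-10-11T00:00:00.000Z" (timeOpen, timeClose,
    # timeHigh, timeLow, timestamp) into timezone-aware UTC datetimes.
    for col in list(out.columns):
        if col.startswith("time"):
            out[col] = pd.to_datetime(out[col], utc=True)
    return out
```

Usage would be quotes_to_frame(r.json()['data']['quotes']) on the response from the endpoint above; numeric columns such as open and close keep their float values unchanged.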