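Hi, I'm trying to scrape a table with 100 rows, but rvest only seems to get the first 20 rows and then stops. Interestingly, it does pick up the coin names for every row, but after row 20 the other columns are NA or empty.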
library(rvest)
library(xml2)

html <- rvest::read_html("https://coinmarketcap.com/historical/20150621/")
tables <- html_nodes(html, "table")
# The coin listing is the third <table> on the page
df <- as.data.frame(rvest::html_table(tables[[3]], fill = TRUE))
df <- df[, 1:10]   # keep the first 10 columns
df[1:25, ]
This is what the table looks like:
> df[1:25, ]
Rank Name Symbol Market Cap Price Circulating Supply Volume (24h) % 1h % 24h % 7d
1 1 BTCBitcoin BTC $3,488,111,052.52 $243.94 14,298,800 BTC $10,600,886.00 -0.09% -0.39% 4.33%
2 2 XRPXRP XRP $329,106,281.79 $0.01031 31,908,551,587 XRP * $564,946.56 0.68% -6.49% 26.52%
3 3 LTCLitecoin LTC $121,255,276.52 $3.02 40,119,404 LTC $3,196,087.25 0.66% -0.02% 50.72%
4 4 DOGEDogecoin DOGE $20,882,626.13 $0.0002091 99,890,370,337 DOGE $345,750.50 0.33% -0.46% 25.29%
5 5 BTSBitShares BTS $19,410,447.59 $0.007727 2,511,953,117 BTS * $66,206.36 -1.53% -3.65% 12.20%
6 6 XLMStellar XLM $17,058,468.94 $0.003526 4,837,354,256 XLM * $25,278.98 -2.85% -4.09% 8.34%
7 7 DASHDash DASH $15,581,959.93 $2.84 5,482,231 DASH $42,407.43 -0.17% -1.17% 1.37%
8 8 NXTNxt NXT $13,625,080.25 $0.01363 999,997,096 NXT * $32,074.26 0.99% -3.74% 15.89%
9 9 BANXBanx BANX $9,648,845.01 $1.64 5,894,665 BANX * $15,804.05 -0.11% -0.41% 4.33%
10 10 PPCPeercoin PPC $8,857,457.26 $0.3949 22,428,765 PPC $63,627.21 -0.46% -5.40% 21.14%
11 11 MAIDMaidSafeCoin MAID $8,112,629.90 $0.01793 452,552,412 MAID * $11,125.53 -0.65% -0.56% 7.06%
12 12 NMCNamecoin NMC $5,681,492.39 $0.4815 11,800,400 NMC $16,962.83 -0.99% -4.69% 43.39%
13 13 BCNBytecoin BCN $5,086,827.18 $0.00002924 173,955,598,772 BCN $5,500.92 0.93% 2.81% 2.53%
14 14 XMRMonero XMR $4,286,720.12 $0.5233 8,192,114 XMR $20,025.62 -1.03% -2.23% 5.73%
15 15 BLKBlackCoin BLK $3,932,944.75 $0.05248 74,938,648 BLK * $212,834.00 1.26% -3.55% 42.16%
16 16 XCPCounterparty XCP $3,358,114.93 $1.27 2,640,365 XCP * $2,235.02 -0.09% 3.81% -6.94%
17 17 VTCVertcoin VTC $3,264,822.95 $0.2048 15,941,100 VTC $35,518.47 -2.41% 2.72% 32.79%
18 18 YBCYbCoin YBC $3,161,465.76 $1.05 3,000,000 YBC * $54,359.75 0.11% 2.52% 15.12%
19 19 MONAMonaCoin MONA $2,993,610.25 $0.1452 20,619,400 MONA $8,199.22 -0.88% 3.92% -8.74%
20 20 UNITYSuperNET UNITY $2,675,341.46 $3.28 816,061 UNITY * $644.62 2.47% -3.88% 16.08%
21 NA BitcoinDark
22 NA NuShares
23 NA Primecoin
24 NA Infinitecoin
25 NA Startcoin
Does anyone know what is going on?
Answer:
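The problem is that the page uses JavaScript to add rows to the table as you scroll down. read_html() only downloads the static HTML the server sends and does not run any JavaScript, so those rows are not there when you use read_html().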
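To see this in isolation, here is a small offline sketch. The HTML string, the `coins` id and the script are made up for illustration: the markup contains two rows, and a script would append a third one in a browser. Because rvest never executes scripts, `html_table()` only sees the two rows that are physically in the HTML.

```r
library(rvest)

# Hypothetical page: two rows are in the served HTML, and the script would
# add a third row only when run by a browser. read_html() never runs it.
page <- read_html('
<html><body>
  <table id="coins">
    <tr><th>Rank</th><th>Name</th></tr>
    <tr><td>1</td><td>Bitcoin</td></tr>
    <tr><td>2</td><td>XRP</td></tr>
  </table>
  <script>
    var row = document.getElementById("coins").insertRow();
    row.insertCell().textContent = "3";
    row.insertCell().textContent = "Litecoin";
  </script>
</body></html>')

coins <- html_table(html_element(page, "#coins"))
nrow(coins)   # only the rows present in the static HTML
coins$Name
```

The real CoinMarketCap page behaves the same way on a larger scale: whatever the browser fills in after loading is simply absent from what rvest parses.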
The first 200 rows of data are included in the page source, in JSON format, inside this tag:
<script id="__NEXT_DATA__" type="application/json">
...json here...
</script>
You can retrieve a data frame from there like this:
library(rvest)
library(jsonlite)

# Read the JSON embedded in <script id="__NEXT_DATA__"> and parse it
json_data <- read_html("https://coinmarketcap.com/historical/20150621/") %>%
  html_node("#__NEXT_DATA__") %>%
  html_text() %>%
  fromJSON()

# The listing rows live at this path in the parsed list
df_data <- json_data$props$initialState$cryptocurrency$listingHistorical$data
dim(df_data)
#> [1] 200 16
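If you want to try the extraction without hitting the site, the sketch below runs the same pipeline on a tiny stand-in page. Only the `__NEXT_DATA__` id and the `props$initialState$cryptocurrency$listingHistorical$data` path come from the code above; the record fields (`name`, `symbol`, `rank`) are invented, and the real records have many more fields.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the page source; the record fields are made up.
page <- read_html('
<html><body>
  <div id="__next"></div>
  <script id="__NEXT_DATA__" type="application/json">
  {"props": {"initialState": {"cryptocurrency": {"listingHistorical": {"data": [
    {"name": "Bitcoin", "symbol": "BTC", "rank": 1},
    {"name": "XRP",     "symbol": "XRP", "rank": 2}
  ]}}}}}
  </script>
</body></html>')

json_data <- page %>%
  html_element("#__NEXT_DATA__") %>%  # the <script> tag holding the JSON
  html_text() %>%                     # its raw text content
  fromJSON()                          # arrays of objects become data frames

records <- json_data$props$initialState$cryptocurrency$listingHistorical$data
str(records)
```

Because the data never goes through an HTML table, there is no row limit coming from the page layout; you get whatever the server embedded in the JSON.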
However, `df_data` has nested columns that you will still have to unpack.
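One way to deal with nested columns is `jsonlite::flatten()`, which turns data-frame columns nested inside a data frame into ordinary columns joined with dots. The sketch below assumes a nested `quote` object shaped like the one shown; those field names are hypothetical, so check `str(df_data)` for the real structure first.

```r
library(jsonlite)

# Hypothetical records with a nested "quote" object (field names invented).
records <- fromJSON('[
  {"name": "Bitcoin", "quote": {"USD": {"price": 243.94,  "market_cap": 3488111052.52}}},
  {"name": "XRP",     "quote": {"USD": {"price": 0.01031, "market_cap": 329106281.79}}}
]')

# The "quote" column is itself a data frame (containing a "USD" data frame).
class(records$quote)

# flatten() recursively replaces nested data-frame columns with plain columns
# whose names join parent and child with a dot, e.g. "quote.USD.price".
flat <- jsonlite::flatten(records)
names(flat)
```

You can also pass `flatten = TRUE` to `fromJSON()` directly. Note that `flatten()` only expands nested data frames; JSON arrays stored as list columns (for example, a list of tags per coin) stay as lists and need separate handling.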
Otherwise, you will need to look at something like RSelenium to scrape the dynamic content.
