I am trying to get the complete historical Bitcoin data set from Yahoo Finance by web scraping. This is my first attempt:
library(rvest)
library(tidyverse)
# Download and parse the history page
crypto_url <- read_html("https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
# Select every <table> node on the page
cryp_table <- html_nodes(crypto_url, css = "table")
# Convert the parsed table to a data frame
cryp_table <- html_table(cryp_table, fill = TRUE) %>%
  as.data.frame()
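The link I pass to read_html() already has a long date range selected, but it only returns the first 101 rows, and the last row is the loading message you see when you keep scrolling. This is my second attempt, but I get the same result: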
col_page <- read_html("https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true")
cryp_table <-
  col_page %>%
  html_nodes(xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table') %>%
  html_table(fill = TRUE)
cryp_final <- cryp_table[[1]]
How can I get the whole data set?
Answer from a uj5u.com user:
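I think you can use the download link instead. If you inspect the network requests in your browser while clicking the download button, you will see the download link, which in this case is:
"https://query1.finance.yahoo.com/v7/finance/download/BTC-USD?period1=1480464000&period2=1638230400&interval=1d&events=history&includeAdjustedClose=true"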
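As a side note, the period1 and period2 values in that link look like Unix timestamps (seconds since 1970-01-01 UTC). That is an assumption based on their values, not something the page states. A minimal sketch to decode them in base R:

```r
# Assumption: period1 and period2 are Unix timestamps in seconds (UTC).
period1 <- 1480464000
period2 <- 1638230400

# Convert seconds to whole days since the epoch, then to Date
start_date <- as.Date(period1 / 86400, origin = "1970-01-01")
end_date   <- as.Date(period2 / 86400, origin = "1970-01-01")

print(c(start_date, end_date))
# "2016-11-30" "2021-11-30"
```

If that reading is right, the link covers daily data from 2016-11-30 to 2021-11-30.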
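The download link looks very similar to the page URL, which means we can modify the page URL to build the download link and then read the CSV. Here is the code: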
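library(stringr)
library(magrittr)
site <- "https://finance.yahoo.com/quote/BTC-USD/history?period1=1480464000&period2=1638230400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true"
base_download <- "https://query1.finance.yahoo.com/v7/finance/download/"
download_link <- site %>%
  # Drop everything up to and including "quote/", then "/history" and the
  # frequency parameter. The trailing "?" in "/history?" is a regex quantifier,
  # so the literal "?" that starts the query string is kept.
  stringr::str_remove_all(".+(?<=quote/)|/history?|&frequency=1d") %>%
  # The download endpoint uses "events" where the page URL uses "filter"
  stringr::str_replace("filter", "events") %>%
  # Prepend the download base URL
  stringr::str_c(base_download, .)
# Read the CSV straight from the download link
readr::read_csv(download_link)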
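Instead of rewriting the page URL with a regex, the same link could also be assembled from its parts. The sketch below is only an illustration: `build_download_url()` is a hypothetical helper, not part of the original answer, and it assumes that period1 and period2 are Unix timestamps in seconds and that the download endpoint still accepts the same query parameters as the link above.

```r
# Hypothetical helper (not from the original answer): builds the CSV download
# link from a ticker symbol and a date range.
build_download_url <- function(symbol, from, to, interval = "1d") {
  # Assumption: period1/period2 are Unix timestamps (seconds since 1970-01-01 UTC).
  # sprintf("%.0f") avoids scientific notation for large numbers.
  to_epoch <- function(d) sprintf("%.0f", as.numeric(as.Date(d)) * 86400)
  paste0(
    "https://query1.finance.yahoo.com/v7/finance/download/", symbol,
    "?period1=", to_epoch(from),
    "&period2=", to_epoch(to),
    "&interval=", interval,
    "&events=history&includeAdjustedClose=true"
  )
}

url <- build_download_url("BTC-USD", "2016-11-30", "2021-11-30")
print(url)

# Reading the data needs network access and assumes the endpoint still responds:
# btc <- readr::read_csv(url)
```

For the dates 2016-11-30 and 2021-11-30 this produces the same link as the one found in the browser's network requests, so other symbols or date ranges only require changing the arguments.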
