I am trying to scrape "1,335,000" from the screenshot below (the number is at the bottom of the screenshot). I wrote the following code in R.
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%
t2 <- read_html("https://fortune.com/company/amazon-com/fortune500/")
employee_number <- t2 %>%
  rvest::html_nodes('body') %>%
  xml2::xml_find_all("//*[contains(@class, 'info__value--2AHH7')]") %>%
  rvest::html_text()
However, when I call "employee_number", it gives me "character(0)". Can anyone help me figure out why?

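Answer from a uj5u.com user:
As Dave2e pointed out, the page builds its content with JavaScript, so rvest on its own cannot be used: it only sees the HTML returned by the server, before any script has run.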
library(RSelenium)
library(rvest)  # needed for read_html(), html_nodes(), html_text() and %>%

url <- "https://fortune.com/company/amazon-com/fortune500/"

# Launch a browser
driver <- rsDriver(browser = "firefox")
remDr <- driver[["client"]]
remDr$navigate(url)

# Parse the rendered page source and pull out the employee count
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="content"]/div[5]/div[1]/div[1]/div[12]/div[2]') %>%
  html_text()
# [1] "1,335,000"
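To illustrate why the original query returns character(0), here is a minimal offline sketch. The HTML string is a hypothetical stand-in (not the real Fortune markup) for a page whose value is only written into the document by a script at runtime. rvest parses the markup but never executes the script, so no element with the target class exists.

```r
library(rvest)

# Hypothetical static HTML, as a non-browser client would receive it.
# The script would fill in the value in a real browser, but rvest never runs it.
static_html <- '<html><body>
<div id="content"></div>
<script>document.getElementById("content").textContent = "1,335,000";</script>
</body></html>'

doc <- read_html(static_html)

# Same style of class-based XPath query as in the question
employee_number <- doc %>%
  html_elements(xpath = "//*[contains(@class, 'info__value--2AHH7')]") %>%
  html_text()

# An empty node set yields a zero-length character vector
print(length(employee_number))
```

An empty result like this usually means the selector matched nothing in the static HTML, which is a hint to check whether the value is rendered client-side.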
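Answer from a uj5u.com user:
The data is loaded dynamically from a script tag, so there is no need for the overhead of a browser. You can either extract the entire JavaScript object from the script, pass it to jsonlite to be handled as JSON, and then pull out what you want, or, if you are only after the employee count, extract it from the response text with a regular expression.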
library(rvest)
library(stringr)
library(magrittr)
library(jsonlite)

page <- read_html('https://fortune.com/company/amazon-com/fortune500/')

# Extract the preloaded state object from the script tag and parse it as JSON
data <- page %>%
  html_element('#preload') %>%
  html_text() %>%
  stringr::str_match(., "PRELOADED_STATE__ = (.*);") %>%
  .[, 2] %>%
  jsonlite::parse_json()

print(data$components$page$`/company/amazon-com/fortune500/`[[6]]$children[[4]]$children[[3]]$config$employees)

# Shorter version: regex the employee count straight out of the response text
print(page %>% html_text() %>% stringr::str_match('"employees":"(\\d+)?"') %>% .[, 2] %>% as.integer() %>% format(big.mark = ","))
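Both approaches can be tried offline with a small sketch. The `sample_html` string below is a hypothetical, heavily simplified stand-in for the page: the real preloaded state object is far larger and nested differently, so the `state$config$employees` path is an assumption made only for this example.

```r
library(rvest)
library(stringr)
library(jsonlite)

# Hypothetical sample: a script tag assigning a small JSON object
sample_html <- '<html><body>
<script id="preload">window.__PRELOADED_STATE__ = {"config":{"employees":"1335000"}};</script>
</body></html>'

page <- read_html(sample_html)
script_text <- page %>% html_element("#preload") %>% html_text()

# Option 1: capture the object literal and parse it as JSON
state <- str_match(script_text, "PRELOADED_STATE__ = (.*);")[, 2] %>% parse_json()
from_json <- state$config$employees

# Option 2: regex the count directly out of the script text
from_regex <- str_match(script_text, '"employees":"(\\d+)"')[, 2]

# Add thousands separators for display
formatted <- format(as.integer(from_regex), big.mark = ",")
print(formatted)
```

On this sample both options give the same digits. The JSON route is more work but keeps the rest of the object available, while the regex route is shorter and depends only on the `"employees":"..."` key staying in that form.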
