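I want to scrape the titles and authors of journal articles from the official web pages of all staff members. For example:
https://eps.leeds.ac.uk/civil-engineering/staff/581/samuel-adu-amankwah
The specific part I am trying to access is the list of journal articles on that page.

I am following this guide: https://www.datacamp.com/community/tutorials/r-web-scraping-rvest, but it refers to HTML tags that this site does not have. Can anyone point me in the right direction?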
A uj5u.com user replied:
Here is an RSelenium approach:
library(RSelenium)
library(rvest)
library(xml2)
# Set up the Selenium driver, server and client
driver <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
server <- driver$server
browser <- driver$client
# Open the staff page in the browser
browser$navigate("https://eps.leeds.ac.uk/civil-engineering/staff/581/samuel-adu-amankwah")
# Parse the page source as rendered by the browser
doc <- xml2::read_html(browser$getPageSource()[[1]])
# Extract the text of the matching <span> nodes into a data frame.
# The attribute predicate of this XPath is incomplete as posted; it has to
# name the attribute that identifies the title spans before this will run.
df <- data.frame(
  title = xml2::xml_find_all(doc, '//span[@]') %>%
    xml2::xml_text()
)
# Close everything down properly
browser$close()
server$stop()
# Needed on Windows, else port 4545 stays occupied by the Java process
system("taskkill /im java.exe /f", intern = FALSE, ignore.stdout = FALSE)

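A uj5u.com user replied:
The page loads these references dynamically through an XHR call that returns a JSON object. In this case, we can replicate the query and parse the JSON ourselves to get the publication list: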
library(httr)
library(rvest)
library(jsonlite)
url <- paste0("https://eps.leeds.ac.uk/site/custom_scripts/symplectic_ajax.php?",
"uniqueid=00970757",
"&tries=0",
"&hash=f6a214dc99686895d6bf3de25507356f",
"&citationStyle=1")
GET(url) %>%
content("text") %>%
fromJSON() %>%
`[[`("publications") %>%
`[[`("journal_article") %>%
lapply(function(x) paste(x$authors, x$title, x$journal, sep = " ; ")) %>%
unlist() %>%
as.character()
#> [1] "Adu-Amankwah S, Zajac M, Skocek J, Nemecek J, Haha MB, Black L ; Combined influence of carbonation and leaching on freeze-thaw resistance of limestone ternary cement concrete ; Construction and Building Materials"
#> [2] "Wang H, Hou P, Li Q, Adu-Amankwah S, Chen H, Xie N, Zhao P, Huang Y, Wang S, Cheng X ; Synergistic effects of supplementary cementitious materials in limestone and calcined clay-replaced slag cement ; Construction and Building Materials"
#> [3] "Shamaki M, Adu-Amankwah S, Black L ; Reuse of UK alum water treatment sludge in cement-based materials ; Construction and Building Materials"
#> [4] "Adu-Amankwah S, Bernal Lopez S, Black L ; Influence of component fineness on hydration and strength development in ternary slag-limestone cements ; RILEM Technical Letters"
#> [5] "Adu-Amankwah S, Zajac M, Skocek J, Ben Haha M, Black L ; Relationship between cement composition and the freeze-thaw resistance of concretes ; Advances in Cement Research"
#> [6] "Zajac M, Skocek J, Adu-Amankwah S, Black L, Ben Haha M ; Impact of microstructure on the performance of composite cements: Why higher total porosity can result in higher strength ; Cement and Concrete Composites"
#> [7] "Adu-Amankwah S, Black L, Skocek J, Ben Haha M, Zajac M ; Effect of sulfate additions on hydration and performance of ternary slag-limestone composite cements ; Construction and Building Materials"
#> [8] "Adu-Amankwah S, Zajac M, Stabler C, Lothenbach B, Black L ; Influence of limestone on the hydration of ternary slag cement ; Cement and Concrete Research"
#> [9] "Adu-Amankwah S, Khatib JM, Searle DE, Black L ; Effect of synthesis parameters on the performance of alkali-activated non-conformant EN 450 pulverised fuel ash ; Construction and Building Materials"
Created on 2021-11-04 by the reprex package (v2.0.0)
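
Since the question asks for titles and authors separately, it may be more convenient to collect the records into a data frame instead of one pasted string per article. The following is only a minimal sketch in base R. It assumes each record is a list with `authors`, `title` and `journal` fields, as the `lapply()` call in the answer implies. The helper names `get_field` and `articles_to_df`, and the keys `a1` and `a2` in the sample list, are hypothetical; the two sample records reuse entries from the output above.

```r
# Hypothetical helper: read one field from a record, returning NA when the
# field is absent so that every row has the same columns.
get_field <- function(record, name) {
  value <- record[[name]]
  if (is.null(value)) NA_character_ else as.character(value)[1]
}

# Hypothetical helper: turn a list of article records into a data frame.
# Assumes each record is a list with `authors`, `title` and `journal`.
articles_to_df <- function(articles) {
  if (length(articles) == 0) {
    return(data.frame(authors = character(0), title = character(0),
                      journal = character(0), stringsAsFactors = FALSE))
  }
  rows <- lapply(articles, function(x) {
    data.frame(
      authors = get_field(x, "authors"),
      title   = get_field(x, "title"),
      journal = get_field(x, "journal"),
      stringsAsFactors = FALSE
    )
  })
  # unname() keeps the record keys from being read as rbind() arguments
  df <- do.call(rbind, unname(rows))
  rownames(df) <- NULL
  df
}

# Stand-in for the parsed `journal_article` element (keys are made up)
articles <- list(
  a1 = list(
    authors = "Shamaki M, Adu-Amankwah S, Black L",
    title   = "Reuse of UK alum water treatment sludge in cement-based materials",
    journal = "Construction and Building Materials"
  ),
  a2 = list(
    authors = "Adu-Amankwah S, Bernal Lopez S, Black L",
    title   = "Influence of component fineness on hydration and strength development in ternary slag-limestone cements",
    journal = "RILEM Technical Letters"
  )
)

pubs <- articles_to_df(articles)
print(pubs[, c("title", "authors")])
```

With the live data, the list passed to `articles_to_df()` would be the object obtained after the two `[[` steps in the pipeline above, provided it really is a list of such records.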
