I am downloading some HTML elements into R:
library(RCurl)
curl = getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', followlocation = TRUE, autoreferer = TRUE, curl = curl)
html4 <- getURL('http://website/Busqueda_persona.aspx', curl = curl)
# The hidden ASP.NET fields are base64 strings, so the character class must include "+"
viewstate <- as.character(sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html4))
viewstategenerator <- as.character(sub('.*id="__VIEWSTATEGENERATOR" value="([0-9a-zA-Z+/=]*).*', '\\1', html4))
eventvalidation <- as.character(sub('.*id="__EVENTVALIDATION" value="([0-9a-zA-Z+/=]*).*', '\\1', html4))
params <- list(
'__VIEWSTATE' = viewstate,
'__VIEWSTATEGENERATOR' = viewstategenerator,
'__EVENTVALIDATION' = eventvalidation,
'ctl00$cphMain$ddlTipoIdentificacion' = "296" ,
'ctl00$cphMain$txtNumeroIdentificacion' = "1109927000",
'ctl00$cphMain$ddlTipoIdentificacionPersonaACargo' = "0",
'ctl00$cphMain$btnBuscar' = "Buscar"
)
html5 = postForm('http://website/Busqueda_persona.aspx', .params = params, curl = curl)
Part of the resulting html5 contains this:
onclick='javascript:Direccionar(1682000,296,"1109927000",1);'
I need to extract 1682000 and store it in a separate element.
EDIT1: After trying @akrun's suggestion, I get:
sub("\\D+(\\d+).*", "\\1", html5)
[1] "3"
attr(,"Content-Type")
              charset 
"text/html"   "utf-8" 
I uploaded the entire html5 R object here: https://controlc.com/b9d622ff
Answer:
We could use str_extract_all from stringr:
library(stringr)
str_extract_all(str2, "(?<=onclick\\='javascript\\:Direccionar\\()\\d+")[[1]]
[1] "1682000" "1682000"
Or combine it with parse_number:
readr::parse_number(str_extract_all(str1, "onclick='javascript\\:Direccionar\\([0-9]+")[[1]])
[1] 1682000
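If adding stringr is not an option, roughly the same extraction can be sketched in base R with regexec() and regmatches(). This is only a minimal sketch: `snippet` and `first_id` are hypothetical names, and the input is the single onclick line from the question, not the full page.

```r
# Sample input copied from the question; in practice this would be the html5 string.
snippet <- "onclick='javascript:Direccionar(1682000,296,\"1109927000\",1);'"

# Match the call prefix followed by digits, capturing only the digits.
pattern <- "Direccionar\\(([0-9]+)"
m <- regmatches(snippet, regexec(pattern, snippet))

# m[[1]] holds the full match first, then the capture group.
first_id <- m[[1]][2]
print(first_id)
```

regexec() reports only the first match, which is enough when every occurrence carries the same number.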
The substring occurs twice in the page:
> substr(str2, 51380, 51418)
[1] "onclick='javascript:Direccionar(1682000"
> substr(str2, 51536, 51574)
[1] "onclick='javascript:Direccionar(1682000"
The positions were found with str_locate_all:
> str_locate_all(str2, "(?<=onclick='javascript:Direccionar\\()[0-9]+")
[[1]]
     start   end
[1,] 51412 51418
[2,] 51568 51574
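Since the value appears twice, the matches can be collapsed before storing the result in its own object. The sketch below uses base R only and assumes a short stand-in string (`page`, hypothetical) in place of the real html5; `person_id` is also a hypothetical name.

```r
# Hypothetical stand-in for the full page: the same onclick call appears twice.
page <- paste0(
  "<a onclick='javascript:Direccionar(1682000,296,\"1109927000\",1);'>ver</a>",
  "<img onclick='javascript:Direccionar(1682000,296,\"1109927000\",1);'/>"
)

# as.character() drops attributes such as the Content-Type seen in the question's output.
page <- as.character(page)

# perl = TRUE enables the lookbehind, so only the digits are matched.
hits <- regmatches(page, gregexpr("(?<=Direccionar\\()[0-9]+", page, perl = TRUE))[[1]]

# Collapse repeated matches into one value and convert it to a number.
person_id <- as.numeric(unique(hits))
print(person_id)
```

If the page could contain several different identifiers, unique(hits) would return more than one value, so it may be worth checking its length before relying on a single result.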
