我想用purrr函式標記多元回歸模型中使用的樣本。根據這個問答,我可以臨時實作這個目標,如下所示:
library(dplyr)
df <- mtcars %>%
mutate(disp = replace(hp, c(2, 3), NA)) %>%
mutate(wt = replace(wt, c(3, 4, 5), NA))
s1 <- lm(mpg ~ disp, data = df)
df$samp1 <- TRUE
df$samp1[na.action(s1)] <- FALSE
s2 <- lm(mpg ~ wt, data = df)
df$samp2 <- TRUE
df$samp2[na.action(s2)] <- FALSE
如何添加samp1和samp2對df使用purrr?
uj5u.com熱心網友回復:
應該有一種更整潔的方法來做到這一點across,但它最終可能比它的價值更丑或更令人費解。一個足夠簡單的方法是用你想要的新列的名稱制作一個模型串列,為每個模型創建一個samp*列,然后減少連接到一個資料框。最后一點有效,因為您知道您有所有相同的列要加入。
library(dplyr)
mods <- list(samp1 = s1, samp2 = s2)
df_out <- purrr::imap(mods, function(mod, col) {
df %>%
tibble::rownames_to_column("id") %>%
mutate({{ col }} := id %in% names(na.action(mod)))
}) %>%
purrr::reduce(inner_join)
#> Joining, by = c("id", "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
head(df_out)
#> id mpg cyl disp hp drat wt qsec vs am gear carb samp1
#> 1 Mazda RX4 21.0 6 110 110 3.90 2.620 16.46 0 1 4 4 FALSE
#> 2 Mazda RX4 Wag 21.0 6 NA 110 3.90 2.875 17.02 0 1 4 4 TRUE
#> 3 Datsun 710 22.8 4 NA 93 3.85 NA 18.61 1 1 4 1 TRUE
#> 4 Hornet 4 Drive 21.4 6 110 110 3.08 NA 19.44 1 0 3 1 FALSE
#> 5 Hornet Sportabout 18.7 8 175 175 3.15 NA 17.02 0 0 3 2 FALSE
#> 6 Valiant 18.1 6 105 105 2.76 3.460 20.22 1 0 3 1 FALSE
#> samp2
#> 1 FALSE
#> 2 FALSE
#> 3 TRUE
#> 4 TRUE
#> 5 TRUE
#> 6 FALSE
如果您想走更重的 tidyeval 路線,您可能會找到一些潛在客戶的一些帖子是如何使用 map* 和 mutate 將串列轉換為一組附加列?并將mutate(across(...)) 與 purrr::map 一起使用
uj5u.com熱心網友回復:
我還沒有到那里,但這里有一個使用自定義函式的整潔方法:
flag_use <- function(df, model, name) {
mutate(df, {{name}} := !row_number() %in% na.action( {{model}} ))
}
df %>%
flag_use(s1, "samp1") %>%
flag_use(s2, "samp2")
uj5u.com熱心網友回復:
這看起來太復雜了,但這是我能想到的。(在不將線性模型本身作為管道的一部分運行的情況下,這樣做會更有效率,即僅識別使用了哪些樣本——這可能可以通過model.frame()一些適當的連接來實作......
library(dplyr)
library(purrr)
library(broom)
library(tibble)
## same as before, but also convert rownames to a column
df <- mtcars %>%
mutate(disp = replace(hp, c(2, 3), NA),
wt = replace(wt, c(3, 4, 5), NA)) %>%
rownames_to_column("model")
## (1) set up vector of vars and give it names (for later .id=)
dd <- c("disp", "wt") %>%
setNames(c("samp1", "samp2")) %>%
## (2) construct formulas for lm
map(reformulate, response = "mpg") %>%
## (3) fit the lm
map(lm, data = df) %>%
## (4) generate fitted values
map_dfr(augment, newdata=df, .id="samp") %>%
select(samp, model, .fitted) %>%
## (5) identify which observations were *not* used
mutate(val = !is.na(.fitted)) %>%
## (6) pivot from one long column to two half-length columns
pivot_wider(names_from=samp, values_from=val, id_cols= model) %>%
## (7) add to df
full_join(df, by = "model")
此版本無需運行模型即可完成。
## helper function: returns logical vector of whether observation
## was included in model frame or not
drop_vec <- function(mf) {
nn <- attr(mf, "na.action")
incl <- rep(TRUE, nrow(mf) length(nn))
incl[nn] <- FALSE
incl
}
## first few bits are the same as above
dd <- c("disp", "wt") %>%
setNames(c("samp1", "samp2")) %>%
map(reformulate, response = "mpg") %>%
## only construct model frames - don't run lm()
map(model.frame, data = df) %>%
## apply helper function
map(drop_vec) %>%
## stick them together
bind_cols(df)
我唯一不喜歡這個解決方案的是samp列在開頭結束;將不得不大驚小怪才能將它們作為資料框中的最后一列。
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/325655.html
