使用purrr標記許多模型的回歸樣本-有解無憂

我想用purrr函式標記多元回歸模型中使用的樣本。根據這個問答，我可以臨時實作這個目標，如下所示：

library(dplyr)

df <- mtcars %>% 
  mutate(disp = replace(hp, c(2, 3), NA)) %>% 
  mutate(wt = replace(wt, c(3, 4, 5), NA))

s1 <- lm(mpg ~ disp, data = df)
df$samp1 <- TRUE
df$samp1[na.action(s1)] <- FALSE         

s2 <- lm(mpg ~ wt, data = df)
df$samp2 <- TRUE
df$samp2[na.action(s2)] <- FALSE

如何添加samp1和samp2對df使用purrr？

uj5u.com熱心網友回復：

應該有一種更整潔的方法來做到這一點across，但它最終可能比它的價值更丑或更令人費解。一個足夠簡單的方法是用你想要的新列的名稱制作一個模型串列，為每個模型創建一個samp*列，然后減少連接到一個資料框。最后一點有效，因為您知道您有所有相同的列要加入。

library(dplyr)

mods <- list(samp1 = s1, samp2 = s2)

df_out <- purrr::imap(mods, function(mod, col) {
  df %>%
    tibble::rownames_to_column("id") %>%
    mutate({{ col }} := id %in% names(na.action(mod)))
}) %>%
  purrr::reduce(inner_join)
#> Joining, by = c("id", "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")

head(df_out)
#>                  id  mpg cyl disp  hp drat    wt  qsec vs am gear carb samp1
#> 1         Mazda RX4 21.0   6  110 110 3.90 2.620 16.46  0  1    4    4 FALSE
#> 2     Mazda RX4 Wag 21.0   6   NA 110 3.90 2.875 17.02  0  1    4    4  TRUE
#> 3        Datsun 710 22.8   4   NA  93 3.85    NA 18.61  1  1    4    1  TRUE
#> 4    Hornet 4 Drive 21.4   6  110 110 3.08    NA 19.44  1  0    3    1 FALSE
#> 5 Hornet Sportabout 18.7   8  175 175 3.15    NA 17.02  0  0    3    2 FALSE
#> 6           Valiant 18.1   6  105 105 2.76 3.460 20.22  1  0    3    1 FALSE
#>   samp2
#> 1 FALSE
#> 2 FALSE
#> 3  TRUE
#> 4  TRUE
#> 5  TRUE
#> 6 FALSE

如果您想走更重的 tidyeval 路線，您可能會找到一些潛在客戶的一些帖子是如何使用 map* 和 mutate 將串列轉換為一組附加列？并將mutate(across(...)) 與 purrr::map 一起使用

uj5u.com熱心網友回復：

我還沒有到那里，但這里有一個使用自定義函式的整潔方法：

flag_use <- function(df, model, name) {
  mutate(df, {{name}} := !row_number() %in% na.action( {{model}} ))
}

df %>%
  flag_use(s1, "samp1") %>%
  flag_use(s2, "samp2")

uj5u.com熱心網友回復：

這看起來太復雜了，但這是我能想到的。（在不將線性模型本身作為管道的一部分運行的情況下，這樣做會更有效率，即僅識別使用了哪些樣本——這可能可以通過model.frame()一些適當的連接來實作......

library(dplyr)
library(purrr)
library(broom)
library(tibble)

## same as before, but also convert rownames to a column
df <- mtcars %>%
  mutate(disp = replace(hp, c(2, 3), NA),
         wt = replace(wt, c(3, 4, 5), NA)) %>%
  rownames_to_column("model")

## (1) set up vector of vars and give it names (for later .id=)
dd <- c("disp", "wt") %>%
  setNames(c("samp1", "samp2")) %>%
## (2) construct formulas for lm
  map(reformulate, response = "mpg") %>%
## (3) fit the lm
  map(lm, data = df) %>%
## (4) generate fitted values
  map_dfr(augment, newdata=df, .id="samp") %>%
  select(samp, model, .fitted) %>%
## (5) identify which observations were *not* used
  mutate(val = !is.na(.fitted)) %>%
## (6) pivot from one long column to two half-length columns
  pivot_wider(names_from=samp, values_from=val, id_cols= model) %>%
## (7) add to df
  full_join(df, by = "model")

此版本無需運行模型即可完成。

## helper function: returns logical vector of whether observation
## was included in model frame or not
drop_vec <- function(mf) {
  nn <- attr(mf, "na.action")
  incl <- rep(TRUE, nrow(mf)   length(nn))
  incl[nn] <- FALSE
  incl
}

## first few bits are the same as above
dd <- c("disp", "wt") %>%
  setNames(c("samp1", "samp2")) %>%
  map(reformulate, response = "mpg") %>%
## only construct model frames - don't run lm()
  map(model.frame, data = df) %>%
## apply helper function
  map(drop_vec) %>%
## stick them together
  bind_cols(df)

我唯一不喜歡這個解決方案的是samp列在開頭結束；將不得不大驚小怪才能將它們作為資料框中的最后一列。

轉載請註明出處，本文鏈接：https://www.uj5u.com/houduan/325655.html

標籤：r 回归线性回归咕噜噜样本

上一篇：詳解python三大器——迭代器、生成器、裝飾器

下一篇：用多列中的多個值替換多個字符？電阻