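I have the following table in R, which lists race, gender, age and a cholesterol test. Age and the cholesterol test are stored as dummy variables. Age can be low, medium or high, while the cholesterol test can be low or high. I want to collapse the age columns and the cholesterol columns into a single column each, where low is coded as 1, medium as 2 and high as 3. If a person has never taken the cholesterol test, the value should be n/a in the expected output. I would like the solution to be dynamic, so that the code still works if I have more columns in this format (for example, new tests that are categorized as high, low or medium as dummy variables).
How can I do this in R?
Input: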
  race  gender age.low_tm1 age.medium_tm1 age.high_tm1 chol_test.low_tm1 chol_test.high_tm1
  <chr>  <int>       <int>          <int>        <int>             <int>              <int>
1 white      0           1              0            0                 0                  0
2 white      0           1              0            0                 0                  0
3 white      1           1              0            0                 0                  0
4 black      1           0              1            0                 0                  0
5 white      0           0              0            1                 0                  1
6 black      0           0              1            0                 1                  0
Expected output:
  race  gender age chol_test
1 white      0   1       n/a
2 white      0   1       n/a
3 white      1   1       n/a
4 black      1   2       n/a
5 white      0   3         3
6 black      0   2         1
Reply from a uj5u.com user:
Maybe this helps
library(dplyr)
library(tidyr)
library(stringr)
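df1 %>%
  # multiply each 0/1 dummy by the code of the level named in its column
  mutate(across(contains("_"), ~
    . * setNames(1:3, c("low", "medium", "high"))[
      str_extract(cur_column(), "low|medium|high")])) %>%
  rename_with(~ str_remove(., "_tm1")) %>%
  # keep a row id so the long rows can be collapsed back to one per person
  mutate(rowid = row_number()) %>%
  pivot_longer(cols = -c(rowid, race, gender),
    names_to = c(".value", "categ"), names_sep = "\\.") %>%
  select(-categ) %>%
  group_by(rowid, race, gender) %>%
  # one row per person: keep the largest code; 0 (no dummy set) becomes NA
  summarise(across(everything(), ~ na_if(max(.x, na.rm = TRUE), 0L)),
    .groups = "drop") %>%
  select(-rowid)
-output
# A tibble: 6 × 4
  race  gender   age chol_test
  <chr>  <int> <int>     <int>
1 white      0     1        NA
2 white      0     1        NA
3 white      1     1        NA
4 black      1     2        NA
5 white      0     3         3
6 black      0     2         1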
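The recoding step in the pipeline above relies on looking up a numeric code by the level name embedded in each column name. A minimal base-R sketch of that idea follows; the vector `cols` is made up for illustration, and `regmatches()` with `regexpr()` stands in for `str_extract()`. It assumes the variable part of the name does not itself contain "low", "medium" or "high".

```r
# Named lookup vector: level label -> numeric code
codes <- setNames(1:3, c("low", "medium", "high"))

# Illustrative column names in the "<variable>.<level>_tm1" format
cols <- c("age.low_tm1", "age.medium_tm1", "chol_test.high_tm1")

# Extract the level from each name (base-R stand-in for str_extract)
levels_found <- regmatches(cols, regexpr("low|medium|high", cols))

# Index the lookup vector by name to get one code per column
level_codes <- codes[levels_found]
level_codes

# Multiplying a 0/1 dummy by its code leaves 0 where the dummy is off
recoded <- c(0L, 1L, 0L) * codes[["medium"]]
recoded
```

Because the code comes from the column name rather than from a hard-coded column list, the same lookup should carry over to new tests that use the same naming scheme.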
Data
df1 <- structure(list(race = c("white", "white", "white", "black", "white",
"black"), gender = c(0L, 0L, 1L, 1L, 0L, 0L), age.low_tm1 = c(1L,
1L, 1L, 0L, 0L, 0L), age.medium_tm1 = c(0L, 0L, 0L, 1L, 0L, 1L
), age.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L), chol_test.low_tm1 = c(0L,
0L, 0L, 0L, 0L, 1L), chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L,
0L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))
Reply from a uj5u.com user:
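We can first define a custom function that lets us recode dummy variables based on their variable names, called var_nm2value below.
Then we define a list of recode values, recode_ls.
Finally, we use purrr::map_dfc inside dplyr::summarise. Here we i) pass the names of the variables we want to create, "age" and "chol_test", then ii) select only the columns that contain that string, and in each iteration iii) apply dplyr::across to recode the values, iv) pipe the result into do.call to take the pmax, and finally v) recode 0s to NA: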
# custom function to recode 0/1 dummy variables based on their variable name
var_nm2value <- function(x, values_ls) {
  for (val in seq_along(values_ls)) {
    if (grepl(names(values_ls)[val], dplyr::cur_column())) {
      return(ifelse(x == 1L, values_ls[[val]], x))
    }
  }
}
# define list of recode values
recode_ls <- list(low = 1, medium = 2, high = 3)
library(tidyverse)
# apply functions to data.frame
df1 %>%
  summarise(race = race,
            gender = gender,
            map_dfc(set_names(c("age", "chol_test")), # i)
                    function(x) {
                      select(., contains(x)) %>% # ii)
                        summarise("{x}" := across(everything(), var_nm2value, recode_ls) %>% # iii)
                                    do.call("pmax", .) %>% # iv)
                                    ifelse(. == 0, NA, .))} # v)
            ))
#>    race gender age chol_test
#> 1 white      0   1        NA
#> 2 white      0   1        NA
#> 3 white      1   1        NA
#> 4 black      1   2        NA
#> 5 white      0   3         3
#> 6 black      0   2         1
Created on 2022-01-03 by the reprex package (v0.3.0)
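As a complement to the answers, here is a rough base-R sketch of a fully name-driven variant, assuming every dummy column follows the `<variable>.<level>_tm1` pattern from the question. The helper `collapse_dummies()` and its arguments `codes` and `suffix` are hypothetical names introduced for this illustration and are not part of either answer. It multiplies each dummy by its code and takes the row-wise `pmax`, the same idea as step iv) above, but it discovers the variable names from the columns instead of listing them.

```r
# Hypothetical helper (not from either answer): collapse every group of
# "<variable>.<level><suffix>" dummy columns into one coded column.
collapse_dummies <- function(df,
                             codes = c(low = 1L, medium = 2L, high = 3L),
                             suffix = "_tm1") {
  # Regex matching ".<level><suffix>" at the end of a column name
  level_pattern <- paste0("\\.(", paste(names(codes), collapse = "|"), ")",
                          suffix, "$")
  is_dummy <- grepl(level_pattern, names(df))
  dummy_cols <- names(df)[is_dummy]

  # Variable stem (e.g. "age") and level (e.g. "low") for each dummy column
  stems <- sub(level_pattern, "", dummy_cols)
  lvls <- sub(paste0("^.*", level_pattern), "\\1", dummy_cols)

  out <- df[!is_dummy]
  for (stem in unique(stems)) {
    idx <- stems == stem
    # Multiply each 0/1 column by its code, then take the row-wise maximum
    coded <- Map(function(col, lvl) df[[col]] * codes[[lvl]],
                 dummy_cols[idx], lvls[idx])
    value <- do.call(pmax, unname(coded))
    value[which(value == 0L)] <- NA  # no dummy set: treat as not applicable
    out[[stem]] <- value
  }
  out
}

# Same data as df1 in the first answer
df1 <- data.frame(
  race = c("white", "white", "white", "black", "white", "black"),
  gender = c(0L, 0L, 1L, 1L, 0L, 0L),
  age.low_tm1 = c(1L, 1L, 1L, 0L, 0L, 0L),
  age.medium_tm1 = c(0L, 0L, 0L, 1L, 0L, 1L),
  age.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L),
  chol_test.low_tm1 = c(0L, 0L, 0L, 0L, 0L, 1L),
  chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L)
)

result <- collapse_dummies(df1)
result
```

Under these assumptions, a new test with the same naming scheme would be picked up without changing the call, and other level labels could be handled by passing a different `codes` vector.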
