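I have a large data frame in R with more than 200 variables, most of them character, that I would like to convert to factors. I have prepared all the levels and labels in a separate data frame. For a variable Var1, the corresponding levels and labels are Var1_v and Var1_b; for example, for the variable Gender, the levels and labels are named Gender_v and Gender_b.
Here is a sample of my data: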
df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"))
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"))
df$Gender <- factor(df$Gender, levels = fct$Gender_v, labels = fct$Gender_b, exclude = NULL)
df$AgeG <- factor(df$AgeG, levels = fct$AgeG_v, labels = fct$AgeG_b, exclude = NULL)
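Is it possible to automate this process so that the factors (levels and labels) are applied to the corresponding variables without my having to handle each variable individually? I suspect this could be done with a function using pmap.
My goal is to minimize the amount of work this procedure requires. Is there also a better way to prepare the labels and levels?
Many thanks for your help.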
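Answer from a uj5u.com user:
I solved this with a simple refactoring that loops over the columns automatically. The more data you add, the longer it takes to run. I believe the expression fct[[paste0(names(df[i]),"_v")]] could be factored out into a small function to make it look nicer.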
df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"))

fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"))

# For each column, look up its levels (<name>_v) and labels (<name>_b) in fct.
for (i in seq_along(df)) {
  le <- fct[[paste0(names(df[i]), "_v")]]
  la <- fct[[paste0(names(df[i]), "_b")]]
  df[, i] <- factor(df[, i], levels = le, labels = la, exclude = NULL)
}

df
#>   Gender  AgeG
#> 1 Female 65-80
#> 2 Female   <25
#> 3   Male   >80
#> 4 Female 25-60
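The small helper function suggested above could look something like the following. This is only a sketch: `get_def` is a hypothetical name that does not appear in the original answer, and the code assumes every column of `df` has matching `_v` and `_b` columns in `fct`.

```r
# Sample data and lookup table, copied from the example above.
df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"),
                 stringsAsFactors = FALSE)
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: fetch the lookup column named "<var><suffix>" from fct.
get_def <- function(fct, var, suffix) {
  fct[[paste0(var, suffix)]]
}

# Loop over column names instead of positions, so the lookup reads naturally.
for (var in names(df)) {
  df[[var]] <- factor(df[[var]],
                      levels  = get_def(fct, var, "_v"),
                      labels  = get_def(fct, var, "_b"),
                      exclude = NULL)
}
```

Iterating over names rather than indices also removes the repeated names(df[i]) call from the loop body.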
Edit: here is a version with an added if condition, so that only columns whose names end in _f are converted:
library(stringr)  # provides str_remove()

df <- data.frame(Gender_f = c("2", "2", "1", "2"),
                 AgeG_f   = c("3", "1", "4", "2"),
                 AgeN     = c(70, 15, 96, 30))

fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"))

for (i in seq_along(df)) {
  # Only convert columns flagged with the "_f" suffix; leave the rest untouched.
  if (endsWith(names(df[i]), "_f")) {
    name <- str_remove(names(df[i]), "_f")
    le <- fct[[paste0(name, "_v")]]
    la <- fct[[paste0(name, "_b")]]
    df[, i] <- factor(df[, i], levels = le, labels = la, exclude = NULL)
  }
}

df
#>   Gender_f AgeG_f AgeN
#> 1   Female  65-80   70
#> 2   Female    <25   15
#> 3     Male    >80   96
#> 4   Female  25-60   30
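For readers who prefer a functional style (the question mentions pmap), the same conditional conversion can be sketched with base R's Map(), used here as a stand-in for purrr's mapping functions. `to_factor` is a hypothetical helper name, and the `_f` / `_v` / `_b` naming scheme is assumed from the example above.

```r
df <- data.frame(Gender_f = c("2", "2", "1", "2"),
                 AgeG_f   = c("3", "1", "4", "2"),
                 AgeN     = c(70, 15, 96, 30),
                 stringsAsFactors = FALSE)
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: convert one column if its name ends in "_f",
# otherwise return it unchanged.
to_factor <- function(col, name, fct) {
  if (!endsWith(name, "_f")) {
    return(col)
  }
  base <- sub("_f$", "", name)  # strip the suffix with base R
  factor(col,
         levels  = fct[[paste0(base, "_v")]],
         labels  = fct[[paste0(base, "_b")]],
         exclude = NULL)
}

# Map() walks the columns and their names in parallel; assigning into df[]
# keeps the data frame structure and the column names.
df[] <- Map(function(col, name) to_factor(col, name, fct), df, names(df))
```

Whether this reads better than the explicit for loop is largely a matter of taste; the behavior is intended to be the same.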
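Answer from a uj5u.com user:
A data frame is not really a suitable data structure for storing factor level definitions: there is no reason to expect all factors to have the same number of levels. Instead, I would just use a plain list and store the level information more compactly as named vectors, like this: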
df <- data.frame(
  Gender = c("2", "2", "1", "2"),
  AgeG = c("3", "1", "4", "2")
)
value_labels <- list(
  Gender = c("Male" = 1, "Female" = 2),
  AgeG = c("<25" = 1, "25-60" = 2, "65-80" = 3, ">80" = 4)
)
You can then write a function that uses this data structure to create the factors in the data frame:
make_factors <- function(data, value_labels) {
  for (var in names(value_labels)) {
    if (var %in% colnames(data)) {
      vl <- value_labels[[var]]
      data[[var]] <- factor(
        data[[var]],
        levels = unname(vl),
        labels = names(vl)
      )
    }
  }
  data
}
make_factors(df, value_labels)
#>   Gender  AgeG
#> 1 Female 65-80
#> 2 Female   <25
#> 3   Male   >80
#> 4 Female 25-60
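If the definitions already exist as a wide lookup data frame like the `fct` in the question, they could be converted into such a list instead of being retyped. The following is a rough sketch, not part of the original answer: `fct_to_value_labels` is a hypothetical name, and it assumes the `_v` / `_b` column naming from the question. Because data.frame() recycles shorter columns when the lengths divide evenly, the Gender columns in `fct` contain repeated rows, so the sketch drops duplicated values.

```r
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: build one named vector per variable from the
# "<var>_v" (values) and "<var>_b" (labels) column pairs of the lookup table.
fct_to_value_labels <- function(fct) {
  vars <- sub("_v$", "", grep("_v$", names(fct), value = TRUE))
  out <- lapply(vars, function(var) {
    values <- fct[[paste0(var, "_v")]]
    labels <- fct[[paste0(var, "_b")]]
    keep <- !duplicated(values)  # drop rows introduced by recycling
    setNames(values[keep], labels[keep])
  })
  setNames(out, vars)
}

value_labels <- fct_to_value_labels(fct)
```

The values come out as character rather than numeric here, which should still work with the make_factors() function above, since factor() compares levels as strings.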
