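My panel data consists of two waves: 18 and 21. My employment status variable has 4 values.
I want to create a dummy variable that equals 1 if the person is employed in both waves and 0 otherwise. However, my attempt below produces a dummy that contains only zeros: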
df$dummy <- df %>%
group_by(NEW_id) %>%
arrange(New_id, WAVE_NO) %>%
mutate(dummy = case_when(WAVE_NO==18 & WAVE_NO==21 & EMPLOYMENT_STATUS=="Employed" ~ 1, TRUE ~ 0))

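Answer from a uj5u.com user:
We can use split to divide the data frame by id. Because split returns a list, we can use lapply to apply an operation to each element of that list (here: creating the dummy variable). The output of lapply is also a list. We want a data.frame, though, so we call do.call(), which passes all elements of the list to a single function call (here: rbind).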
set.seed(1)
n <- 10L
K <- 2L
df <- data.frame(
id = rep(1L:n, each=K),
wave = rep(c(18L,21L), n),
employment = sample(c('Employed', 'Unemployed'), n*K, replace = TRUE)
)
# add dummy to data frame
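df <- do.call(rbind, lapply(split(df, df$id), function(x) {
  # flag each row of this id: 1 if employed, 0 otherwise
  x$dummy <- ifelse(x$employment %in% 'Employed', 1L, 0L)
  # the id gets dummy = 1 only if both of its rows are flagged
  x$dummy <- ifelse(sum(x$dummy) == 2L, 1L, 0L)
  return(x)
}))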
rownames(df) <- NULL
Output
> head(df)
id wave employment dummy
1 1 18 Employed 0
2 1 21 Unemployed 0
3 2 18 Employed 1
4 2 21 Employed 1
5 3 18 Unemployed 0
6 3 21 Employed 0
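As a side note, the same per-id logic can be written more compactly in base R. The following is only a minimal sketch that swaps in `ave()` for the split/lapply/do.call pattern above; the data frame `toy` is a hypothetical six-row example, not the asker's data, and like the code above it assumes exactly two rows per id.

```r
# Hypothetical toy data: three ids, two waves each (an assumption for illustration)
toy <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  wave = rep(c(18L, 21L), 3),
  employment = c("Employed", "Unemployed", "Employed",
                 "Employed", "Unemployed", "Employed"),
  stringsAsFactors = FALSE
)

# Row-level indicator: 1 if employed in that wave, 0 otherwise
employed <- as.integer(toy$employment == "Employed")

# ave() applies FUN within each id and recycles the result over that id's rows;
# an id gets 1 only if both of its rows are employed
toy$dummy <- ave(employed, toy$id,
                 FUN = function(v) as.integer(sum(v) == 2L))

print(toy)
```

Here ids 1 and 3 are employed in only one wave, so only id 2 receives a dummy of 1 on both of its rows.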
Answer from a uj5u.com user:
df <- data.frame(
stringsAsFactors = FALSE,
id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
EMPLOYMENT_STATUS = c(
"Employed",
"Employed",
"unemployed",
"Employed",
"unemployed",
"Employed",
"Employed",
"Employed",
"unemployed",
"unemployed"
)
)
library(tidyverse)
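df %>%
  group_by(id) %>%
  # dummy is 1 only if every row of the id is in wave 18 or 21 and is "Employed";
  # as.integer() turns the logical result into the 1/0 shown in the output
  mutate(dummy = as.integer(all(wave %in% c(18, 21)) &
                              all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()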
#> # A tibble: 10 x 4
#> id wave EMPLOYMENT_STATUS dummy
#> <int> <int> <chr> <int>
#> 1 1 18 Employed 1
#> 2 1 21 Employed 1
#> 3 2 18 unemployed 0
#> 4 2 21 Employed 0
#> 5 3 18 unemployed 0
#> 6 3 21 Employed 0
#> 7 4 18 Employed 0
#> 8 4 10 Employed 0
#> 9 5 18 unemployed 0
#> 10 5 21 unemployed 0
Created on 2022-01-23 by the reprex package (v2.0.1)
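The attempt in the question most likely returns only zeros because `WAVE_NO==18 & WAVE_NO==21` is evaluated row by row, and no single row can belong to both waves at once. The sketch below illustrates that and shows one possible stricter variant that requires an "Employed" row in each of the two waves, so an id observed in only one wave also gets 0. The data frame `toy` and the column name `dummy` are hypothetical; the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data: id 3 is observed in wave 18 only
toy <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L),
  wave = c(18L, 21L, 18L, 21L, 18L),
  EMPLOYMENT_STATUS = c("Employed", "Employed", "unemployed",
                        "Employed", "Employed"),
  stringsAsFactors = FALSE
)

# Row-wise condition as in the question: never TRUE,
# because a row has exactly one wave value
row_wise <- with(toy, wave == 18 & wave == 21 &
                   EMPLOYMENT_STATUS == "Employed")
any(row_wise)

# Group-wise condition: both waves must appear among the id's employed rows
result <- toy %>%
  group_by(id) %>%
  mutate(dummy = as.integer(
    all(c(18, 21) %in% wave[EMPLOYMENT_STATUS == "Employed"])
  )) %>%
  ungroup()

print(result)
```

In this toy example only id 1 ends up with a dummy of 1; id 2 is unemployed in wave 18 and id 3 has no wave 21 row.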
