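I have a student dataset that contains item responses scored as correct or incorrect. There is also a time variable measured in seconds. I would like to create time flags that record the number of correct and incorrect responses at the 1-minute, 2-minute, and 3-minute thresholds. Here is a sample dataset.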
df <- data.frame(id = c(1,2,3,4,5),
                 gender = c("m","f","m","f","m"),
                 age = c(11,12,12,13,14),
                 i1 = c(1,0,NA,1,0),
                 i2 = c(0,1,0,"1]",1),
                 i3 = c("1]",1,"1]",0,"0]"),
                 i4 = c(0,"0]",1,1,0),
                 i5 = c(1,1,NA,"0]","1]"),
                 i6 = c(0,0,"0]",1,1),
                 i7 = c(1,"1]",1,0,0),
                 i8 = c(0,0,0,"1]","1]"),
                 i9 = c(1,1,1,0,NA),
                 time = c(115,138,148,195, 225))
> df
  id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time
1  1      m  11  1  0 1]  0    1  0  1  0  1  115
2  2      f  12  0  1  1 0]    1  0 1]  0  1  138
3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148
4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195
5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225
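Each minute threshold is marked by a ] symbol to the right of the score.
For example, for id = 3 the 1-minute threshold falls at item i3 and the 2-minute threshold at item i6. Each student may have different time thresholds.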
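To make the marker convention concrete, here is a minimal base R sketch for a single student. The vector `x` is copied from the id = 3 row of the sample data; the name `cut_points` is only illustrative and does not appear elsewhere in this thread.

```r
# Responses for id = 3 (items i1..i9), copied from the sample data
x <- c(NA, "0", "1]", "1", NA, "0]", "1", "0", "1")

# Positions of the items carrying the "]" marker.
# fixed = TRUE makes grepl() treat "]" as a literal character;
# grepl() returns FALSE for NA, so missing responses are never markers.
cut_points <- which(grepl("]", x, fixed = TRUE))
cut_points
# [1] 3 6   (items i3 and i6)

# Items that count toward each threshold: everything up to and
# including the marked item
lapply(cut_points, function(k) x[seq_len(k)])
```

Because the last three items come after the final marker, they belong to no threshold and are not counted.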
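I need to create flag variables that count the number of correct and incorrect responses at the 1-min, 2-min, and 3-min thresholds.
How can I get the desired dataset, shown below?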
> df1
  id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time one_true one_false two_true two_false three_true three_false
1  1      m  11  1  0 1]  0    1  0  1  0  1  115        2         1       NA        NA         NA          NA
2  2      f  12  0  1  1 0]    1  0 1]  0  1  138        2         2        4         3         NA          NA
3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148        1         1        2         2         NA          NA
4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195        2         0        3         2          5           3
5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225        1         2        2         3          4           4
A uj5u.com user replied:
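library(tidyverse)

df %>%
  pivot_longer(i1:i9, values_transform = as.character) %>%
  group_by(id) %>%
  # vs = number of "]" markers at or after each item (0 once past the last marker)
  mutate(vs = rev(cumsum(replace_na(str_detect(rev(value), ']'), FALSE)))) %>%
  filter(vs > 0) %>%
  # renumber so the first threshold block is 1, the second 2, and so on
  mutate(vs = max(vs) - vs + 1) %>%
  group_by(vs, .add = TRUE) %>%
  summarise(true = sum(str_detect(value, '1'), na.rm = TRUE),
            false = sum(str_detect(value, '0'), na.rm = TRUE),
            .groups = "drop_last") %>%
  # running totals, so each threshold includes the earlier blocks
  mutate(across(c(true, false), cumsum)) %>%
  pivot_wider(id_cols = id, names_from = vs, values_from = c(true, false))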
# A tibble: 5 x 7
# Groups: id [5]
     id true_1 true_2 true_3 false_1 false_2 false_3
  <dbl>  <int>  <int>  <int>   <int>   <int>   <int>
1     1      2     NA     NA       1      NA      NA
2     2      2      4     NA       2       3      NA
3     3      1      2     NA       1       2      NA
4     4      2      3      5       0       2       3
5     5      1      2      4       2       3       4
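The reverse cumulative sum in the pipeline can be hard to read at first. The following base R sketch walks through the same idea on one student's responses (the id = 5 row of the sample data). The names `marker`, `keep`, `block`, `true_cum`, and `false_cum` are illustrative and not part of the answer's code.

```r
# Responses for id = 5 (items i1..i9), copied from the sample data
x <- c("0", "1", "0]", "0", "1]", "1", "0", "1]", NA)

# TRUE where an item carries the "]" marker (NA counts as FALSE)
marker <- grepl("]", x, fixed = TRUE)

# Number of markers at or after each item: reverse, accumulate, reverse back
vs <- rev(cumsum(rev(marker)))
vs
# [1] 3 3 3 2 2 1 1 1 0

# Items with vs == 0 come after the last marker and are dropped
keep <- vs > 0

# Flip the numbering so the first threshold block is 1
block <- max(vs) - vs[keep] + 1
block
# [1] 1 1 1 2 2 3 3 3

# Count per block, then accumulate so each threshold includes earlier blocks
true_cum  <- cumsum(tapply(grepl("1", x[keep], fixed = TRUE), block, sum))
false_cum <- cumsum(tapply(grepl("0", x[keep], fixed = TRUE), block, sum))
true_cum   # 1 2 4
false_cum  # 2 3 4
```

These totals agree with the id = 5 row of the result: true_1 to true_3 are 1, 2, 4 and false_1 to false_3 are 2, 3, 4.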
A uj5u.com user replied:
You can also do the same thing in base R:
fun <- function(x){
  # lengths of the consecutive item blocks, each ending at a "]" marker
  a <- diff(c(0,which(grepl("]", x))))
  f_sum <- function(x,y) sum(na.omit(grepl(x,y)))
  fn <- function(x) c(true = f_sum('1',x), false = f_sum('0',x))
  # per-block counts of correct and incorrect responses
  y <- tapply(x[seq(sum(a))], rep(seq_along(a),a), fn)
  # running totals across blocks
  s <- do.call(rbind, Reduce("+", y, accumulate = TRUE))
  nms <- do.call(paste, c(sep='_',expand.grid(colnames(s), seq(nrow(s)))))
  setNames(c(t(s)), nms)
}
fun2 <- function(x){
  # pad every per-row result with NA to the length of the longest one
  ln <- lengths(x)
  nms <- names(x[[which.max(ln)]])
  do.call(rbind, lapply(x, function(x)setNames(`length<-`(x,max(ln)),nms)))
}
fun2(apply(df[4:12],1,fun))
     true_1 false_1 true_2 false_2 true_3 false_3
[1,]      2       1     NA      NA     NA      NA
[2,]      2       2      4       3     NA      NA
[3,]      1       1      2       2     NA      NA
[4,]      2       0      3       2      5       3
[5,]      1       2      2       3      4       4
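If the counts should end up next to the original columns under the names used in the question (`one_true`, `one_false`, and so on), one possible follow-up step is sketched below. This is not part of either answer. The matrix `res` is typed in from the output above so that the block runs on its own, and `ids` is a stand-in for `df`; the names `words` and `parts` are likewise illustrative.

```r
# Counts copied from the output above, in the column order returned by fun2()
res <- matrix(c(2, 1, NA, NA, NA, NA,
                2, 2,  4,  3, NA, NA,
                1, 1,  2,  2, NA, NA,
                2, 0,  3,  2,  5,  3,
                1, 2,  2,  3,  4,  4),
              nrow = 5, byrow = TRUE,
              dimnames = list(NULL, c("true_1", "false_1", "true_2",
                                      "false_2", "true_3", "false_3")))

# Map the numeric suffix to the words used in the desired output
words <- c("1" = "one", "2" = "two", "3" = "three")
parts <- do.call(rbind, strsplit(colnames(res), "_", fixed = TRUE))
colnames(res) <- paste(words[parts[, 2]], parts[, 1], sep = "_")

# Stand-in for the original data frame (assumption: rows are in the same order)
ids <- data.frame(id = 1:5)
df1 <- cbind(ids, res)
df1
```

With the real data, replacing `ids` by `df` in the `cbind()` call should give the layout requested in the question, provided the rows of `res` are in the same order as the rows of `df`.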
