I have the following data frame:
df <- data.frame(t = c("h","h","h","a","a","h","a","a","h","a","h","a","a"), time = c(1,1,1,1,1,1,1,1,1,1,1,2,2),
key = c("no", "no", "no","yes","no","no","no","no","yes","yes","no","no","no"),
expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0))
Printed together with the columns from my attempts further down (myTest1, myTest2, myTest3), it looks like this:
   t time key expected myTest1 myTest2 myTest3
1  h    1  no       -1      -1       1       1
2  h    1  no       -1      -1       1       1
3  h    1  no       -1      -1       1       1
4  a    1 yes        1       1       1      -1
5  a    1  no       -1       1       0      -1
6  h    1  no        1      -1       0       1
7  a    1  no       -1       1       0      -1
8  a    1  no       -1       1       0      -1
9  h    1 yes        1      -1       0       1
10 a    1 yes        1       1       1      -1
11 h    1  no        0      -1       0       1
12 a    2  no        0       1       0      -1
13 a    2  no        0       1       0      -1
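I am trying to recreate a column like expected. Grouping by the time column, the first condition is to assign 1 to every row that has "yes" in key. The other conditions are:
- If the row containing the next "yes" in key also contains "h" in t, assign 1 to every row with "h" and -1 to every row with "a", up to that "yes" row.
- If the row containing the next "yes" in key also contains "a" in t, assign 1 to every row with "a" and -1 to every row with "h", up to that "yes" row.
- If there are no more "yes" rows in the rest of the time group, assign 0 to the row.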
I first tried a nested for loop:
df$myTest1 <- 0
testIdx <- which(df$key %in% "yes")
df$myTest1[testIdx] <- 1
for (i in 1:length(testIdx)) {
  for (j in 1:nrow(df)) {
    df$myTest1[j] <- ifelse(df$t[testIdx[i]] == "h" & df$t[j] == "h", 1,
                     ifelse(df$t[testIdx[i]] == "h" & df$t[j] == "a", -1,
                     ifelse(df$t[testIdx[i]] == "a" & df$t[j] == "h", -1,
                     ifelse(df$t[testIdx[i]] == "a" & df$t[j] == "a", 1, 0))))
  }
}
This gives the correct values in myTest1 up to and including the first "yes", but every row after that is wrong.
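A likely reason, as far as I can tell: the inner loop rewrites every row on each pass of the outer loop, so only the last "yes" row (row 10, an "a") decides the final values. Below is a minimal sketch of that check. It uses plain vectors in place of df and collapses the nested ifelse() into a single comparison, which is equivalent here only because t contains nothing but "h" and "a". The name from_last_only is made up for illustration.

```r
t   <- c("h","h","h","a","a","h","a","a","h","a","h","a","a")
key <- c("no","no","no","yes","no","no","no","no","yes","yes","no","no","no")
testIdx <- which(key == "yes")

# same double loop as above, with the nested ifelse() collapsed to one comparison
myTest1 <- numeric(length(t))
for (i in seq_along(testIdx)) {
  for (j in seq_along(t)) {
    # every row j is overwritten on every pass i
    myTest1[j] <- if (t[testIdx[i]] == t[j]) 1 else -1
  }
}

# what you get from looking at the last "yes" row alone
last_t <- t[testIdx[length(testIdx)]]
from_last_only <- ifelse(t == last_t, 1, -1)

identical(myTest1, from_last_only)  # TRUE
```

The first four rows only look right because the first and the last "yes" rows both happen to be "a".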
I also tried two other approaches, which produced myTest2 and myTest3:
library(dplyr)  # for %>%, mutate() and case_when()
df$myTest2 <- cumsum(c(1, head(df$key == "yes", -1))) %% 2
df <- df %>%
  mutate(myTest3 = case_when(t == "h" ~ 1,  # add if next "yes" is also "h" condition
                             t == "a" ~ -1,
                             TRUE ~ 0))
Using case_when() is similar to ifelse(), but I don't know how to add the other conditions without a for loop.
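To clarify, the expected column reads the way it does because the first "yes" (row 4) belongs to a row with "a", so all the preceding "h" rows get -1 and any preceding "a" rows would get 1. The next "yes" row (row 9) contains "h", so for the rows between the two "yes" rows, "h" gets 1 and "a" gets -1. Row 10 contains a "yes" and comes directly after another "yes", so it simply gets 1. Row 11 is the last row with time = 1 and no "yes" follows it, so it is assigned 0. There are no "yes" rows when time = 2, so all rows there also receive 0.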
Answer:
This might help you.
The magic happens in the na.locf function from the zoo package.
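To see what na.locf does in isolation, here is a small sketch on a made-up vector (x and filled are illustrative names, not part of the solution). With fromLast = TRUE each NA is filled from the next non-NA value rather than the previous one, and with the default na.rm = TRUE any NAs left after the last value are dropped.

```r
library(zoo)

# NA marks a "no" row; letters mark the t value of a "yes" row
x <- c(NA, "a", NA, NA, "h", NA)

# fill backwards: each NA takes the next non-NA value,
# and the trailing NA is trimmed because na.rm defaults to TRUE
filled <- na.locf(x, fromLast = TRUE)
filled          # "a" "a" "h" "h" "h"
length(filled)  # 5, one shorter than x

# keep the trailing NA instead of trimming it
na.locf(x, fromLast = TRUE, na.rm = FALSE)  # "a" "a" "h" "h" "h" NA
```

The solution below relies on that trimmed length: it only assigns to the first length(tomatch) rows and leaves the rest at the default 0.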
library(magrittr)  # provides %>%
library(zoo)       # provides na.locf
doblock <- function(timeblock) {
  yesrows <- which(timeblock$key == "yes")
  if (length(yesrows) == 0) {
    # no "yes" rows in this time block: make all 0
    timeblock$exp2 <- 0
  } else {
    # create a vector of a's and h's against which we need to match the t field
    tomatch <- rep(NA_character_, nrow(timeblock))
    tomatch[yesrows] <- as.character(timeblock$t)[yesrows]
    # fill each NA with the t of the next "yes" row; NAs after the last "yes"
    # are dropped (na.rm = TRUE by default), so tomatch can be shorter than the block
    tomatch <- zoo::na.locf(tomatch, fromLast = TRUE)
    # now do the matching
    timeblock$exp2 <- 0  # set default as 0 (for those entries after the last "yes")
    timeblock$exp2[seq_along(tomatch)] <-
      mapply(function(t1, t2) {
        if (t1 == t2) 1 else -1
      }, as.character(timeblock$t[seq_along(tomatch)]), tomatch)
  }
  timeblock
}
# split dataframe into blocks for each 'time' and apply function to every time-block
newdf <-
  lapply(split(df, df$time), doblock) %>%
  do.call(rbind, .)
The result looks like this, where exp2 is the output of the function above. It matches your expected column.
     t time key expected exp2
1.1  h    1  no       -1   -1
1.2  h    1  no       -1   -1
1.3  h    1  no       -1   -1
1.4  a    1 yes        1    1
1.5  a    1  no       -1   -1
1.6  h    1  no        1    1
1.7  a    1  no       -1   -1
1.8  a    1  no       -1   -1
1.9  h    1 yes        1    1
1.10 a    1 yes        1    1
1.11 h    1  no        0    0
2.12 a    2  no        0    0
2.13 a    2  no        0    0
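If you would rather not depend on zoo, the same idea can be sketched in base R by walking each time group from the bottom up and remembering the t of the most recent "yes" row seen. This is only an illustrative alternative under the rules stated in the question; next_yes_score and exp3 are hypothetical names.

```r
df <- data.frame(
  t    = c("h","h","h","a","a","h","a","a","h","a","h","a","a"),
  time = c(1,1,1,1,1,1,1,1,1,1,1,2,2),
  key  = c("no","no","no","yes","no","no","no","no","yes","yes","no","no","no"),
  expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0)
)

# score one time group: 1 if t matches the next "yes" row's t, -1 if not,
# 0 if no "yes" row follows
next_yes_score <- function(t, key) {
  t <- as.character(t)
  out <- numeric(length(t))   # default 0 for rows after the last "yes"
  target <- NA_character_     # t of the nearest "yes" row at or below row i
  for (i in rev(seq_along(t))) {
    if (key[i] == "yes") target <- t[i]
    if (!is.na(target)) out[i] <- if (t[i] == target) 1 else -1
  }
  out
}

# ave() hands each group's row indices to the function and
# puts the results back in the original row order
df$exp3 <- ave(seq_len(nrow(df)), df$time,
               FUN = function(idx) next_yes_score(df$t[idx], df$key[idx]))

all(df$exp3 == df$expected)  # TRUE
```

Unlike the split/rbind approach, this keeps the original row names, at the cost of an explicit loop inside each group.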
