Assign a number to a column based on the index of the next value in a second column

I have the following data frame:

df <- data.frame(t = c("h","h","h","a","a","h","a","a","h","a","h","a","a"), time = c(1,1,1,1,1,1,1,1,1,1,1,2,2), 
                 key = c("no", "no", "no","yes","no","no","no","no","yes","yes","no","no","no"), 
                 expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0))

   t time key expected myTest1 myTest2 myTest3
1  h    1  no       -1      -1       1       1
2  h    1  no       -1      -1       1       1
3  h    1  no       -1      -1       1       1
4  a    1 yes        1       1       1      -1
5  a    1  no       -1       1       0      -1
6  h    1  no        1      -1       0       1
7  a    1  no       -1       1       0      -1
8  a    1  no       -1       1       0      -1
9  h    1 yes        1      -1       0       1
10 a    1 yes        1       1       1      -1
11 h    1  no        0      -1       0       1
12 a    2  no        0       1       0      -1
13 a    2  no        0       1       0      -1

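I am trying to recreate a column like expected. Grouping by the time column, the first condition is to assign 1 to every row that has "yes" in key. The other conditions are:

If the row containing the next "yes" in key also contains "h" in t, assign 1 to every row with "h" and -1 to every row with "a", up to that "yes" row.
If the row containing the next "yes" in key also contains "a" in t, assign 1 to every row with "a" and -1 to every row with "h", up to that "yes" row.
If there are no more "yes" rows in a time group, assign 0 to the row.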
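
One way to read these rules: within each time group, every row is compared with the t value of the nearest "yes" row at or after it. The base R sketch below works under that reading. The names score_group and sketch are hypothetical, introduced here only for illustration.

```r
df <- data.frame(
  t = c("h","h","h","a","a","h","a","a","h","a","h","a","a"),
  time = c(1,1,1,1,1,1,1,1,1,1,1,2,2),
  key = c("no","no","no","yes","no","no","no","no","yes","yes","no","no","no"),
  expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: score the rows of a single time group.
score_group <- function(t, key) {
  yes_idx <- which(key == "yes")
  vapply(seq_along(t), function(i) {
    nxt <- yes_idx[yes_idx >= i]       # "yes" rows at or after row i
    if (length(nxt) == 0) return(0)    # no "yes" left in this group
    if (t[i] == t[nxt[1]]) 1 else -1   # compare with the next "yes" row's t
  }, numeric(1))
}

# Apply the helper per time group and put the pieces back in row order.
df$sketch <- unsplit(
  Map(score_group, split(df$t, df$time), split(df$key, df$time)),
  df$time
)

all(df$sketch == df$expected)
```

A "yes" row is its own nearest "yes" row, so it receives 1 without needing a separate rule.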

I first tried a nested for loop:

df$myTest1 <- 0
testIdx <- which(df$key %in% "yes")
df$myTest1[testIdx] <- 1
for (i in 1:length(testIdx)) {
  for (j in 1:nrow(df)) {
    df$myTest1[j] <- ifelse(df$t[testIdx[i]] == "h" & df$t[j] == "h", 1,
                                  ifelse(df$t[testIdx[i]] == "h" & df$t[j] == "a", -1,
                                         ifelse(df$t[testIdx[i]] == "a" & df$t[j] == "h",
                                                -1, ifelse(df$t[testIdx[i]] == "a" & 
                                                             df$t[j] == "a", 1, 0))))
  }
}

This gives the correct values in myTest1 up to and including the first "yes", but every row after that is wrong.

I also tried two other approaches, creating myTest2 and myTest3:

df$myTest2 <- cumsum(c(1, head(df$key == "yes", -1))) %% 2

df <- df %>%
  mutate(myTest3 = case_when(t == "h" ~ 1, #add if next "yes" is also "h" condition
                               t == "a" ~ -1,
                               TRUE ~ 0))

Using case_when() is similar to ifelse, but I don't know how to add the other conditions without a for loop.
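
A possible loop-free way to supply the missing condition, offered as a sketch rather than a definitive approach: first carry the t value of each "yes" row upward within its time group, then let case_when() compare against it. This assumes the tidyr package is available in addition to dplyr. The names next_t and res are hypothetical, introduced for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  t = c("h","h","h","a","a","h","a","a","h","a","h","a","a"),
  time = c(1,1,1,1,1,1,1,1,1,1,1,2,2),
  key = c("no","no","no","yes","no","no","no","no","yes","yes","no","no","no"),
  expected = c(-1,-1,-1,1,-1,1,-1,-1,1,1,0,0,0),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(time) %>%
  # keep t only on "yes" rows, NA elsewhere
  mutate(next_t = if_else(key == "yes", t, NA_character_)) %>%
  # fill each NA with the t of the next "yes" row in the same group
  fill(next_t, .direction = "up") %>%
  mutate(myTest3 = case_when(
    is.na(next_t) ~ 0,    # no "yes" row left in this time group
    t == next_t   ~ 1,    # same letter as the next "yes" row
    TRUE          ~ -1    # the other letter
  )) %>%
  ungroup()

all(res$myTest3 == res$expected)
```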

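To clarify, the expected column reads this way because the first "yes" belongs to a row with "a", so all preceding "h" rows get -1 and all preceding "a" rows get 1. The next "yes" row contains "h", so the rows between the two "yes" rows get 1 for "h" and -1 for "a". Row 10 contains a "yes" and directly follows a "yes" row, so it simply gets 1. Row 11 is the last row with time = 1 and has no "yes" after it, so it gets 0. There are no "yes" rows when time = 2, so all rows there also get 0.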

Answer from a uj5u.com user:

This may help you.

The magic happens in the na.locf function from the zoo package.
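
To see what na.locf contributes, here is a small standalone illustration on a made-up vector (x and filled are hypothetical names, not taken from the question's data). With fromLast = TRUE, each NA takes the next non-NA value. With the default na.rm = TRUE, any NAs still left at the end are dropped, which is why the function below indexes with 1:length(tomatch).

```r
library(zoo)

# "a" and "h" mark the positions of "yes" rows; NA marks every other row
x <- c(NA, NA, "a", NA, "h", NA)

# fill backwards: each NA takes the value of the next non-NA element
filled <- zoo::na.locf(x, fromLast = TRUE)

print(filled)
length(filled)  # one shorter than length(x): the trailing NA has nothing to copy from
```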

library(magrittr)
library(zoo)

doblock <- function(timeblock) {
  yesrows <- which(timeblock$key == "yes")
  if (length(yesrows) == 0) {
    # no yes rows in timeblock: make all 0
    timeblock$exp2 <- 0
  } else {
    # create a vector of a's and h's against which we need to match the t field
    tomatch <- rep(NA, nrow(timeblock))
    tomatch[yesrows] <- as.character(timeblock$t)[yesrows]
    
    tomatch <- zoo::na.locf(tomatch, fromLast = TRUE)
    
    # now do the matching
    timeblock$exp2 <- 0    # set default as 0 (for those entries after the last 'yes')
    timeblock$exp2[1:length(tomatch)] <-
      mapply(function(t1, t2) {
        if ((t1) == t2) 1 else -1
      }, as.character(timeblock$t[1:length(tomatch)]), tomatch)
  }
  
  timeblock
} 

# split dataframe into blocks for each 'time' and apply function to every time-block
newdf <- 
  lapply(split(df, df$time), doblock) %>% 
  do.call(rbind, .)

The result looks like this, where exp2 is the output of the function above. It matches your expected column.

     t time key expected exp2
1.1  h    1  no       -1   -1
1.2  h    1  no       -1   -1
1.3  h    1  no       -1   -1
1.4  a    1 yes        1    1
1.5  a    1  no       -1   -1
1.6  h    1  no        1    1
1.7  a    1  no       -1   -1
1.8  a    1  no       -1   -1
1.9  h    1 yes        1    1
1.10 a    1 yes        1    1
1.11 h    1  no        0    0
2.12 a    2  no        0    0
2.13 a    2  no        0    0

