R: look up and update based on the most recent matching row

I have a large dataset:

head(data)

  subject stim1 stim2 Chosen outcome
1       1     2     1      2       0
2       1     3     2      2       0
3       1     3     1      1       0
4       1     2     3      3       1
5       1     1     3      1       1
6       1     2     1      1       1

tail(data)
      subject stim1 stim2 Chosen outcome
44249    3020    40    42     42       0
44250    3020    40    41     41       1
44251    3020    44    45     45       1
44252    3020    41    43     43       0
44253    3020    42    40     42       0
44254    3020    42    44     44       1

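My goal is, within each subject, to find for every row the most recent earlier row containing the same two stimuli in stim1 and stim2, and then add columns for:

The item chosen in that earlier row (Previous_Choice)
The outcome variable of that earlier row (Previous_Outcome)
Whether the number that was not chosen in that earlier row (i.e. the Previous_Choice row) was subsequently chosen in any of the rows leading up to the current trial. For example, if that row has stim1=1, stim2=2 and Chosen=2, I want to know whether Chosen=1 occurred in any later trial before my current row (S_choice) (see row 6, for example)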

The tricky part is that I don't care which number is stim1 and which is stim2. For example, if my current trial has stim1=1 and stim2=2, I want the most recent trial where (stim1=1, stim2=2) OR (stim1=2, stim2=1).
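One way to make the order of stim1 and stim2 irrelevant is to build a key that always puts the smaller number first, so that (1, 2) and (2, 1) compare equal. The snippet below is only a sketch of that idea; `pair_key` is a hypothetical helper name, not something from the question.

```r
# Hypothetical helper: order-insensitive key for a stimulus pair.
# pmin/pmax work element-wise, so the smaller stimulus always comes first.
pair_key <- function(s1, s2) {
  paste(pmin(s1, s2), pmax(s1, s2), sep = "_")
}

# (2, 1) and (1, 2) map to the same key; (3, 1) maps to a different one.
keys <- pair_key(c(2, 1, 3), c(1, 2, 1))
print(keys)  # "1_2" "1_2" "1_3"
```

Two trials then involve the same pair exactly when their keys are equal, whichever column each stimulus appeared in.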

Desired result

  subject stim1 stim2 Chosen outcome   Previous_Choice  Previous_Outcome  S_choice 
1       1     2     1      2       0         NA                 NA         NA
2       1     3     2      2       0         NA                 NA         NA
3       1     3     1      1       0         NA                 NA         NA
4       1     2     3      3       1          2                 0        FALSE
5       1     1     3      1       1          1                 0        FALSE
6       1     2     1      1       1          2                 0        TRUE

Note: S_choice is TRUE in row 6 because after trial 1 (where 1 and 2 were stim1 and stim2), 1 was chosen in rows 3 and 5.

  str(data)
'data.frame':   44254 obs. of  5 variables:
 $ subject: num  1 1 1 1 1 1 1 1 1 1 ...
 $ stim1  : int  2 3 3 2 1 2 2 3 2 2 ...
 $ stim2  : int  1 2 1 3 3 1 3 1 1 1 ...
 $ Chosen : int  2 2 1 3 1 1 2 1 2 2 ...
 $ outcome: int  0 0 0 1 1 1 1 0 1 0 ...

A uj5u.com user replied:

I don't understand what S_choice means, but maybe I can help you with the other two columns.

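# Return the last element of x, or NA if x is empty.
LastOrNa <- function(x) {
  if (length(x) == 0) {
    return(NA_integer_)
  }
  x[length(x)]
}

# For each position t, return the index of the most recent earlier position
# holding the same pair (x, y) in either order, or NA if there is none.
LastEq <- function(x, y) {
  vapply(seq_along(x), function(t) {
    prev <- seq_len(t - 1)  # empty for t = 1, so the first row yields NA
    LastOrNa(which(
      (x[prev] == x[t] & y[prev] == y[t]) |
        (x[prev] == y[t] & y[prev] == x[t])
    ))
  }, integer(1))
}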

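library(dplyr)

data %>%
  group_by(subject) %>%
  mutate(
    last_eq = LastEq(stim1, stim2),       # index within the subject's rows
    Previous_Choice = Chosen[last_eq],    # NA index gives NA
    Previous_Outcome = outcome[last_eq],
    last_eq = NULL                        # drop the helper column
  ) %>%
  ungroup()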
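S_choice is not handled by the code above. One possible reading, taken literally from the question, is: find the previous trial with the same pair, work out which of its two stimuli was not chosen, and check whether that stimulus was chosen on any trial strictly between that trial and the current one. The following is a minimal base R sketch under that assumption; `prev_match` and `s_choice` are hypothetical helper names, and the vectors are the first six rows shown in the question.

```r
# Hypothetical helper: index of the most recent earlier trial with the same
# unordered pair, or NA if there is none. ave() shifts the row indices by one
# position within each group of identical pair keys.
prev_match <- function(s1, s2) {
  key <- paste(pmin(s1, s2), pmax(s1, s2), sep = "_")
  ave(seq_along(key), key, FUN = function(i) c(NA, i)[seq_along(i)])
}

# Hypothetical helper: was the option left unchosen at the previous matching
# trial chosen on any trial strictly between that trial and the current one?
s_choice <- function(s1, s2, chosen) {
  prev <- prev_match(s1, s2)
  vapply(seq_along(chosen), function(t) {
    p <- prev[t]
    if (is.na(p)) return(NA)            # no earlier trial with this pair
    unchosen <- if (chosen[p] == s1[p]) s2[p] else s1[p]
    earlier <- seq_len(t - 1)
    between <- earlier[earlier > p]     # empty when p is the row just before t
    any(chosen[between] == unchosen)
  }, logical(1))
}

# First six rows of the question's data (all from subject 1)
stim1  <- c(2L, 3L, 3L, 2L, 1L, 2L)
stim2  <- c(1L, 2L, 1L, 3L, 3L, 1L)
Chosen <- c(2L, 2L, 1L, 3L, 1L, 1L)

print(prev_match(stim1, stim2))        # NA NA NA 2 3 1
print(s_choice(stim1, stim2, Chosen))  # NA NA NA FALSE TRUE TRUE
```

Inside a grouped pipeline this could be applied as `mutate(S_choice = s_choice(stim1, stim2, Chosen))` after `group_by(subject)`. Note that this reading gives TRUE for row 5 (3 was left unchosen in row 3 and then chosen in row 4), whereas the question's table shows FALSE there, so the asker may have a narrower rule in mind.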
