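I am trying to add a column to a df that contains a count of a pair of strings repeated across multiple rows. The count needs to reset based on a change in another column.
More specifically: I am trying to add trial numbers to a very large data frame. Each trial consists of 2 parts (a Show followed by a Point). Show and Point are each associated with a value, and each trial can have any number of Show/Point values. Each ID can have a different number of trials, but every trial always has a Show followed by a Point. This means each ID will have a different number of rows.
Sample data: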
ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
TrialType <- c("Show", "Show", "Show", "Point", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point")
Value <- c(0.52, 0.54, 0.55, 0.57, 0.58, 0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.55, 0.57, 0.58, 0.59,0.75,0.89, 0.99)
df <- data.frame(ID, TrialType, Value)
TrialNumber <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
df.desired <- data.frame(ID, TrialType, Value, TrialNumber)
I think I need a loop over ID, but that is too advanced for me to figure out. I am new to R and Stack Overflow. Thanks in advance for your help.
Answer:
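Using the tidyverse:
Within each ID, check whether the current value is Show and the previous value is Point. If so, a new trial starts. The first row of each ID is treated as following a Point (via default = 'Point'), so it starts trial 1. The cumulative sum of these start flags is the trial number.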
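library(tidyverse)
df %>%
  group_by(ID) %>%
  # TRUE on the first Show row of each trial (a Show preceded by a Point)
  mutate(TrialNumber = TrialType == 'Show' &
           lag(TrialType, default = 'Point') == 'Point',
         # running count of trial starts within each ID
         TrialNumber = cumsum(TrialNumber))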
ID TrialType Value TrialNumber
1 1 Show 0.52 1
2 1 Show 0.54 1
3 1 Show 0.55 1
4 1 Point 0.57 1
5 1 Point 0.58 1
6 1 Point 0.59 1
7 1 Point 0.75 1
8 1 Show 0.89 2
9 1 Show 0.32 2
10 1 Show 0.99 2
11 1 Show 0.01 2
12 1 Point 0.02 2
13 1 Show 0.56 3
14 1 Show 0.67 3
15 1 Point 0.32 3
16 2 Show 0.59 1
17 2 Show 0.75 1
18 2 Show 0.89 1
19 2 Point 0.32 1
20 2 Point 0.99 1
21 2 Point 0.01 1
22 2 Show 0.02 2
23 2 Show 0.56 2
24 2 Show 0.67 2
25 2 Show 0.32 2
26 2 Point 0.55 2
27 2 Show 0.57 3
28 2 Show 0.58 3
29 2 Show 0.59 3
30 2 Point 0.75 3
31 2 Point 0.89 3
32 2 Point 0.99 3
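For readers who would rather not load the tidyverse, the same idea (flag each Show that follows a Point, then take a cumulative sum within each ID) can be sketched in base R. This is only an illustrative sketch on a small made-up data frame; the names toy, prev_type, prev_id, new_id and starts are hypothetical and do not come from the question or the answers.

```r
# Toy data (made up for illustration): each trial is a run of Show rows
# followed by a run of Point rows
toy <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  TrialType = c("Show", "Point", "Point", "Show", "Point",
                "Show", "Show", "Point", "Show", "Point"),
  stringsAsFactors = FALSE
)

n <- nrow(toy)
# Type of the previous row; the very first row is treated as following a Point
prev_type <- c("Point", toy$TrialType[-n])
# TRUE on the first row of each ID
prev_id <- c(NA, toy$ID[-n])
new_id <- is.na(prev_id) | prev_id != toy$ID

# A trial starts on a Show row that follows a Point or opens a new ID
starts <- toy$TrialType == "Show" & (prev_type == "Point" | new_id)

# Running count of trial starts, restarted for each ID
toy$TrialNumber <- ave(as.integer(starts), toy$ID, FUN = cumsum)
print(toy)
```

The sketch assumes the rows are already sorted so that each ID's rows are contiguous and in trial order, as in the sample data.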
Answer:
You can use rleid from data.table:
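library(dplyr)
library(data.table)
df %>%
  # give every run of identical TrialType values its own id
  mutate(tmp = data.table::rleid(TrialType),
         # fold each Point run into the Show run just before it
         tmp = ifelse(TrialType == "Point", tmp - 1, tmp)) %>%
  group_by(ID) %>%
  # renumber the merged runs from 1 within each ID
  mutate(TrialNumber = data.table::rleid(tmp)) %>%
  select(-tmp) %>%
  ungroup()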
This gives:
ID TrialType Value TrialNumber
<dbl> <chr> <dbl> <int>
1 1 Show 0.52 1
2 1 Show 0.54 1
3 1 Show 0.55 1
4 1 Point 0.57 1
5 1 Point 0.58 1
6 1 Point 0.59 1
7 1 Point 0.75 1
8 1 Show 0.89 2
9 1 Show 0.32 2
10 1 Show 0.99 2
11 1 Show 0.01 2
12 1 Point 0.02 2
13 1 Show 0.56 3
14 1 Show 0.67 3
15 1 Point 0.32 3
16 2 Show 0.59 1
17 2 Show 0.75 1
18 2 Show 0.89 1
19 2 Point 0.32 1
20 2 Point 0.99 1
21 2 Point 0.01 1
22 2 Show 0.02 2
23 2 Show 0.56 2
24 2 Show 0.67 2
25 2 Show 0.32 2
26 2 Point 0.55 2
27 2 Show 0.57 3
28 2 Show 0.58 3
29 2 Show 0.59 3
30 2 Point 0.75 3
31 2 Point 0.89 3
32 2 Point 0.99 3
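To see why subtracting 1 from the Point runs works, it may help to trace the intermediate values on a short vector. The sketch below uses base R only; run_id() is a hypothetical helper written for this illustration that mimics what data.table::rleid() does for a single vector, and it is not part of either package.

```r
# Hypothetical helper: consecutive run ids for one vector,
# similar in spirit to data.table::rleid() on a single column
run_id <- function(x) {
  r <- rle(as.character(x))
  rep(seq_along(r$lengths), times = r$lengths)
}

TrialType <- c("Show", "Show", "Point", "Point", "Show", "Point")

# Step 1: every run of identical values gets its own id
tmp <- run_id(TrialType)

# Step 2: a Point run always follows a Show run, so subtracting 1
# gives it the same id as that preceding Show run
tmp <- ifelse(TrialType == "Point", tmp - 1, tmp)

# Step 3: renumber the merged runs consecutively from 1
TrialNumber <- run_id(tmp)
print(data.frame(TrialType, tmp, TrialNumber))
```

Step 2 relies on every trial being a Show run followed by a Point run, as stated in the question; data that starts with a Point would need separate handling.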