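How can I conditionally filter/select relevant observations on a rolling basis?
Groups 1 to 52 are the baseline.
- Then, in Groups 53, I want to filter out all IDs that appear in Groups 1 to 52
- Then, for Groups 54, I want to filter out all IDs that appear in Groups 2 to 53
- Then, for Groups 55, I want to filter out all IDs that appear in Groups 3 to 54
- And so on. Basically the dataset has groups and an ID, and I am trying to select the relevant IDs.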
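To make the rolling rule above concrete, here is a minimal sketch of which earlier groups each target group is compared against. It assumes a fixed look-back of 52 groups (the size of the baseline); the names `window`, `targets` and `windows` are made up for illustration.

```r
# Assumption: every target group looks back over exactly `window` groups.
window  <- 52
targets <- 53:55

# One row per target group: the first and last group of its look-back window.
windows <- data.frame(target = targets,
                      from   = targets - window,
                      to     = targets - 1)
print(windows)
```

Each row reads as "for Groups `target`, drop every ID that appears anywhere in Groups `from` to `to`".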
The code below manually creates an example dataset, where final_example_data is the starting data and expected_output is the expected output.
library(dplyr)

# Helper kept from the original post (not used below)
`%!in%` <- Negate(`%in%`)

# ID 1 appears only in baseline groups
example_data <- data.frame(Groups = 1:55, ID = 1) %>%
  filter(Groups %in% c(1, 4, 7, 10, 11, 15, 44, 52))

# ID 2 appears in baseline groups and again in Groups 55
example_data2 <- data.frame(Groups = 1:55, ID = 2) %>%
  filter(Groups %in% c(1, 3, 5, 7, 8, 11, 15, 44, 33, 55, 41))

# ID 7 appears only in Groups 53
example_data3 <- data.frame(Groups = 1:55, ID = 7) %>%
  filter(Groups %in% c(53))

# ID 4 appears only in Groups 54
example_data4 <- data.frame(Groups = 1:55, ID = 4) %>%
  filter(Groups == 54)

# ID 0 appears in Groups 53, 54 and 55
example_data5 <- data.frame(Groups = c(1:55), ID = 0) %>%
  filter(Groups %in% c(53, 54, 55))

final_example_data <- rbind(example_data,
                            example_data2,
                            example_data3,
                            example_data4,
                            example_data5)
# ID 1 and ID 2 are present in Groups 1 to 52; IDs 7 and 0 are NOT present in Groups 1 to 52,
# so they are kept for Groups 53
no_present_in_1_52 <-
  final_example_data %>%
  filter(ID %in% c(7, 0)) %>%
  filter(Groups <= 53)

# now which IDs are not present in Groups 2 to 53 but are present in Groups 54
not_present_in_Groups_2_53 <-
  final_example_data %>%
  filter(ID == 4)

# the IDs in Groups 55 are all present somewhere in Groups 3 to 54,
# so nothing is kept in the final output for Groups 55
not_present_in_Groups3_to_54 <-
  final_example_data %>%
  filter(Groups > 54)

expected_output <- rbind(not_present_in_Groups_2_53, no_present_in_1_52)
Edit:
example_data6 <- data.frame(Groups = c(1), ID = 88)
example_data7 <- data.frame(Groups = c(54), ID = 88)
final_example_data <- rbind(final_example_data , example_data6, example_data7)
#So I would expect Groups 54 matched to ID 88 to appear in the results because it was not present in Groups 2 to 53.
Reply from a uj5u.com user:
For clarity, I renamed final_example_data to fed:
data.table
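library(data.table)
# as.data.table() copies, so final_example_data itself stays a plain data.frame
fed <- as.data.table(final_example_data)
# For each group above the baseline, keep the IDs that do not occur in the
# 52 preceding groups, i.e. Groups (g - 52) to (g - 1), both bounds included
fed[
  i = Groups > 52,
  j = .SD[!ID %in% fed[between(Groups, .BY$Groups - 52, .BY$Groups - 1), ID]],
  by = Groups
]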
   Groups ID
1:     53  7
2:     53  0
3:     54  4
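Or base R
- Identify the group values beyond the baseline
target_groups = unique(fed$Groups[fed$Groups>52])
- Loop over them, each time checking whether that group's IDs appear among the IDs of any lower-numbered group inside the 52-group window; row-bind the resulting list of data.frames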
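do.call(rbind, lapply(target_groups, function(x) {
  # IDs that occur in the target group
  id <- fed$ID[fed$Groups == x]
  # drop IDs seen in the 52 preceding groups, i.e. Groups (x - 52) to (x - 1)
  id <- id[!id %in% fed$ID[fed$Groups < x & fed$Groups >= (x - 52)]]
  # groups with no remaining IDs return NULL, which rbind() ignores
  if (length(id) > 0) return(data.frame(Group = x, ID = id))
}))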
Output:
  Group ID
1    53  7
2    53  0
3    54  4
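The same idea can be wrapped in a reusable function. The following is only a sketch: `rolling_new_ids` and the `toy` data frame are hypothetical names introduced here, and the look-back window is assumed to run from group `g - window` to group `g - 1`, both included, as described in the question. The toy data mirrors the question's example in miniature and includes the ID 88 rows from the edit.

```r
# Hypothetical helper: for every group above `window`, keep the IDs that do
# not occur in the `window` groups immediately before it.
rolling_new_ids <- function(df, window = 52) {
  targets <- sort(unique(df$Groups[df$Groups > window]))
  pieces <- lapply(targets, function(g) {
    # rows that fall inside the look-back window of group g
    in_window <- df$Groups >= g - window & df$Groups < g
    # IDs of group g that were not seen inside that window
    ids <- setdiff(unique(df$ID[df$Groups == g]), df$ID[in_window])
    data.frame(Groups = rep(g, length(ids)), ID = ids)
  })
  do.call(rbind, pieces)
}

# Reduced version of the question's data, plus ID 88 (Groups 1 and 54)
toy <- data.frame(
  Groups = c(1, 52, 1, 44, 55, 53, 54, 53, 54, 55,  1, 54),
  ID     = c(1,  1, 2,  2,  2,  7,  4,  0,  0,  0, 88, 88)
)

result <- rolling_new_ids(toy)
print(result)
```

On this toy data the result holds IDs 7 and 0 for Groups 53 and IDs 4 and 88 for Groups 54. ID 88 is kept because its only earlier appearance, in Groups 1, lies outside the window of Groups 2 to 53. Nothing is kept for Groups 55.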
Reply from a uj5u.com user:
You can try this tidyverse approach:
library(dplyr)
library(purrr)
baseline <- 52
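# Note: each target group is compared against ALL earlier groups,
# not only the 52 groups immediately before it
map_df((baseline + 1):max(final_example_data$Groups), ~final_example_data %>%
  filter(!ID %in% ID[Groups < .x], Groups <= .x))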
# Groups ID
#1 53 7
#2 53 0
#3 54 4
where
(baseline + 1):max(final_example_data$Groups) #returns
#[1] 53 54 55
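The tidyverse snippet above compares each target group with every earlier group. If only the 52 preceding groups should count, as the edit in the question requires, a window-aware variant could look like the sketch below. It assumes dplyr is installed; `rolling_new_ids_tidy` and `toy` are hypothetical names used only for this illustration.

```r
library(dplyr)

# Hypothetical helper: window-aware variant built on filter() and anti_join().
rolling_new_ids_tidy <- function(df, window = 52) {
  targets <- sort(unique(df$Groups[df$Groups > window]))
  bind_rows(lapply(targets, function(g) {
    # rows from the `window` groups immediately before g
    seen <- filter(df, Groups >= g - window, Groups < g)
    # rows of group g whose ID has no match inside that window
    anti_join(filter(df, Groups == g), seen, by = "ID")
  }))
}

# Reduced version of the question's data, plus ID 88 (Groups 1 and 54)
toy <- data.frame(
  Groups = c(1, 52, 1, 44, 55, 53, 54, 53, 54, 55,  1, 54),
  ID     = c(1,  1, 2,  2,  2,  7,  4,  0,  0,  0, 88, 88)
)

result_tidy <- rolling_new_ids_tidy(toy)
print(result_tidy)
```

`anti_join()` keeps the rows of its first argument that have no matching `ID` in the second, which is the "not present in the window" condition.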
