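How do I filter on a rolling basis?

How can I conditionally filter/select the relevant observations on a rolling basis?

Groups 1 to 52 are the baseline.

Then, for Group 53, I want to filter out every ID that appears in Groups 1 to 52.
Then, for Group 54, I want to filter out every ID that appears in Groups 2 to 53.
Then, for Group 55, I want to filter out every ID that appears in Groups 3 to 54.
And so on. Basically, the dataset has a group and an ID, and I am trying to select the relevant IDs.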
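
The windowing rule described above can be written down directly. This is a minimal sketch, assuming a baseline of 52 groups; `window_for` is a hypothetical helper name that does not appear in the original post.

```r
# Hypothetical helper: the groups that must be checked for a given target group.
# Assumes a baseline of 52 groups, as described in the question.
baseline <- 52
window_for <- function(g, width = baseline) {
  seq(g - width, g - 1)
}

range(window_for(53))  # 1 52
range(window_for(54))  # 2 53
range(window_for(55))  # 3 54
```

Any ID in the target group that also occurs in one of the groups returned by `window_for()` should be dropped.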

The code below manually builds an example dataset, where final_example_data is the starting input and expected_output is the expected output.

 
library(dplyr)  # provides %>% and filter()

example_data <- data.frame(Groups = 1:55,
                           ID = 1)
`%!in%` = Negate(`%in%`)
example_data <-
  example_data %>%
  filter(Groups %in% c(1,4, 7 , 10, 11, 15, 44, 52))
 
example_data2 <- data.frame(Groups = 1:55,
                            ID = 2)
 
example_data2 <-
  example_data2 %>%
  filter(Groups %in% c(1,3,5,7,8,11,15,44,33,55,41))
 
example_data3 <- data.frame(Groups = 1:55,
                            ID = 7)
 
example_data3 <-
  example_data3 %>%
  filter(Groups %in% c(53))
 
example_data4 <-
  data.frame(Groups = 1:55,
             ID = 4) %>%
  filter(Groups == 54)
 
example_data5 <-
  data.frame(Groups = c(1:55), ID = 0) %>%
  filter(Groups %in% c(53,54,55))
 
final_example_data <- rbind(example_data,
                            example_data2,
                            example_data3,
                            example_data4,
                            example_data5)
 
# so this shows that ID 1 is present in Groups 1 to 52, ID 2 is present in Groups 1 to 52, and IDs 7 and 0 are NOT present in Groups 1 to 52...
 
no_present_in_1_52 <-
  final_example_data %>%
  filter(ID %in% c(7, 0)) %>%
  filter(Groups <= 53)
 
# now which are not present in 2 to 53 but are present in 54
not_present_in_Groups_2_53 <-
  final_example_data %>%
  filter(ID == 4)
 
not_present_in_Groups3_to_54 <-
  final_example_data %>%
  filter(Groups > 54) #but you can see they are present in Groups 3 to 54 visually so they are not included, so nothing for final output for Groups 55
 
expected_output <- rbind(not_present_in_Groups_2_53,no_present_in_1_52)

Edit:

example_data6 <- data.frame(Groups = c(1), ID = 88)
example_data7 <- data.frame(Groups = c(54), ID = 88)

final_example_data <- rbind(final_example_data , example_data6, example_data7)

#So I would expect Groups 54 matched to ID 88 to appear in the results because it was not present in Groups 2 to 53.

Answer from a uj5u.com user:

For clarity, I renamed final_example_data to fed:

data.table

library(data.table)

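fed <- copy(final_example_data)  # work on a copy so setDT() does not alter the original

setDT(fed)[
  i = Groups > 52,
  # keep the IDs of the current group that do not occur in the previous 52 groups
  j = .SD[!ID %in% fed[between(Groups, .BY$Groups - 52, .BY$Groups - 1), ID]],
  by = Groups
]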

   Groups ID
1:     53  7
2:     53  0
3:     54  4

Or base R

Identify the group values that lie beyond the baseline

target_groups = unique(fed$Groups[fed$Groups>52])

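Loop over them, each time checking whether that group's IDs appear among the IDs in the window of groups before it; then row-bind the resulting list of data.frames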

do.call(rbind, (lapply(target_groups, function(x) {
  id <- fed$ID[fed$Groups==x]
  id <- id[!id %in% fed$ID[fed$Groups < x & fed$Groups >= (x - 52)]]  # window: x - 52 to x - 1
  if (length(id) > 0) return(data.frame(Groups = x, ID = id))
})))

Answer from a uj5u.com user:

You can try this tidyverse approach -

library(dplyr)
library(purrr)

baseline <- 52
map_df((baseline + 1):max(final_example_data$Groups), ~final_example_data %>%
      filter(Groups == .x, !ID %in% ID[Groups >= .x - baseline & Groups < .x]))  # only the previous `baseline` groups count

#  Groups ID
#1     53  7
#2     53  0
#3     54  4

where

(baseline + 1):max(final_example_data$Groups) #returns
#[1] 53 54 55
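
To check an approach against the edited data (ID 88 in Groups 1 and 54), the rolling rule can be wrapped in a small base R function. This is only a sketch: `rolling_new_ids` and the `toy` data frame are hypothetical names introduced for illustration, and `toy` is a reduced stand-in for final_example_data rather than the full dataset.

```r
# Sketch only: rolling_new_ids() is a hypothetical helper, not taken from the answers.
# For each group g beyond the baseline, it keeps the IDs in g that do not
# occur anywhere in groups (g - baseline) to (g - 1).
rolling_new_ids <- function(df, baseline = 52) {
  targets <- sort(unique(df$Groups[df$Groups > baseline]))
  pieces <- lapply(targets, function(g) {
    in_window <- df$Groups >= g - baseline & df$Groups <= g - 1
    current <- df[df$Groups == g, , drop = FALSE]
    current[!current$ID %in% df$ID[in_window], , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Reduced stand-in for final_example_data, including the ID 88 edit.
toy <- data.frame(
  Groups = c(1, 52, 53, 53, 54, 54, 1, 54, 55),
  ID     = c(1,  2,  7,  0,  4,  0, 88, 88, 0)
)
rolling_new_ids(toy)
# rows returned: (53, 7), (53, 0), (54, 4), (54, 88)
```

ID 88 is kept for Group 54 because its only earlier occurrence is in Group 1, which falls outside the 2 to 53 window; ID 0 is dropped for Groups 54 and 55 because it already occurs in Group 53.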
