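I have a dataset of individuals, and data collection started at a different time for each person.
I need to subset the data to the first year after each person's first entry, roughly like this (pseudocode): myData[myDate >= "first entry" & myDate <= "first entry" + "1 year"]
Sample data: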
df_date <- data.frame(
  Name = c("Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim",
           "Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue"),
  Dates = c("2010-1-1", "2010-2-2", "2010-3-5","2010-4-17","2010-5-20",
            "2010-6-29","2010-7-6","2010-8-9","2010-9-16","2010-10-28","2010-11-16","2010-12-28","2011-1-16","2011-2-28",
            "2010-4-1", "2010-5-2", "2010-6-5","2010-7-17","2010-8-20",
            "2010-9-29","2010-10-6","2010-11-9","2012-12-16","2011-1-28","2011-2-28","2011-3-28","2011-2-28","2011-3-28"),
  Event = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
)
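The desired output is that Jim keeps the data from 1/1/2010 to 12/28/2010, Sue keeps the data from 4/1/2010 to 3/28/2011, and so on. The actual dataset has more than 20 individuals, all starting at different times.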
Answer from a uj5u.com user:
Using a combination of tidyverse and lubridate functions:
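library(tidyverse)
library(lubridate)

df_date %>%
  # parse the character column into date-times
  mutate(Dates = as_datetime(Dates)) %>%
  group_by(Name) %>%
  # sort within each person so that first() is the earliest entry
  arrange(Dates, .by_group = TRUE) %>%
  # keep rows up to one year after each person's first entry
  filter(Dates <= first(Dates) + duration(1, units = "year"))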
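One way to sanity-check a filter like this is to summarise the date range that survives for each person. The sketch below is only an illustration: the `toy` data frame and the column names `first_entry`, `last_kept`, and `n_rows` are made up here and are not part of the question's `df_date`.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data for illustration only (not the question's df_date)
toy <- data.frame(
  Name  = c("A", "A", "A", "B", "B"),
  Dates = c("2010-1-1", "2010-12-28", "2011-2-28", "2010-4-1", "2011-3-28")
)

# Same idea as the answer above: keep rows within one year of the first entry
first_year <- toy %>%
  mutate(Dates = as_datetime(ymd(Dates))) %>%
  group_by(Name) %>%
  arrange(Dates, .by_group = TRUE) %>%
  filter(Dates <= first(Dates) + duration(1, units = "year")) %>%
  ungroup()

# Per-person range check: earliest entry, latest kept entry, rows kept
first_year %>%
  group_by(Name) %>%
  summarise(
    first_entry = min(Dates),
    last_kept   = max(Dates),
    n_rows      = n()
  )
```

If `last_kept` is ever more than a year after `first_entry`, the filter is not doing what was intended.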
Answer from a uj5u.com user:
Similar to Martin C. Arnold's answer, here is another answer based on dplyr and lubridate. min(Dates) + years(1) adds one year to the earliest date in each group.
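library(dplyr)
library(lubridate)

df_date2 <- df_date %>%
  # parse the character column into Date objects
  mutate(Dates = ymd(Dates)) %>%
  group_by(Name) %>%
  # keep rows up to one year after each person's earliest date
  filter(Dates <= min(Dates) + years(1)) %>%
  ungroup()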
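The two answers add "one year" in slightly different ways: `years(1)` is a lubridate period (calendar arithmetic), while `duration(1, units = "year")` is a fixed length of time in seconds. The sketch below, using a made-up `start` date, shows the difference. How long a one-year duration is depends on the lubridate version, so no output is stated for that line.

```r
library(lubridate)

# Hypothetical start date for illustration
start <- ymd("2010-01-01")

# Period: moves the calendar date forward by one year
start + years(1)                                  # "2011-01-01"

# Duration: adds a fixed number of seconds, so the cutoff
# need not fall exactly at midnight on the anniversary
as_datetime(start) + duration(1, units = "year")

# Leap-day edge case: adding a period can land on a non-existent date
ymd("2012-02-29") + years(1)                      # NA
# %m+% rolls back to the last valid day of the month instead
ymd("2012-02-29") %m+% years(1)                   # "2013-02-28"
```

For most first entries either approach should select the same rows. The choice matters mainly when an observation falls right on the one-year boundary or when a first entry is on February 29.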