I have a longitudinal dataset in which each person has fixed "start" and "end" observation dates. For example, it looks like this:
# Populate data frame
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- c('2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01')
end <- c('2015-12-31', '2015-12-31', '2015-12-31', '2015-12-31', '2015-12-31')
# Bind columns together into a data frame
df <- as.data.frame(cbind(person, row, start, end))
# start and end are date variables
df$start <- as.Date(df$start)
df$end <- as.Date(df$end)
df
#> person row start end
#> 1 1 1 2011-01-01 2015-12-31
#> 2 1 2 2011-01-01 2015-12-31
#> 3 1 3 2011-01-01 2015-12-31
#> 4 1 4 2011-01-01 2015-12-31
#> 5 1 5 2011-01-01 2015-12-31
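A side note on the setup code: `cbind()` first combines the vectors into a character matrix, so `person` and `row` come out as character (or, before R 4.0, factor) columns. The `<fct>` types in the answer's output below reflect this. The sketch below, which assumes you want `person` and `row` to stay numeric, builds the same example data with `data.frame()` directly.

```r
# Build the same example data without cbind(), so each column keeps its own type
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- as.Date(rep("2011-01-01", 5))
end <- as.Date(rep("2015-12-31", 5))
df <- data.frame(person, row, start, end)
str(df)
```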
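I want to create time-varying 6-month intervals that span the period between these start and end boundaries. Question: how can I create these time-varying intervals as a set of variables? I'm looking for a solution that does not use data.table. The result would look like this: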
# Populate data frame
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- c('2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01')
end <- c('2014-12-31', '2014-12-31', '2014-12-31', '2014-12-31', '2014-12-31')
IntervalStart <- c('2011-01-01', '2011-07-02', '2012-07-03', '2013-07-04', '2014-07-05')
IntervalEnd <- c('2011-07-01', '2012-01-02', '2013-01-03', '2014-01-04', '2015-01-05')
# Bind columns together into a data frame
df <- as.data.frame(cbind(person, row, start, end, IntervalStart, IntervalEnd))
# format date variables
df$start <- as.Date(df$start)
df$end <- as.Date(df$end)
df$IntervalStart <- as.Date(df$IntervalStart)
df$IntervalEnd <- as.Date(df$IntervalEnd)
df
#> person row start end IntervalStart IntervalEnd
#> 1 1 1 2011-01-01 2014-12-31 2011-01-01 2011-07-01
#> 2 1 2 2011-01-01 2014-12-31 2011-07-02 2012-01-02
#> 3 1 3 2011-01-01 2014-12-31 2012-07-03 2013-01-03
#> 4 1 4 2011-01-01 2014-12-31 2013-07-04 2014-01-04
#> 5 1 5 2011-01-01 2014-12-31 2014-07-05 2015-01-05
Created on 2022-10-13 by the reprex package (v2.0.1)
Reply from a uj5u.com user:
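- Create a nested list of start dates at six-month intervals
- Unnest the list so that each start date gets its own row
- Create the interval end date as the day before the next start date; if the date in question is the last in its sequence, use the overall end date instead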
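For a single start/end pair, the steps above reduce to two vector operations in base R: `seq()` on `Date` objects with `by = "6 months"` generates the start dates, and shifting that vector by one element gives the end dates. This is a minimal sketch; the names `obs_start`, `obs_end`, `interval_start` and `interval_end` are illustrative only.

```r
# Step 1: six-monthly start dates within one observation window
obs_start <- as.Date("2011-01-01")
obs_end <- as.Date("2015-12-31")
interval_start <- seq(obs_start, obs_end, by = "6 months")

# Step 3: each interval ends the day before the next one starts;
# the last interval ends on the overall end date
interval_end <- c(interval_start[-1] - 1, obs_end)

data.frame(IntervalStart = interval_start, IntervalEnd = interval_end)
```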
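library(tidyverse)
df %>%
  # One list element per row, holding that row's six-monthly start dates
  mutate(IntervalStart = map2(start, end, seq, by = '6 months')) %>%
  # One row per start date
  unnest(IntervalStart) %>%
  group_by(row) %>%
  # An interval ends the day before the next one starts, or at `end` for the last one
  mutate(IntervalEnd = case_when(is.na(lead(IntervalStart)) ~ end,
                                 TRUE ~ lead(IntervalStart) - 1))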
#> # A tibble: 50 x 6
#> # Groups: row [5]
#> person row start end IntervalStart IntervalEnd
#> <fct> <fct> <date> <date> <date> <date>
#> 1 1 1 2011-01-01 2015-12-31 2011-01-01 2011-06-30
#> 2 1 1 2011-01-01 2015-12-31 2011-07-01 2011-12-31
#> 3 1 1 2011-01-01 2015-12-31 2012-01-01 2012-06-30
#> 4 1 1 2011-01-01 2015-12-31 2012-07-01 2012-12-31
#> 5 1 1 2011-01-01 2015-12-31 2013-01-01 2013-06-30
#> 6 1 1 2011-01-01 2015-12-31 2013-07-01 2013-12-31
#> 7 1 1 2011-01-01 2015-12-31 2014-01-01 2014-06-30
#> 8 1 1 2011-01-01 2015-12-31 2014-07-01 2014-12-31
#> 9 1 1 2011-01-01 2015-12-31 2015-01-01 2015-06-30
#> 10 1 1 2011-01-01 2015-12-31 2015-07-01 2015-12-31
#> # ... with 40 more rows
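Since the question asks for something other than data.table, a base R version of the same three steps may also be useful if you would rather not load the tidyverse. This is an alternative sketch, not the answerer's code. It assumes `start` and `end` are `Date` columns, and it uses a hypothetical helper `expand_row()` that expands one row at a time before binding the pieces together with `rbind()`.

```r
# Example data as in the question, with typed columns
df <- data.frame(
  person = c(1, 1, 1, 1, 1),
  row = c(1, 2, 3, 4, 5),
  start = as.Date(rep("2011-01-01", 5)),
  end = as.Date(rep("2015-12-31", 5))
)

# Hypothetical helper: expand row i into its six-month intervals
expand_row <- function(i) {
  # Six-monthly start dates for this row's observation window
  s <- seq(df$start[i], df$end[i], by = "6 months")
  # Each interval ends the day before the next start; the last ends at `end`
  e <- c(s[-1] - 1, df$end[i])
  # Repeat the original row once per interval and attach the new columns
  data.frame(df[rep(i, length(s)), ], IntervalStart = s, IntervalEnd = e)
}

out <- do.call(rbind, lapply(seq_len(nrow(df)), expand_row))
rownames(out) <- NULL
head(out, 10)
```

With the question's data this yields 50 rows (10 intervals for each of the 5 rows), matching the row count of the tidyverse result.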
