I'm looking for a way to compute a cumulative sum in R that excludes the current date.
I have the following data frame (a simplified subset of the real one):
df <- structure(list(date_time = structure(c(1609513200, 1609513200, 1609513200,
1609516800, 1609516800, 1609516800, 1609599600, 1609599600, 1609599600,
1609603200, 1609603200, 1609603200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), event = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
person = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
did_attend = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L),
events_attended = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 2, 3, 2),
events_attended_desired = c(0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 1L, 2L, 2L, 1L)),
class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -12L), groups = structure(list(person = c("A", "B", "C"),
.rows = structure(list(c(1L, 4L, 7L, 10L), c(2L, 5L, 8L, 11L),
c(3L, 6L, 9L, 12L)), ptype = integer(0),
class = c("vctrs_list_of", "vctrs_vctr", "list"))),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L), .drop = TRUE))
df
## date_time event person did_attend events_attended events_attended_desired
## 2021-01-01 15:00:00 1 A 1 0 0
## 2021-01-01 15:00:00 1 B 1 0 0
## 2021-01-01 15:00:00 1 C 1 0 0
## 2021-01-01 16:00:00 2 A 1 1 0
## 2021-01-01 16:00:00 2 B 1 1 0
## 2021-01-01 16:00:00 2 C 0 1 0
## 2021-01-02 15:00:00 1 A 0 2 2
## 2021-01-02 15:00:00 1 B 1 2 2
## 2021-01-02 15:00:00 1 C 1 1 1
## 2021-01-02 16:00:00 2 A 1 2 2
## 2021-01-02 16:00:00 2 B 0 3 2
## 2021-01-02 16:00:00 2 C 1 2 1
The "did_attend" column is a dummy variable indicating whether a person attended the event. The "events_attended" column is simply computed by
library(dplyr)
# Running total of earlier events attended, per person (uses the df shown above)
df <- df %>%
  arrange(date_time) %>%
  group_by(person) %>%
  mutate(events_attended = lag(cumsum(did_attend), default = 0)) %>%
  ungroup()
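Now I'm looking for a way to exclude events from the current date, so the cumulative sum should only cover dates before the current one (the desired output is in the events_attended_desired column). There are several events per day, and the number of events differs from day to day, so the lagged version doesn't work. I tried several ifelse() calls inside cumsum(), but they didn't work either, because I don't know how to compare dates in an ifelse clause inside cumsum().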
Answer from a uj5u.com user:
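Here is an approach using dplyr and lubridate::floor_date.
First, I add a "date" column to the data frame so that I can summarize and join by date.
Then I join this table to a summarized version of itself. count(date, wt = did_attend) is a shortcut for group_by(date) %>% summarize(n = sum(did_attend)), so if I then take the cumulative sum of its lagged values, we get the desired result. Because df is already grouped by person, the count is computed per person and date.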
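library(dplyr)
# Add a day-level column to summarize and join on
df2 <- df %>%
  mutate(date = lubridate::floor_date(date_time, "day"))
df2 %>%
  left_join(
    df2 %>%
      count(date, wt = did_attend) %>%  # daily total per person (df is grouped by person)
      mutate(prior_attended = cumsum(lag(n, default = 0))) %>%
      select(-n),
    by = c("person", "date")
  )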
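To make the intermediate step visible, here is a minimal sketch of the same summarize, lag, accumulate, and join sequence on a made-up table. The toy data, the names toy, daily, and result, and the use of as.Date values instead of lubridate::floor_date() are assumptions for illustration, not part of the answer above.

```r
library(dplyr)

# Hypothetical toy data: one person, two events per day over two days
toy <- tibble(
  person     = "A",
  date       = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02")),
  did_attend = c(1L, 1L, 0L, 1L)
)

# One row per person and day, holding that day's attendance total
daily <- toy %>%
  count(person, date, wt = did_attend) %>%
  group_by(person) %>%
  # Shift by one day first, then accumulate: each day only sees earlier days
  mutate(prior_attended = cumsum(lag(n, default = 0))) %>%
  ungroup()

# Attach the per-day value back to every event row
result <- toy %>%
  left_join(select(daily, -n), by = c("person", "date"))

result
```

Both events on 2021-01-01 get 0 and both events on 2021-01-02 get 2, because the shift happens on the daily table rather than on the individual event rows.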
Answer from a uj5u.com user:
Multiply each did_attend value by 1 if it belongs to an earlier date, and by 0 otherwise, then sum the results.
library(dplyr)
# df is grouped by person, so the sum runs within each person's rows
df %>%
  mutate(events_attended = sapply(as.Date(date_time),
    function(x) sum((as.Date(date_time) < x) * did_attend))) %>%
  arrange(date_time) %>%
  ungroup()
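The masking idea can be checked by hand on a single person's rows. The sketch below uses base R only; the vectors date and did_attend and the names mask and prior_attended are made up for illustration.

```r
# Hypothetical single-person data: two events per day over two days
date       <- as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"))
did_attend <- c(1L, 0L, 1L, 1L)

# For the third row (2021-01-02): TRUE only where the date is strictly earlier
mask <- date < date[3]

# TRUE/FALSE act as 1/0, so same-day and later rows are zeroed out
mask * did_attend
sum(mask * did_attend)

# Repeat the masked sum for every row
prior_attended <- sapply(seq_along(date),
                         function(i) sum((date < date[i]) * did_attend))
prior_attended
```

Each row is compared against every other row in the group, so the work grows quadratically with group size. That is usually fine for small groups but worth keeping in mind for long histories.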
