這是這個問題的后續問題。
概括一下:我有一個data這樣的資料框:
> data
ID measurement_type measurement_time amount entry_time
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00
ID 1帶有和ID 2輸入的 Subjects ,entry_time此后, 累積amounts 在特定 處測量measurement_times。但是,每天中午,金額會再次設定為零并重新開始計數(從零開始)。我想要實作的是,一旦中午發生休息(因此重置為零),它會不斷將新的新開始累積量添加到中午之前已經累積的累積量(由分組變數分組measurement_type)。
對于中午休息時間,上述鏈接中提供的答案非常有效:
library(dplyr)
data %>% as_tibble() %>%
# Check 12 hours passed --> `pm` column
mutate(pm = format(measurement_time, "%H") >= 12) %>%
mutate(date_fct = format(measurement_time, "%Y_%d")) %>%
# Group by ID and `pm`
group_by(ID, measurement_type, date_fct, pm) %>%
# Turn cumsum into actual values
mutate(amount_act = amount - lag(amount, default = 0)) %>%
# Cumsum over ID
ungroup() %>%
group_by(ID, measurement_type) %>%
mutate(amount_cums = cumsum(amount_act)) %>%
ungroup() %>%
select(-c(pm, date_fct, amount_act))
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 5
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 6
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 77
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 777
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 888
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 3333
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 4444
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 5555
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 6666
如您所見,下午計數正確添加到中午之前計數。amount_cums但是,由于每天的分組(date_fct在提供的代碼中),午夜休息會錯誤地將第二天的值(從中午開始累積)添加到前一天的累積量 ( ) 中。
非常感謝任何幫助以獲得以下所需的輸出amount_cums:
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 4
1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 5
1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 55
2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 444
2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 555
2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 2222
2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 3333
2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 4444
2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 5555
資料
data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
measurement_type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("type_1", "type_2"), class = "factor"),
measurement_time = structure(c(1402970400, 1402998720, 1403024280, 1403049900, 1403071200, 1402975620, 1402996200, 1403017320, 1403042160, 1403067660,
1404396060, 1404407940, 1404424500, 1404454080, 1404474900, 1404418320, 1404457140, 1404475800, 1404486000, 1404507600), class = c("POSIXct", "POSIXt"), tzone = ""),
amount = c(1, 2, 1, 2, 3, 11, 22, 11, 22, 33, 111, 222, 333, 444, 111, 1111, 2222, 1111, 2222, 3333),
entry_time = structure(c(1402962780, 1402962780, 1402962780, 1402962780, 1402962780,1402962780, 1402962780, 1402962780, 1402962780, 1402962780,
1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900),
class = c("POSIXct", "POSIXt"), tzone = "CET")), class = "data.frame", row.names = c(NA, -20L))
uj5u.com熱心網友回復:
這是一種方法,我確定最近的中午,添加一個幫助列來捕獲每個測量日的最后一次測量,最后將前幾天的最后一次測量添加到每個值。
library(dplyr); library(lubridate)
data %>%
# arrange(ID, measurement_type, measurement_time) %>%
# I needed to adjust the times since they loaded in my local time
mutate(measurement_time = measurement_time dhours(9)) %>%
# identify the most recent noon
mutate(start_of_count_day = floor_date(measurement_time - dhours(12), "day") dhours(12)) %>%
group_by(ID, measurement_type, start_of_count_day) %>%
mutate(day_ttl = if_else(row_number() == max(row_number()), amount, 0)) %>%
group_by(ID, measurement_type) %>%
mutate(cuml = amount cumsum(lag(day_ttl, default = 0))) %>%
ungroup()
結果
# A tibble: 20 × 8
ID measurement_type measurement_time amount entry_time start_of_count_day day_ttl cuml
<fct> <fct> <dttm> <dbl> <dttm> <dttm> <dbl> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-16 16:53:00 2014-06-16 12:00:00 0 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-16 16:53:00 2014-06-16 12:00:00 2 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-16 16:53:00 2014-06-17 12:00:00 0 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-16 16:53:00 2014-06-17 12:00:00 0 4
5 1 type_1 2014-06-18 08:00:00 3 2014-06-16 16:53:00 2014-06-17 12:00:00 3 5
6 1 type_2 2014-06-17 05:27:00 11 2014-06-16 16:53:00 2014-06-16 12:00:00 0 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-16 16:53:00 2014-06-16 12:00:00 22 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-16 16:53:00 2014-06-17 12:00:00 0 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-16 16:53:00 2014-06-17 12:00:00 0 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-16 16:53:00 2014-06-17 12:00:00 33 55
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 05:35:00 2014-07-03 12:00:00 0 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 05:35:00 2014-07-03 12:00:00 0 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 05:35:00 2014-07-03 12:00:00 444 444
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 05:35:00 2014-07-04 12:00:00 111 555
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 05:35:00 2014-07-03 12:00:00 2222 2222
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 05:35:00 2014-07-04 12:00:00 0 3333
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 05:35:00 2014-07-04 12:00:00 0 4444
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 05:35:00 2014-07-04 12:00:00 3333 5555
轉載請註明出處,本文鏈接:https://www.uj5u.com/ruanti/460015.html
上一篇:在一列中具有相同值的值的交叉刺激
