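This is a follow-up to an earlier question, but now each idPerson can have several rows with decision == "d". There are multiple idPerson values, but one is enough to illustrate the problem. idAppt is nested within idPerson. Consider this data frame: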
idPerson idAppt decision date
1 A 1 a 2021-09-10
2 A 1 b 2021-09-11
3 A 1 c 2021-09-12
4 A 1 d 2021-09-13
5 A 2 a 2021-09-20
6 A 2 b 2021-09-21
7 A 3 a 2021-09-10
8 A 3 b 2021-09-11
9 A 4 a 2021-09-21
10 A 4 b 2021-09-22
11 A 4 c 2021-09-23
12 A 4 d 2021-09-24
13 A 5 a 2021-09-11
14 A 5 b 2021-09-12
15 A 6 a 2021-10-10
16 A 6 b 2021-10-11
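I want to build a date2 column that follows these conditions:
- For a given idAppt, if the date of decision == "a" is later than the date of any decision == "d" of the same idPerson, report the latest of those "d" dates (the one closest in time). For example, in group idAppt == 2 the date of decision == "a" is later than the date of decision == "d" in group idAppt == 1, so date2 should be 2021-09-13. The same applies to group idAppt == 6, but here there are two earlier decision == "d" rows (rows 4 and 12). In that case date2 should be the closest one before 2021-10-10, i.e. 2021-09-24.
- When no decision == "d" date is earlier than the date of decision == "a" for a given idAppt, take the earliest date of the given idPerson.
This gives the following desired output: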
idPerson idAppt decision date date2
1 A 1 a 2021-09-10 2021-09-10
2 A 1 b 2021-09-11 2021-09-10
3 A 1 c 2021-09-12 2021-09-10
4 A 1 d 2021-09-13 2021-09-10
5 A 2 a 2021-09-20 2021-09-13 #<- correspond to value of row 4
6 A 2 b 2021-09-21 2021-09-13
7 A 3 a 2021-09-10 2021-09-10
8 A 3 b 2021-09-11 2021-09-10
9 A 4 a 2021-09-21 2021-09-13
10 A 4 b 2021-09-22 2021-09-13
11 A 4 c 2021-09-23 2021-09-13
12 A 4 d 2021-09-24 2021-09-13
13 A 5 a 2021-09-11 2021-09-10 #<- earliest value because 2021-09-11 is earlier than 2021-09-13
14 A 5 b 2021-09-12 2021-09-10
15 A 6 a 2021-10-10 2021-09-24 #<- correspond to value of row 12
16 A 6 b 2021-10-11 2021-09-24
Data:
df <- structure(list(idPerson = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A"), idAppt = c(1L,
1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L),
decision = c("a", "b", "c", "d", "a", "b", "a", "b", "a",
"b", "c", "d", "a", "b", "a", "b"), date = structure(c(18880,
18881, 18882, 18883, 18890, 18891, 18880, 18881, 18891, 18892,
18893, 18894, 18881, 18882, 18910, 18911), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -16L))
EO <- structure(list(idPerson = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A"), idAppt = c(1L,
1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L),
decision = c("a", "b", "c", "d", "a", "b", "a", "b", "a",
"b", "c", "d", "a", "b", "a", "b"), date = structure(c(18880,
18881, 18882, 18883, 18890, 18891, 18880, 18881, 18891, 18892,
18893, 18894, 18881, 18882, 18910, 18911), class = "Date"),
date2 = c("2021-09-10", "2021-09-10", "2021-09-10", "2021-09-10",
"2021-09-13", "2021-09-13", "2021-09-10", "2021-09-10", "2021-09-13",
"2021-09-13", "2021-09-13", "2021-09-13", "2021-09-10", "2021-09-10",
"2021-09-24", "2021-09-24")), row.names = c(NA, -16L), class = c("tbl_df",
"tbl", "data.frame"))
Answer:
Using a data.table rolling join:
library(data.table)
setDT(df)
# rolling join between decision "d" and "a"
df[decision == "a", date2 := df[decision == "d"][.SD, on = .(idPerson, date), x.date, roll = Inf]]
# set non-matching rows for decision "a" to min(date); note that min() is taken
# over these unmatched "a" rows only, not over all rows of the person
df[decision == "a" & is.na(date2), date2 := min(date), by = idPerson]
# replace other NA by last observation carried forward
setnafill(df, type = "locf", cols = "date2")
idPerson idAppt decision date date2
1: A 1 a 2021-09-10 2021-09-10
2: A 1 b 2021-09-11 2021-09-10
3: A 1 c 2021-09-12 2021-09-10
4: A 1 d 2021-09-13 2021-09-10
5: A 2 a 2021-09-20 2021-09-13
6: A 2 b 2021-09-21 2021-09-13
7: A 3 a 2021-09-10 2021-09-10
8: A 3 b 2021-09-11 2021-09-10
9: A 4 a 2021-09-21 2021-09-13
10: A 4 b 2021-09-22 2021-09-13
11: A 4 c 2021-09-23 2021-09-13
12: A 4 d 2021-09-24 2021-09-13
13: A 5 a 2021-09-11 2021-09-10
14: A 5 b 2021-09-12 2021-09-10
15: A 6 a 2021-10-10 2021-09-24
16: A 6 b 2021-10-11 2021-09-24
The relevance of 'idAppt' is not entirely clear, since the date comparison seems to be done within idPerson.
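To see what roll = Inf does in isolation, here is a small sketch on toy tables. The tables d and a are made up for illustration; only the shape of the join call mirrors the code above. Note that a rolling join also matches equal dates, whereas the question asks for "later than", so this is an approximation of the stated rule when a "d" and an "a" share a date.

```r
library(data.table)

# Hypothetical toy tables: the "d" dates to look up, and the "a" dates to look them up for.
d <- data.table(idPerson = "A",
                date = as.Date(c("2021-09-13", "2021-09-24")))
a <- data.table(idPerson = "A",
                date = as.Date(c("2021-09-10", "2021-09-20", "2021-10-10")))

# For each row of `a`, roll = Inf carries the last row of `d` forward, i.e. it
# picks the latest d$date that is <= a$date within the same idPerson.
# x.date is the date coming from `d`; rows of `a` with no earlier "d" get NA.
a[, date2 := d[a, on = .(idPerson, date), x.date, roll = Inf]]

print(a)
```

The first "a" date has no earlier "d" date, so its date2 stays NA; that is the case the second step of the answer fills with the earliest date.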
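Answer:
Here is how I would solve it, although it looks a bit complicated:
library(dplyr)
library(purrr)  # map() comes from purrr, not dplyr
df %>%
  group_by(idPerson) %>%
  # per person: all "d" dates (as a list column) and the earliest date
  mutate(d_date = list(date[decision == "d"]), min_date_person = min(date)) %>%
  group_by(idPerson, idAppt) %>%
  # date3: position in d_date of the closest "d" date before the "a" date (NA if none)
  mutate(date3 = unlist(map(d_date, \(x){
           dates <- date[decision == "a"] - x
           w <- which.min(dates[dates > 0])
           ifelse(is.null(w), NA, w)
         })),
         date2 = if_else(is.na(date3), min_date_person, do.call("c", map(d_date, ~ unique(.x[date3]))))) %>%
  ungroup() %>%
  select(1:4, date2)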
# A tibble: 16 × 5
idPerson idAppt decision date date2
<chr> <int> <chr> <date> <date>
1 A 1 a 2021-09-10 2021-09-10
2 A 1 b 2021-09-11 2021-09-10
3 A 1 c 2021-09-12 2021-09-10
4 A 1 d 2021-09-13 2021-09-10
5 A 2 a 2021-09-20 2021-09-13
6 A 2 b 2021-09-21 2021-09-13
7 A 3 a 2021-09-10 2021-09-10
8 A 3 b 2021-09-11 2021-09-10
9 A 4 a 2021-09-21 2021-09-13
10 A 4 b 2021-09-22 2021-09-13
11 A 4 c 2021-09-23 2021-09-13
12 A 4 d 2021-09-24 2021-09-13
13 A 5 a 2021-09-11 2021-09-10
14 A 5 b 2021-09-12 2021-09-10
15 A 6 a 2021-10-10 2021-09-24
16 A 6 b 2021-10-11 2021-09-24
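For comparison, the two rules from the question can also be spelled out in base R with explicit loops. This is only a sketch and not part of either answer: the helper name add_date2 is hypothetical, it assumes every idAppt has a decision == "a" row, and it reads "earlier" as strictly earlier.

```r
# Same data as in the question, rebuilt as a plain data.frame.
df <- data.frame(
  idPerson = "A",
  idAppt   = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6),
  decision = c("a", "b", "c", "d", "a", "b", "a", "b",
               "a", "b", "c", "d", "a", "b", "a", "b"),
  date     = as.Date("1970-01-01") +
    c(18880, 18881, 18882, 18883, 18890, 18891, 18880, 18881,
      18891, 18892, 18893, 18894, 18881, 18882, 18910, 18911)
)

# Hypothetical helper implementing the two rules from the question.
add_date2 <- function(df) {
  df$date2 <- as.Date(NA)
  for (person in unique(df$idPerson)) {
    in_person <- df$idPerson == person
    d_dates   <- df$date[in_person & df$decision == "d"]  # all "d" dates of the person
    first_day <- min(df$date[in_person])                   # fallback: earliest date
    for (appt in unique(df$idAppt[in_person])) {
      in_appt <- in_person & df$idAppt == appt
      a_date  <- df$date[in_appt & df$decision == "a"][1]  # assumes an "a" row exists
      earlier <- d_dates[d_dates < a_date]                 # "d" dates strictly before "a"
      # rule 1: closest earlier "d" date; rule 2: earliest date of the person
      df$date2[in_appt] <- if (length(earlier) > 0) max(earlier) else first_day
    }
  }
  df
}

res <- add_date2(df)
print(res)
```

The loops are slower than a rolling join on large data, but they make it explicit that the comparison runs over all "d" dates of the person while the result is assigned per idAppt.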
