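I'm looking at some public transit timetable data and trying to work out the time at which each vehicle was at its previous stop. The data has no vehicle_number, so I just need to find, among the rows for the previous stop, the latest time that is before the current row's time.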
library(tidyverse)
data <- tribble(
  ~stop, ~prev_stop, ~time,
  5, 4, 10,
  6, 5, 10.1,
  7, 6, 10.2,
  9, 7, 10.3,
  5, 4, 11,
  6, 5, 11.1,
  7, 6, 11.2,
  9, 7, 11.3,
  5, 4, 12,
  6, 5, 12.1,
  7, 6, 12.2,
  9, 7, 12.3
)
Something like this (pseudocode):
data %>%
  mutate(time_at_prev_stop = max(time[stop{other row} == prev_stop{current row} & time{other row} < time{current row}]))
Any ideas? Thanks a lot!
Answer:
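You can join the data to itself, matching prev_stop to stop, then filter the rows so that the times run in the right direction (that is, the time at the previous stop must be earlier than the current stop's time). Finally, group by a helper id column that identifies each original row and keep the maximum.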
Here is a dplyr implementation, but I would recommend data.table, since it supports non-equi joins and rolling joins:
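library(dplyr)

left_join(
  data %>% mutate(id = row_number()),  # id identifies each original row
  data,
  by = c("prev_stop" = "stop")          # match each row to every visit of its previous stop
) %>%
  # keep only visits before the current time (or rows with no match at all);
  # note: a row whose previous stop was only ever visited later is dropped here
  filter(time.x > time.y | is.na(time.y)) %>%
  # per original row, sort the latest earlier visit first and keep it
  arrange(id, desc(time.y)) %>%
  group_by(id) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(stop, prev_stop, time = time.x, time_at_previous_stop = time.y)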
Output:
# A tibble: 12 x 4
stop prev_stop time time_at_previous_stop
<dbl> <dbl> <dbl> <dbl>
1 5 4 10 NA
2 6 5 10.1 10
3 7 6 10.2 10.1
4 9 7 10.3 10.2
5 5 4 11 NA
6 6 5 11.1 11
7 7 6 11.2 11.1
8 9 7 11.3 11.2
9 5 4 12 NA
10 6 5 12.1 12
11 7 6 12.2 12.1
12 9 7 12.3 12.2
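For comparison, the asker's pseudocode can be written almost literally in base R by looping over the rows. This is a minimal sketch, not part of the original answer: `timetable` and `latest_prev_time()` are hypothetical names, the data is rebuilt so the block stands alone, and because every row scans the whole table it is O(n²) and only suited to small data.

```r
# Hypothetical standalone copy of the question's data (base R, no packages)
timetable <- data.frame(
  stop      = c(5, 6, 7, 9, 5, 6, 7, 9, 5, 6, 7, 9),
  prev_stop = c(4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7),
  time      = c(10, 10.1, 10.2, 10.3, 11, 11.1, 11.2, 11.3, 12, 12.1, 12.2, 12.3)
)

# latest_prev_time() is a hypothetical helper: for each row i it looks at every
# row whose stop equals prev_stop[i] and whose time is strictly earlier than
# time[i], and returns the latest such time (NA when there is none).
latest_prev_time <- function(stop, prev_stop, time) {
  vapply(seq_along(time), function(i) {
    earlier <- time[stop == prev_stop[i] & time < time[i]]
    if (length(earlier) == 0) NA_real_ else max(earlier)
  }, numeric(1))
}

timetable$time_at_previous_stop <-
  with(timetable, latest_prev_time(stop, prev_stop, time))
timetable
```

Because the helper takes whole columns, it also fits the shape of the asker's pipeline: `data %>% mutate(time_at_prev_stop = latest_prev_time(stop, prev_stop, time))`.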
A data.table implementation using a non-equi join:
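library(data.table)
setDT(data)
# add a row id and a copy of time to join on (modifies data by reference)
data[, `:=`(id = .I, prev_time = time)]
# non-equi join, then keep the last match per original row (the latest, as times increase);
# after the join, prev_time holds the current time and time holds the previous-stop time
res <- data[data, on = .(stop = prev_stop, prev_time < time)][
  , .SD[.N, .(stop = i.stop, time = prev_time, prev_stop = stop, time_at_previous_stop = time)], by = i.id]
res[, i.id := NULL][]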
A more readable data.table implementation using a rolling join:
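# copy() gives two independent tables; with data.table, `d1 = data` does not copy,
# so the second setkey() would also re-sort d1
d1 <- copy(data)
d2 <- copy(data)
setkey(setDT(d1), stop, time)
setkey(setDT(d2), prev_stop, time)
# t keeps d1's original time; roll = Inf takes the last visit at or before each time
d1[, t := time][d2, roll = Inf][, .(stop = i.stop, prev_stop = stop, time, time_at_previous_stop = t)]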
Output:
stop prev_stop time time_at_previous_stop
1: 5 4 10.0 NA
2: 5 4 11.0 NA
3: 5 4 12.0 NA
4: 6 5 10.1 10.0
5: 6 5 11.1 11.0
6: 6 5 12.1 12.0
7: 7 6 10.2 10.1
8: 7 6 11.2 11.1
9: 7 6 12.2 12.1
10: 9 7 10.3 10.2
11: 9 7 11.3 11.2
12: 9 7 12.3 12.2
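Since dplyr 1.1.0, `join_by()` also supports inequality and rolling joins, so the same lookup can be done in dplyr without the helper id. A sketch, assuming dplyr >= 1.1.0; `timetable`, `visits` and `visited_stop` are hypothetical names, and the data is rebuilt so the block stands alone. `closest(time > time_at_previous_stop)` keeps, for each row, only the latest visit to the previous stop that is strictly earlier, and `left_join()` leaves NA where there is none.

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

# Hypothetical standalone copy of the question's data
timetable <- tibble(
  stop      = c(5, 6, 7, 9, 5, 6, 7, 9, 5, 6, 7, 9),
  prev_stop = c(4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7),
  time      = c(10, 10.1, 10.2, 10.3, 11, 11.1, 11.2, 11.3, 12, 12.1, 12.2, 12.3)
)

# Every visit to every stop, with renamed columns so nothing clashes in the join
visits <- timetable %>%
  select(visited_stop = stop, time_at_previous_stop = time)

result <- timetable %>%
  left_join(
    visits,
    # equality on the stop, then a rolling match: the latest visit strictly
    # before the current row's time (NA when there is none)
    by = join_by(prev_stop == visited_stop, closest(time > time_at_previous_stop))
  ) %>%
  select(stop, prev_stop, time, time_at_previous_stop)
result
```

Unlike `roll = Inf` in the data.table version, `closest()` with `>` excludes a visit at exactly the same time; use `>=` to include it.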