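I'm struggling with how to do a merge using two columns. I have a data frame with measurements of how much palette was used on certain dates, and another data frame with the distance traveled by cars. I need to merge the two, where the join condition is: the same car, and the sum of that car's distances up to the date on which the palette measurement was taken. Here is a toy example: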
#palette measure dataframe
measure = data.frame(car = c("A", "A", "A", "B"), data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021"), palette = c(5,4,3,5))
#> measure
# car data1 palette
#1 A 20-09-2020 5
#2 A 15-10-2020 4
#3 A 13-05-2021 3
#4 B 20-10-2021 5
#the distance dataframe
dist_ = data.frame(car = c("A", "C", "B", "A", "A", "A"), data2 = c("20-09-2020", "14-05-2020", "20-10-2021", "10-01-2021", "11-01-2021", "13-01-2021"), distance = c(10, 20, 10, 5, 3,8))
#> dist_
# car data2 distance
#1 A 20-09-2020 10
#2 C 14-05-2020 20
#3 B 20-10-2021 10
#4 A 10-01-2021 5
#5 A 11-01-2021 3
#6 A 13-01-2021 8
#for result I'd like something like
# car data1 palette distance
#1 A 20-09-2020 5 10
#2 A 15-10-2020 4 0
#3 A 13-05-2021 3 16
#4 B 20-10-2021 5 10
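Note that the distance is summed up to the date on which I have a palette measurement. So I can say that a car traveled 16 km and its palette measured 3 cm.
I thought I could use something like merge(x = measure, y = dist_, by.x = c("car", "data1"), by.y = c("car", "data2"), all.x = TRUE), but I don't know how to sum the distance values up to the palette measurement date for a specific car.
Any hints on how I could do this?
Answer from a uj5u.com user:
Something like this would work: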
library(tidyverse)
library(lubridate)
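# Pair every measurement with every distance record for the same car
result <- left_join(measure, dist_, by = c("car")) %>%
  mutate(across(c("data1", "data2"), dmy)) %>%  # parse dd-mm-yyyy strings as dates
  filter(data1 >= data2) %>%                    # keep trips on or before the measurement
  group_by(car, data2) %>%
  mutate(threshold = min(data1)) %>%            # earliest measurement on or after each trip
  ungroup() %>%
  filter(data1 == threshold) %>%                # assign each trip to that measurement only
  group_by(car, data1, palette) %>%
  summarise(distance = sum(distance))           # total distance per measurement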
result
# A tibble: 3 x 4
# Groups: car, data1 [3]
car data1 palette distance
<chr> <date> <dbl> <dbl>
1 A 2020-09-20 5 10
2 A 2021-05-13 3 16
3 B 2021-10-20 5 10
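As an alternative sketch (not the answerer's method, and assuming dplyr 1.1.0 or later), a rolling join with join_by() and closest() can attach each distance record directly to the earliest measurement on or after it. This avoids building every measurement/trip pair first. The names measure_d, dist_d and result_alt are made up for this illustration.

```r
library(dplyr)      # join_by() and closest() assume dplyr >= 1.1.0
library(lubridate)

# Toy data from the question, repeated so the snippet runs on its own
measure <- data.frame(
  car = c("A", "A", "A", "B"),
  data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021"),
  palette = c(5, 4, 3, 5)
)
dist_ <- data.frame(
  car = c("A", "C", "B", "A", "A", "A"),
  data2 = c("20-09-2020", "14-05-2020", "20-10-2021",
            "10-01-2021", "11-01-2021", "13-01-2021"),
  distance = c(10, 20, 10, 5, 3, 8)
)

# Parse the dd-mm-yyyy strings into Date columns
measure_d <- measure %>% mutate(data1 = dmy(data1))
dist_d <- dist_ %>% mutate(data2 = dmy(data2))

# For each trip, find the same car's closest measurement date that is
# on or after the trip date, then total the distance per measurement.
# Trips with no later measurement (car C here) are dropped by inner_join().
result_alt <- dist_d %>%
  inner_join(measure_d, by = join_by(car, closest(data2 <= data1))) %>%
  group_by(car, data1, palette) %>%
  summarise(distance = sum(distance), .groups = "drop")

result_alt
```

For the toy data this should give the same three rows as result above, just without the leftover grouping.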
If you want to keep the measurements that have no matching distance, you can join the result back onto measure like this:
result.final <- measure %>%
  mutate(data1 = dmy(data1)) %>%
  left_join(result, by = c("car", "data1", "palette"))
result.final
car data1 palette distance
1 A 2020-09-20 5 10
2 A 2020-10-15 4 NA
3 A 2021-05-13 3 16
4 B 2021-10-20 5 10
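The desired output in the question shows 0 rather than NA for a measurement with no trips since the previous one. Here is a small sketch of one way to get that, using dplyr's coalesce(). result.final is rebuilt by hand from the printed output above so the snippet runs on its own, and result.zero is a name invented here.

```r
library(dplyr)

# result.final as printed above, rebuilt by hand for a self-contained example
result.final <- data.frame(
  car = c("A", "A", "A", "B"),
  data1 = as.Date(c("2020-09-20", "2020-10-15", "2021-05-13", "2021-10-20")),
  palette = c(5, 4, 3, 5),
  distance = c(10, NA, 16, 10)
)

# Replace missing distances with 0; existing values are left untouched
result.zero <- result.final %>%
  mutate(distance = coalesce(distance, 0))

result.zero
```

Whether 0 is the right fill depends on the data: it reads as "no distance recorded in that interval", which is what the question's expected table implies.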
