I have two data frames, as shown below:
id1 <- c("1", "2", "3", "4", "4")
start <- c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")
end <- c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16")
df1 <- data.frame(id1, start, end)
id2 <- c( "1","1", "2", "2" ,"2" ,"2" , "2", "2", "3","3", "3", "4", "4", "4" )
date <- c("2019-07-09","2019-07-10", "2021-03-22", "2021-03-23", "2021-03-24", "2021-03-25", "2021-03-26", "2021-03-27", "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27", "2019-02-14", "2019-02-15")
value <- c("0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5")
df2 <- data.frame(id2, date, value)
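I am trying to match by id and sum the values in df2 whenever the date falls between the start and end dates in df1 (both endpoints inclusive).
The output should look like the following. I have been trying to solve this with dplyr, but without any luck.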
###EXPECTED OUTPUT####
id1 <- c("1", "2", "3", "4", "4")
start <- c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")
end <- c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16")
output_value <- c("1", "3", "1.5", "0.5", "1")
output <- data.frame(id1, start, end, output_value)
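Thanks for your help!
A uj5u.com user replied:
You first need to combine the two data frames with left_join, then make sure the date, start, and end columns are of class Date and that the value column is numeric. Next, drop the rows whose date does not fall between start and end. Finally, group_by the relevant columns and sum the values.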
library(tidyverse)
left_join(df2, df1, by = c("id2" = "id1")) %>%
  mutate(across(c(date, start, end), as.Date),
         value = as.numeric(value)) %>%
  filter(date >= start & date <= end) %>%
  group_by(id2, start, end) %>%
  summarize(value = sum(value), .groups = "drop")
# A tibble: 5 x 4
  id2   start      end        value
  <chr> <date>     <date>     <dbl>
1 1     2019-07-09 2019-07-10   1
2 2     2021-03-22 2021-03-27   3
3 3     2019-02-25 2019-02-27   1.5
4 4     2019-01-27 2019-01-28   0.5
5 4     2019-02-14 2019-02-16   1
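The same join, filter, and sum steps can also be sketched in base R without any packages. This is only an illustration under some assumptions: the data are rebuilt with Date and numeric columns up front, and the names joined, in_range, and result are made up for this example.

```r
# Rebuild the question's data with proper types (Date and numeric).
df1 <- data.frame(
  id1   = c("1", "2", "3", "4", "4"),
  start = as.Date(c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")),
  end   = as.Date(c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16"))
)
df2 <- data.frame(
  id2   = c("1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "4", "4", "4"),
  date  = as.Date(c("2019-07-09", "2019-07-10", "2021-03-22", "2021-03-23",
                    "2021-03-24", "2021-03-25", "2021-03-26", "2021-03-27",
                    "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27",
                    "2019-02-14", "2019-02-15")),
  value = rep(0.5, 14)
)

# 1. Join on id: every df2 row is paired with every interval of the same id.
joined <- merge(df1, df2, by.x = "id1", by.y = "id2")

# 2. Keep only the pairs whose date lies inside [start, end].
in_range <- joined[joined$date >= joined$start & joined$date <= joined$end, ]

# 3. Sum value per interval, then sort for a stable row order.
result <- aggregate(value ~ id1 + start + end, data = in_range, FUN = sum)
result <- result[order(result$id1, result$start), ]
print(result)
```

Like the filter-based pipeline, this drops any interval that has no matching dates at all instead of reporting it with a sum of 0.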
A uj5u.com user replied:
The %within% operator and the interval function from the lubridate package are useful for this kind of thing. You can do it like this:
library(dplyr)
library(lubridate)
df1 %>%
  inner_join(df2, by = c("id1" = "id2")) %>%
  mutate(across(c(date, start, end), as.Date),
         value = as.numeric(value)) %>%
  group_by(id1, start, end) %>%
  summarise(output_value = sum(
    ifelse(
      date %within% interval(start, end),
      value,
      0
    )
  ))
# A tibble: 5 x 4
# Groups:   id1, start [5]
  id1   start      end        output_value
  <chr> <date>     <date>            <dbl>
1 1     2019-07-09 2019-07-10          1
2 2     2021-03-22 2021-03-27          3
3 3     2019-02-25 2019-02-27          1.5
4 4     2019-01-27 2019-01-28          0.5
5 4     2019-02-14 2019-02-16          1
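Both dplyr answers join on id first and test the date range afterward. Assuming dplyr 1.1.0 or later is installed, the range condition can instead be written into the join itself with join_by(). The following is a sketch rather than part of the original answers; the data are rebuilt with Date and numeric columns, and result is a name chosen for this example.

```r
library(dplyr)

df1 <- tibble(
  id1   = c("1", "2", "3", "4", "4"),
  start = as.Date(c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")),
  end   = as.Date(c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16"))
)
df2 <- tibble(
  id2   = c("1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "4", "4", "4"),
  date  = as.Date(c("2019-07-09", "2019-07-10", "2021-03-22", "2021-03-23",
                    "2021-03-24", "2021-03-25", "2021-03-26", "2021-03-27",
                    "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27",
                    "2019-02-14", "2019-02-15")),
  value = rep(0.5, 14)
)

# Non-equi join: same id, and df2$date inside [df1$start, df1$end].
# In join_by(), the left-hand side of each condition refers to df1
# and the right-hand side to df2.
result <- df1 %>%
  left_join(df2, by = join_by(id1 == id2, start <= date, end >= date)) %>%
  group_by(id1, start, end) %>%
  # na.rm = TRUE turns an interval with no matching dates into a sum of 0
  summarise(output_value = sum(value, na.rm = TRUE), .groups = "drop")

print(result)
```

Because this is a left join from df1, an interval without any matching date would still appear in the result with output_value 0, which matches the ifelse-based answer rather than the filter-based one.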
A uj5u.com user replied:
Another option is to use the data.table package. Below is a quick way to get the desired output. Note that the data.frames do need to be converted to data.tables.
library(data.table)
df1 <- data.table(id1 = c("1", "2", "3", "4", "4"),
                  start = c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14"),
                  end = c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16"))
df2 <- data.table(id2 = c("1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "4", "4", "4"),
                  date = c("2019-07-09", "2019-07-10", "2021-03-22", "2021-03-23", "2021-03-24", "2021-03-25",
                           "2021-03-26", "2021-03-27", "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27",
                           "2019-02-14", "2019-02-15"),
                  value = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "0.5"))
# convert value from chr to numeric
df2[,value := as.numeric(value)]
# merge tables by ids
new_dt = merge(df1, df2, by.x = "id1", by.y = "id2", all.y = TRUE)
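# sum value when date is between start and end and group by id1,start,end
# (the source line was cut off after `new_dt[date`; everything after that point
# is a reconstruction based on the comment above, so treat it as an assumption)
new_dt2 = new_dt[date >= start & date <= end,
                 .(output_value = sum(value)),
                 by = .(id1, start, end)]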
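As an alternative to merging first and filtering afterward, data.table can express the range test directly as a non-equi join. The sketch below is not taken from the answer above; it assumes the dates are stored as IDate, and dt1, dt2, and res are names invented for this example.

```r
library(data.table)

dt1 <- data.table(
  id1   = c("1", "2", "3", "4", "4"),
  start = as.IDate(c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")),
  end   = as.IDate(c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16"))
)
dt2 <- data.table(
  id2   = c("1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "4", "4", "4"),
  date  = as.IDate(c("2019-07-09", "2019-07-10", "2021-03-22", "2021-03-23",
                     "2021-03-24", "2021-03-25", "2021-03-26", "2021-03-27",
                     "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27",
                     "2019-02-14", "2019-02-15")),
  value = rep(0.5, 14)
)

# For each row of dt1, collect the dt2 rows with the same id and a date
# inside [start, end], and sum their values (by = .EACHI means "per dt1 row").
res <- dt2[dt1,
           on = .(id2 = id1, date >= start, date <= end),
           .(output_value = sum(value)),
           by = .EACHI]

# The join columns come back under dt2's names (id2, date, date) but hold
# dt1's id, start, and end values, so rename them for readability.
setnames(res, c("id1", "start", "end", "output_value"))
print(res)
```

The rows of res follow the order of dt1, so no separate sorting step is needed to line the sums up with the intervals.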
