Suppose I have the following tables (note: in my problem the dates are stored as factors):
table_1 = data.frame(id = c("123", "123", "125", "125"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))
table_1$id = as.factor(table_1$id)
table_1$date_1 = as.factor(table_1$date_1)
table_2 = data.frame(id = c("123", "121", "125", "126"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))
table_2$id = as.factor(table_2$id)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)
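I want to perform (some type of) "join" on these two tables using the following conditions (the join type does not matter for now, e.g. right join, inner join, etc.):
1) if table_1$id = table_2$id
AND
2) if table_1$date_1 BETWEEN(table_2$date_2, table_2$date_3)
I found a previous question on Stack Overflow that demonstrates how to do this with the "sqldf" library: r merge by id and date between two date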
library(sqldf)
final = sqldf("select a.*, b.*
from table_1 a left join table_2 b
on a.id = b.id and
a.date_1 between
b.date_2 and
b.date_3")
head(final)
#for some reason, this produces duplicate rows, I don't know why
id date_1 id date_2 date_3
1 123 2010-01-31 123 2009-01-31 2011-01-31
2 123 2010-01-31 123 2009-01-31 2011-01-31
3 125 2015-01-31 125 2010-01-31 2020-01-31
4 125 2018-01-31 125 2010-01-31 2020-01-31
#optional: remove duplicates (compare whole rows; deduplicating on id alone would also drop the 125 / 2018-01-31 row)
final_no_dup <- final[!duplicated(final), ]
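The duplicate rows most likely come from the input rather than from the join itself: table_1 as defined above contains the row (123, 2010-01-31) twice, and a left join returns one output row for each matching input row. A minimal base R check, reusing only the table_1 definition from the question (the names dup_flags and table_1_unique are introduced here for illustration):

```r
table_1 <- data.frame(id = c("123", "123", "125", "125"),
                      date_1 = c("2010-01-31", "2010-01-31", "2015-01-31", "2018-01-31"))

# TRUE marks a row that exactly repeats an earlier row
dup_flags <- duplicated(table_1)
print(dup_flags)

# Dropping repeated input rows before joining avoids repeated output rows
table_1_unique <- unique(table_1)
print(nrow(table_1_unique))
```

Deduplicating the input this way keeps both rows for id 125, because their dates differ.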
My question: is there a way to perform the above "join" using base R? If this is not possible in base R, can it be done with "dplyr"?
Answer from a uj5u.com user:
You can try it this way with dplyr:
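library(dplyr)

table_1 %>%
  left_join(table_2, by = "id") %>%
  # convert the three factor date columns to Date so they can be compared
  mutate(across(c(date_1, date_2, date_3), ~ as.Date(.x))) %>%
  # pmin()/pmax() work row by row; min()/max() would collapse each column to a single value
  filter(date_1 >= pmin(date_2, date_3), date_1 <= pmax(date_2, date_3)) %>%
  distinct()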
id date_1 date_2 date_3
1 123 2010-01-31 2009-01-31 2011-01-31
2 125 2015-01-31 2010-01-31 2020-01-31
3 125 2018-01-31 2010-01-31 2020-01-31
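If dplyr 1.1.0 or later is available (an assumption; the answer does not state a version), the range condition can probably be written directly in the join with join_by() and its between() helper, instead of joining first and filtering afterwards. The sketch below builds the tables with Date columns from the start, which is a simplification of the factor columns in the question:

```r
library(dplyr)

# Same data as in the question, but with the dates already stored as Date
table_1 <- data.frame(id = c("123", "123", "125", "125"),
                      date_1 = as.Date(c("2010-01-31", "2010-01-31", "2015-01-31", "2018-01-31")))
table_2 <- data.frame(id = c("123", "121", "125", "126"),
                      date_2 = as.Date(c("2009-01-31", "2010-01-31", "2010-01-31", "2010-01-31")),
                      date_3 = as.Date(c("2011-01-31", "2010-01-31", "2020-01-31", "2020-01-31")))

result <- table_1 %>%
  distinct() %>%  # drop the repeated (123, 2010-01-31) input row
  # equality on id, plus date_2 <= date_1 <= date_3 (bounds inclusive by default)
  inner_join(table_2, by = join_by(id, between(date_1, date_2, date_3)))

print(result)
```

Swapping inner_join() for left_join() would keep table_1 rows that have no matching range, with NA in date_2 and date_3.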
Base R
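table_3 <- merge(x = table_1, y = table_2, by = "id", all.x = TRUE)
# convert the factor columns to Date, then filter rows with the vectorised & (not &&)
table_3[c("date_1", "date_2", "date_3")] <- lapply(table_3[c("date_1", "date_2", "date_3")], function(x) as.Date(as.character(x)))
keep <- with(table_3, date_1 >= pmin(date_2, date_3) & date_1 <= pmax(date_2, date_3))
table_3 <- table_3[which(keep), ]
table_3[!duplicated(table_3), ]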
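The merge-then-filter idea can also be wrapped in a small reusable function. This is only a sketch: join_between and its argument names are hypothetical, it performs an inner join, and it assumes the date columns hold values in the YYYY-MM-DD format used in the question (as factor or character).

```r
# Hypothetical helper: inner join on `by`, keeping rows where
# lower_col <= date_col <= upper_col
join_between <- function(left, right, by, date_col, lower_col, upper_col) {
  merged <- merge(left, right, by = by)            # equality part of the join
  to_date <- function(x) as.Date(as.character(x))  # safe for factor or character input
  d  <- to_date(merged[[date_col]])
  lo <- to_date(merged[[lower_col]])
  hi <- to_date(merged[[upper_col]])
  keep <- which(d >= lo & d <= hi)                 # which() ignores NA comparisons
  unique(merged[keep, , drop = FALSE])             # drop exact duplicate rows
}

# The tables from the question, with dates stored as factors
table_1 <- data.frame(id = factor(c("123", "123", "125", "125")),
                      date_1 = factor(c("2010-01-31", "2010-01-31", "2015-01-31", "2018-01-31")))
table_2 <- data.frame(id = factor(c("123", "121", "125", "126")),
                      date_2 = factor(c("2009-01-31", "2010-01-31", "2010-01-31", "2010-01-31")),
                      date_3 = factor(c("2011-01-31", "2010-01-31", "2020-01-31", "2020-01-31")))

res <- join_between(table_1, table_2, by = "id",
                    date_col = "date_1", lower_col = "date_2", upper_col = "date_3")
print(res)
```

The comparison is done on converted copies, so the returned columns keep their original types. Note that merge() builds every id match before filtering, which may be slow on large tables.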
