我正在使用 R 編程語言。假設我有以下兩個表:
table_1 = data.frame(id = c("123", "123", "125", "125"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))
table_1$id = as.factor(table_1$id)
table_1$id2 = as.factor(table_1$id2)
table_1$date_1 = as.factor(table_1$date_1)
table_2 = data.frame(id = c("123", "123", "125", "125"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))
table_2$id = as.factor(table_2$id)
table_2$id2 = as.factor(table_2$id2)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)
如果滿足以下兩個條件中的任何一個(即,如果 Condition_1 = TRUE OR Condition_2 = TRUE,則“加入” )
條件_1
- 如果 table_1$id = table_2$id
和
- 如果 table_1$date BETWEEN(table_2$date_2,table_2$date_3)
條件_2
- 如果 table_1$id2 = table_2$id2
和
- 如果 table_1$date BETWEEN(table_2$date_2,table_2$date_3)
我已經嘗試過的:我知道如何分別執行這兩個連接,例如:
library(sqldf)
#Condition_1
final = sqldf("select a.*, b.*
from table_1 a left join table_2 b
on a.id = b.id and
a.date_1 between
b.date_2 and
b.date_3")
#Condition_2
final_2 = sqldf("select a.*, b.*
from table_1 a left join table_2 b
on a.id2 = b.id2 and
a.date_1 between
b.date_2 and
b.date_3")
然后我可以將這些檔案(“final_3”)“系結”在一起(并洗掉完全重復的行):
final_3 = rbind(final, final_2)
final_3 = final_3[!duplicated(final_3[c(1,2,3,4,5,6,7)]),]
我的問題:有沒有辦法在一個步驟中將這兩個表合并在一起,而不是兩個單獨的步驟?這可以使用 Base R 或 DPLYR 來完成嗎?
謝謝!
uj5u.com熱心網友回復:
這可以在單個 SQL 陳述句中完成,如下所示。
library(sqldf)
sqldf("select distinct *
from table_1 a left join table_2 b
on (a.date_1 between b.date_2 and b.date_3) and
(a.id = b.id or a.id2 = b.id2)")
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/374313.html
