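Let me give an example to explain my problem. I have two data frames, df1 and df2, and I want to left join them on two conditions:
(1) week is the same
(2) m1 and m2 in df1 match m1 and m2 in df2 as a pair, regardless of which column holds which value
The expected output is df3.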
df1<-data.frame("m1"=c("100010","100010","100010","100020","100020","100020"),"m2"=c("100020","100020","100020","100010","100010","100010"),"week"=c(1,2,3,1,1,3))
df2<-data.frame("m1"=c("100010","100010","100010"),"m2"=c("100020","100020","100020"),"week"=c(1,2,3),"freq"=c(3,1,2))
print(df1)
m1 m2 week
1 100010 100020 1
2 100010 100020 2
3 100010 100020 3
4 100020 100010 1
5 100020 100010 1
6 100020 100010 3
print(df2)
m1 m2 week freq
1 100010 100020 1 3
2 100010 100020 2 1
3 100010 100020 3 2
df3<- data.frame("m1"=c("100010","100010","100010","100020","100020","100020"),"m2"=c("100020","100020","100020","100010","100010","100010"),"week"=c(1,2,3,1,1,3),"freq"=c(3,1,2,3,3,2))
print(df3)
m1 m2 week freq
1 100010 100020 1 3
2 100010 100020 2 1
3 100010 100020 3 2
4 100020 100010 1 3
5 100020 100010 1 3
6 100020 100010 3 2
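I tried merging on each condition separately, but that created an unwanted duplicate freq column. Is there anything else I can try? Thanks a lot!
uj5u.com user replied:
I'll suggest two approaches; pick whichever you prefer. The first uses SQL rather than R to do the job. For the kind of join you describe, it is a bit simpler.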
library(sqldf)
library(dplyr)
df1<-data.frame("m1"=c("100010","100010","100010","100020","100020","100020"),"m2"=c("100020","100020","100020","100010","100010","100010"),"week"=c(1,2,3,1,1,3))
df2<-data.frame("m1"=c("100010","100010","100010"),"m2"=c("100020","100020","100020"),"week"=c(1,2,3),"freq"=c(3,1,2))
df3<- data.frame("m1"=c("100010","100010","100010","100020","100020","100020"),"m2"=c("100020","100020","100020","100010","100010","100010"),"week"=c(1,2,3,1,1,3),"freq"=c(3,1,2,3,3,2))
df_sql <-
sqldf::sqldf("SELECT a.*, b.freq
FROM df1 a
LEFT JOIN df2 b
ON (a.week = b.week and a.m1 = b.m1 and a.m2 = b.m2) OR
(a.week = b.week and a.m1 = b.m2 and a.m2 = b.m1)")
identical(df_sql, df3)
#> [1] TRUE
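I'm sure there are more elegant ways to do this, but the second strategy is simply to make a copy of df2 with the columns m1 and m2 swapped, stack it under the original df2, and then do the join.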
df <-
df2 %>%
rename(m2 = m1, m1 = m2) %>%
bind_rows(df2, .) %>%
left_join(df1, ., by = c("week", "m1", "m2"))
identical(df, df3)
#> [1] TRUE
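I suppose there are other ways that don't involve a join, but this is how I would do it with one.
Created on 2022-02-17 by the reprex package (v2.0.1)
uj5u.com user replied:
If we want an OR join, we can use regex_left_join from fuzzyjoin: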
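A further variant, not from the answer above, is to give each pair an order-independent key and then do an ordinary left join. This is only a minimal sketch: the helper columns lo and hi and the names df2_keyed and df_key are made up for illustration, and it assumes df2 holds each pair at most once per week (if df2 contained both orientations of a pair in the same week, rows would be duplicated).

```r
library(dplyr)

df1 <- data.frame(
  m1 = c("100010", "100010", "100010", "100020", "100020", "100020"),
  m2 = c("100020", "100020", "100020", "100010", "100010", "100010"),
  week = c(1, 2, 3, 1, 1, 3)
)
df2 <- data.frame(
  m1 = c("100010", "100010", "100010"),
  m2 = c("100020", "100020", "100020"),
  week = c(1, 2, 3),
  freq = c(3, 1, 2)
)

# Hypothetical helper columns: lo is the smaller id of the pair, hi the larger,
# so (100010, 100020) and (100020, 100010) get the same key.
df2_keyed <- df2 %>%
  transmute(lo = pmin(m1, m2), hi = pmax(m1, m2), week, freq)

df_key <- df1 %>%
  mutate(lo = pmin(m1, m2), hi = pmax(m1, m2)) %>%
  left_join(df2_keyed, by = c("lo", "hi", "week")) %>%
  select(m1, m2, week, freq)  # drop the helper columns again
```

Because the key is built once on each side, the join condition stays a plain equality on week plus the pair, which also extends to more than two orientations being irrelevant (there is no need to stack a swapped copy of df2).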
library(dplyr)
library(fuzzyjoin)
library(stringr)
regex_left_join(df1 %>%
mutate(m1m2 = str_c(m1, m2, sep = "|")),
df2 %>%
mutate(m1m2 = str_c(m1, m2, sep = "|"), .keep = "unused"),
by = c("m1m2", "week")) %>%
select(m1, m2, week = week.x, freq)
-output
m1 m2 week freq
1 100010 100020 1 3
2 100010 100020 2 1
3 100010 100020 3 2
4 100020 100010 1 3
5 100020 100010 1 3
6 100020 100010 3 2
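One thing to keep in mind with the regex approach: assuming, as the fuzzyjoin documentation describes, that the column from the second table is used as a regular expression tested against the column from the first, the match is a substring/alternation match rather than an exact comparison. The base R sketch below only illustrates that regex behaviour with grepl; the id 999999 and week 11 are hypothetical values that do not appear in the question's data.

```r
# Pattern as it would be built from a df2 row by str_c(m1, m2, sep = "|")
pattern <- "100010|100020"

# Same pair with the ids swapped: matches, which is the intended effect
reversed_pair <- grepl(pattern, "100020|100010")

# Hypothetical pair that shares only one id with the pattern: also matches
partial_pair <- grepl(pattern, "100010|999999")

# Hypothetical week 11 tested against the pattern "1": also matches
week_overlap <- grepl("1", "11")

print(c(reversed_pair, partial_pair, week_overlap))
```

For the example data this makes no difference, because only one pair and single-digit weeks are present. With more ids or weeks 10 and above, an exact-match join such as the left_join on swapped columns in the previous answer is likely the safer choice.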
