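I have data frame A, which contains time events, and data frame B, which contains event ranges for patients. I only want to include a row from data frame A if its time event does not fall within any of that patient's date ranges in data frame B. If a patient from data frame A does not exist in data frame B, the event should be added to data frame B.
Since the two data frames do not have the same columns, a row copied from data frame A into data frame B should be added with start = date and end = date.
I tried to figure out how to do this with dplyr, but it seems complicated. I managed to get it working with a for-loop, but for my own education I would like to know how others would accomplish the same task.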
dfa <- data.frame(
  date = c("2021-01-01", "2021-02-02", "2021-02-05"),
  patient = c("one", "two", "three"))
dfb <- data.frame(
  start = c("2020-12-31", "2021-02-01"),
  end = c("2021-01-02", "2021-02-03"),
  patient = c("one", "one"))
dfa$date <- as.Date(dfa$date, "%Y-%m-%d")
dfb$start <- as.Date(dfb$start, "%Y-%m-%d")
dfb$end <- as.Date(dfb$end, "%Y-%m-%d")
for (i in seq_len(nrow(dfa))) {
  date <- dfa[i, "date"]
  d_patient <- dfa[i, "patient"]
  # Ranges of the same patient that contain this event date
  res <- dfb[d_patient == dfb$patient &
               date >= dfb$start &
               date <= dfb$end, ]
  # No covering range: append the event as a one-day range
  if (nrow(res) == 0) {
    tf <- data.frame("start" = date,
                     "end" = date,
                     "patient" = d_patient)
    dfb <- rbind(dfb, tf)
  }
}
print(dfb)
Result:
       start        end patient
1 2020-12-31 2021-01-02     one
2 2021-02-01 2021-02-03     one
3 2021-02-02 2021-02-02     two
4 2021-02-05 2021-02-05   three
Answer:
dfa <- data.frame(
  date = c("2021-01-01", "2021-02-02", "2021-02-05"),
  patient = c("one", "two", "three"))
dfb <- data.frame(
  start = c("2020-12-31", "2021-02-01"),
  end = c("2021-01-02", "2021-02-03"),
  patient = c("one", "one"))
dfa$date <- as.Date(dfa$date, "%Y-%m-%d")
dfb$start <- as.Date(dfb$start, "%Y-%m-%d")
dfb$end <- as.Date(dfb$end, "%Y-%m-%d")
dfa
#>         date patient
#> 1 2021-01-01     one
#> 2 2021-02-02     two
#> 3 2021-02-05   three
dfb
#>        start        end patient
#> 1 2020-12-31 2021-01-02     one
#> 2 2021-02-01 2021-02-03     one
library(tidyverse)
library(fuzzyjoin)
# Keep rows of dfa with no row in dfb where the patient matches
# and start <= date <= end
fuzzy_anti_join(
  x = dfa,
  y = dfb,
  by = c("patient", "date" = "start", "date" = "end"),
  match_fun = list(`==`, `>=`, `<=`)
) %>%
  # Turn each remaining event into a one-day range, then append dfb
  transmute(patient, start = date, end = date) %>%
  bind_rows(dfb)
#>   patient      start        end
#> 1     two 2021-02-02 2021-02-02
#> 2   three 2021-02-05 2021-02-05
#> 3     one 2020-12-31 2021-01-02
#> 4     one 2021-02-01 2021-02-03
Created on 2022-01-22 by the reprex package (v2.0.1)
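Since the question asked about dplyr specifically, here is a minimal sketch of the same anti-join without fuzzyjoin. It is not part of the answer above. It assumes dplyr 1.1.0 or later is installed, because it relies on join_by() with between() for the range condition; the names new_rows and result are only illustrative.

```r
library(dplyr)

dfa <- data.frame(
  date = as.Date(c("2021-01-01", "2021-02-02", "2021-02-05")),
  patient = c("one", "two", "three")
)
dfb <- data.frame(
  start = as.Date(c("2020-12-31", "2021-02-01")),
  end = as.Date(c("2021-01-02", "2021-02-03")),
  patient = c("one", "one")
)

# Keep the events in dfa that fall inside no range of the same patient in dfb.
# Assumption: dplyr >= 1.1.0, where join_by() accepts between().
new_rows <- dfa %>%
  anti_join(dfb, by = join_by(patient, between(date, start, end))) %>%
  # Express each remaining event as a one-day range
  transmute(start = date, end = date, patient)

# Append the new one-day ranges below the existing ranges
result <- bind_rows(dfb, new_rows)
result
```

Here between(date, start, end) is inclusive on both ends, which matches the >= and <= comparisons in the for-loop from the question.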
data.table
library(magrittr)
library(data.table)
setDT(dfa)
setDT(dfb)
# Non-equi anti-join: rows of dfa not covered by any range in dfb
tmp <- dfa[!dfb, on = list(patient, date >= start, date <= end)] %>%
  # Turn each remaining event into a one-day range and drop date
  .[, `:=`(start = date, end = date, date = NULL)]
l <- list(tmp, dfb)
rbindlist(l = l, use.names = TRUE)
#>    patient      start        end
#> 1:     two 2021-02-02 2021-02-02
#> 2:   three 2021-02-05 2021-02-05
#> 3:     one 2020-12-31 2021-01-02
#> 4:     one 2021-02-01 2021-02-03
Created on 2022-01-22 by the reprex package (v2.0.1)
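For comparison, the loop from the question can also be written in base R without growing dfb row by row. This is only a sketch under the same sample data and is not taken from the answers above; the names covered, new_rows and result are illustrative.

```r
dfa <- data.frame(
  date = as.Date(c("2021-01-01", "2021-02-02", "2021-02-05")),
  patient = c("one", "two", "three")
)
dfb <- data.frame(
  start = as.Date(c("2020-12-31", "2021-02-01")),
  end = as.Date(c("2021-01-02", "2021-02-03")),
  patient = c("one", "one")
)

# covered[i] is TRUE when event i of dfa lies inside at least one
# range of the same patient in dfb (bounds inclusive).
covered <- vapply(
  seq_len(nrow(dfa)),
  function(i) {
    any(dfb$patient == dfa$patient[i] &
          dfb$start <= dfa$date[i] &
          dfa$date[i] <= dfb$end)
  },
  logical(1)
)

# Build all one-day ranges at once, then bind a single time
new_rows <- data.frame(
  start = dfa$date[!covered],
  end = dfa$date[!covered],
  patient = dfa$patient[!covered]
)
result <- rbind(dfb, new_rows)
result
```

One difference from the original loop: the loop checks each event against dfb as it grows, so a repeated uncovered event in dfa would be added only once there, while this sketch checks against the original dfb only and would add it once per occurrence.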
