I have the following dataset:
data.frame(id=c(1,1,1,1,1,2,2,2,2),
date = as.Date(c("2020-01-01","2020-01-04","2020-01-06","2020-01-07","2020-01-10","2020-01-01","2020-01-02","2020-01-04","2020-01-05")),
duration = c(2,3,4,2,4,3,4,2,2),
product = c("A","B","C","A","C","B","C","A","A"))
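For each person (id) I have the product they started using on a given date and how long that product lasts (duration). Update: in this sample each product happens to have a fixed duration, but in practice that need not be the case.
For every row I need to list the products that person is currently using, so the resulting dataset should look like this (the separator here is "|", but that does not matter):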
data.frame(id=c(1,1,1,1,1,2,2,2,2),
date = as.Date(c("2020-01-01","2020-01-04","2020-01-06","2020-01-07","2020-01-10","2020-01-01","2020-01-02","2020-01-04","2020-01-05")),
duration = c(2,3,4,2,4,3,4,2,2),
product = c("A","B","C","A","C","B","C","A","A"),
products_in_use = c("A","B","B | C", "A | B | C", "C", "B", "B | C", "A | B | C", "A | C"))
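Basically, I think that for the current row I need to find all earlier rows whose duration still covers the current date (the number of days elapsed is less than or equal to the duration) and append the current product to their list. Then I would take the unique, sorted version of that list and write it out as a string. But I don't know how to do the first step.
If all of this can work inside a dplyr pipeline, that would be preferred.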
Answer from a uj5u.com user:
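I can't see an easy way to do this entirely within dplyr, because it relies on checking each row's date against the date and date-plus-duration of every other row, but if you first define this function: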
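get_products_in_use <- function(dates, durations, products)
{
  # Column i holds products[i] on every row whose date falls inside
  # product i's window [dates[i], dates[i] + durations[i]], "" elsewhere
  apply(sapply(seq_along(dates),
               function(i) {
                 ifelse(test = dates >= dates[i] & dates <= dates[i] + durations[i],
                        yes  = products[i],
                        no   = "")
               }),
        # Per row: drop the blanks, then sort, de-duplicate and join
        1, function(x) paste(unique(sort(x[nzchar(x)])), collapse = " | "))
}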
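To make the interval check concrete, here is a small sketch that builds the same "is this product still in use on this row's date" test as a logical matrix for person 1 from the question. It uses base R's outer() rather than the sapply/ifelse combination, and the names start, end and in_use are only illustrative, not part of the answer.

```r
# Data for id 1, copied from the question
dates     <- as.Date(c("2020-01-01", "2020-01-04", "2020-01-06",
                       "2020-01-07", "2020-01-10"))
durations <- c(2, 3, 4, 2, 4)
products  <- c("A", "B", "C", "A", "C")

# Work in whole days so the comparisons are plain arithmetic
start <- as.numeric(dates)
end   <- start + durations

# in_use[j, i] is TRUE when the product started on row i is still
# active on the date of row j, i.e. start[i] <= start[j] <= end[i]
in_use <- outer(start, start, ">=") & outer(start, end, "<=")
dimnames(in_use) <- list(format(dates), products)
in_use
```

Reading across the row for 2020-01-07, the TRUE entries sit under B, C and the second A, which matches the "A | B | C" expected for that row.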
Then the function is easy to use in a dplyr pipeline:
library(dplyr)

# testdata is the data frame from the question
testdata %>%
  group_by(id) %>%
  mutate(products_in_use = get_products_in_use(date, duration, product))
#> # A tibble: 9 x 5
#> # Groups:   id [2]
#>      id date       duration product products_in_use
#>   <dbl> <date>        <dbl> <chr>   <chr>
#> 1     1 2020-01-01        2 A       A
#> 2     1 2020-01-04        3 B       B
#> 3     1 2020-01-06        4 C       B | C
#> 4     1 2020-01-07        2 A       A | B | C
#> 5     1 2020-01-10        4 C       C
#> 6     2 2020-01-01        3 B       B
#> 7     2 2020-01-02        4 C       B | C
#> 8     2 2020-01-04        2 A       A | B | C
#> 9     2 2020-01-05        2 A       A | C
Created on 2021-11-09 by the reprex package (v2.0.0)
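As a possible alternative, the sketch below follows the steps described in the question more literally: for each row, pick the products whose window covers that row's date, then sort, de-duplicate and paste them. It is written in base R with vapply(), and products_in_use_rowwise is a hypothetical helper name, not something from the answer above. The per-id loop uses split() only so that the example runs without dplyr.

```r
# Hypothetical helper (not from the answer): one result string per row
products_in_use_rowwise <- function(dates, durations, products) {
  ends <- dates + durations
  vapply(seq_along(dates), function(j) {
    # Products whose window [start, start + duration] covers row j's date
    active <- products[dates <= dates[j] & ends >= dates[j]]
    paste(sort(unique(active)), collapse = " | ")
  }, character(1))
}

# Data from the question
testdata <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  date = as.Date(c("2020-01-01", "2020-01-04", "2020-01-06", "2020-01-07",
                   "2020-01-10", "2020-01-01", "2020-01-02", "2020-01-04",
                   "2020-01-05")),
  duration = c(2, 3, 4, 2, 4, 3, 4, 2, 2),
  product = c("A", "B", "C", "A", "C", "B", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Apply the helper separately within each id
testdata$products_in_use <- NA_character_
for (idx in split(seq_len(nrow(testdata)), testdata$id)) {
  testdata$products_in_use[idx] <- products_in_use_rowwise(
    testdata$date[idx], testdata$duration[idx], testdata$product[idx]
  )
}
testdata
```

Inside a dplyr pipeline the helper could be called the same way as the answer's function, after group_by(id), as mutate(products_in_use = products_in_use_rowwise(date, duration, product)). One reason to consider this variant is that vapply() always returns a character vector, so a group with a single row still works, whereas sapply() returns a plain vector rather than a matrix in that case and the outer apply() call would fail.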
