df <- data.frame(date = as.Date(c(rep("2022-01-01", 3),
                                  rep("2022-02-01", 3),
                                  rep("2022-03-01", 4))),
                 flavor = c("Almond", "Apple", "Apricot",
                            "Almond", "Maple", "Mint",
                            "Apricot", "Pecan", "Praline", "Pumpkin"))
df
#>          date  flavor
#> 1  2022-01-01  Almond
#> 2  2022-01-01   Apple
#> 3  2022-01-01 Apricot
#> 4  2022-02-01  Almond
#> 5  2022-02-01   Maple
#> 6  2022-02-01    Mint
#> 7  2022-03-01 Apricot
#> 8  2022-03-01   Pecan
#> 9  2022-03-01 Praline
#> 10 2022-03-01 Pumpkin
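The R data frame above tracks an ice cream shop's flavors month by month. February added two flavors that were not offered in January (Maple, Mint) and removed two that were (Apple, Apricot). March added four flavors that were not offered in February (Apricot, Pecan, Praline, Pumpkin) and removed all three that were (Almond, Maple, Mint).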
#>         date flavors.added flavors.removed
#> 1 2022-01-01          <NA>            <NA>
#> 2 2022-02-01             2               2
#> 3 2022-03-01             4               3
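How can I write an R script that computes the summary data frame above? That is, for each month I want a rolling count of the flavors added that were not offered the previous month, and a count of the flavors removed that were offered the previous month.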
Answer from a uj5u.com user:
Using data.table:
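library(data.table)

# Collapse to one row per date, keeping that month's flavors in a list column
df2 = setDT(df)[, .(flavors = list(flavor)), date]

# Compare each month with the one before it; row 1 has no predecessor and stays NA
for (i in seq_len(nrow(df2))[-1])
  set(
    df2, i = i,
    j = c('flavors_added', 'flavors_removed'),
    value = list(
      length(setdiff(df2$flavors[[i]], df2$flavors[[i-1]])),  # new this month
      length(setdiff(df2$flavors[[i-1]], df2$flavors[[i]]))   # dropped since last month
    )
  )
df2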
#          date                       flavors flavors_added flavors_removed
#        <Date>                        <list>         <int>           <int>
# 1: 2022-01-01          Almond,Apple,Apricot            NA              NA
# 2: 2022-02-01             Almond,Maple,Mint             2               2
# 3: 2022-03-01 Apricot,Pecan,Praline,Pumpkin             4               3
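The loop above works because setdiff() is asymmetric: setdiff(x, y) keeps the elements of x that are not in y, so swapping the arguments switches between "added" and "removed". A minimal sketch for the January to February step, where the variable names jan, feb, added and removed are chosen here for illustration only:

```r
# Flavors offered in each month, copied from the question's data
jan <- c("Almond", "Apple", "Apricot")
feb <- c("Almond", "Maple", "Mint")

# In February but not in January: the flavors that were added
added <- setdiff(feb, jan)

# In January but not in February: the flavors that were removed
removed <- setdiff(jan, feb)

print(added)
print(removed)
print(c(flavors_added = length(added), flavors_removed = length(removed)))
```

Taking length() of each difference gives the two counts for that month.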
Answer from a uj5u.com user:
With dplyr:
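library(dplyr)

df %>%
  group_by(date) %>%
  summarise(flavors = list(flavor)) %>%
  # Map() always returns a list, so lengths() stays correct even when every
  # difference has the same size (mapply() could simplify those to a matrix)
  mutate(flavors.added = lengths(Map(setdiff, flavors, lag(flavors))),
         flavors.removed = lengths(Map(setdiff, lag(flavors), flavors)))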
Output:
# A tibble: 3 × 4
  date       flavors   flavors.added flavors.removed
  <date>     <list>            <int>           <int>
1 2022-01-01 <chr [3]>             3               0
2 2022-02-01 <chr [3]>             2               2
3 2022-03-01 <chr [4]>             4               3
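Note that the dplyr output above reports 3 and 0 for January, whereas the question asks for NA in the first month, since there is no earlier month to compare against. One way to get NA there is to compare only the months that have a predecessor. The following is a base R sketch under that assumption, not part of either answer; the names by_month, idx and summary_df are introduced here for illustration:

```r
# Rebuild the example data from the question
df <- data.frame(
  date = as.Date(c(rep("2022-01-01", 3), rep("2022-02-01", 3), rep("2022-03-01", 4))),
  flavor = c("Almond", "Apple", "Apricot",
             "Almond", "Maple", "Mint",
             "Apricot", "Pecan", "Praline", "Pumpkin")
)

# One character vector of flavors per month, in date order
by_month <- split(df$flavor, df$date)

# Positions of the months that have a previous month to compare against
idx <- seq_along(by_month)[-1]

# Count flavors present this month but not last month
added <- vapply(idx, function(i) {
  length(setdiff(by_month[[i]], by_month[[i - 1]]))
}, integer(1))

# Count flavors present last month but not this month
removed <- vapply(idx, function(i) {
  length(setdiff(by_month[[i - 1]], by_month[[i]]))
}, integer(1))

# The first month gets NA because it has nothing to be compared with
summary_df <- data.frame(
  date = as.Date(names(by_month)),
  flavors.added = c(NA_integer_, added),
  flavors.removed = c(NA_integer_, removed)
)
print(summary_df)
```

This assumes each date in df marks one month's full menu, as in the question's data. If a month were missing entirely, the comparison would run against the previous month that does appear, not the previous calendar month.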
