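I'm computing an average within each group over consecutive blocks of three rows, using the summarise function. For each average, I'd also like to pick the last date among the three dates that went into it.
I tried taking the maximum, but that just returns the highest date for the whole group.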
library(dplyr)  # %>%, arrange, group_by, summarise
library(zoo)    # rollapply
test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "A"),
                  measure = c(10, 20, 5, 2, 62, 2, 5, 4, 6, 7, 25),
                  time = c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021",
                           "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "19-01-2021"))
# > test
#    my_groups measure       time
# 1          A      10 20-09-2020
# 2          A      20 25-09-2020
# 3          A       5 19-09-2020
# 4          B       2 20-05-2020
# 5          B      62 20-06-2021
# 6          C       2 11-01-2021
# 7          C       5 13-01-2021
# 8          C       4 13-01-2021
# 9          A       6 15-01-2021
# 10         A       7 15-01-2021
# 11         A      25 19-01-2021
test %>%
  arrange(time) %>%
  group_by(my_groups) %>%
  summarise(mean_3 = rollapply(measure, 3, mean, by = 3, align = "left", partial = FALSE),
            final_data = max(time))
#   my_groups mean_3 final_data
#   <chr>      <dbl> <chr>
# 1 A          12.7  25-09-2020
# 2 A          11.7  25-09-2020
# 3 C           3.67 13-01-2021
In the first row (mean_3 = 12.7) I'd like the date to be 19-01-2021, not the overall maximum of group A (25-09-2020).
Any tips on how I can do this?
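A side note on the example data: time is stored as text in dd-mm-yyyy form, so arrange(time) and max(time) compare the strings character by character rather than chronologically. That is why 25-09-2020 comes out as the maximum for group A even though 19-01-2021 is the later day. A minimal sketch of the difference, assuming the values are meant as day-month-year dates (the variable names are only for illustration):

```r
# Two of the dates from group A, stored as text like in the example.
time_chr <- c("25-09-2020", "19-01-2021")

# Text comparison: "2" sorts after "1", so the 2020 date wins.
max_as_text <- max(time_chr)

# Assumption: the strings are day-month-year. Parse them, then compare as dates.
time_date <- as.Date(time_chr, format = "%d-%m-%Y")
max_as_date <- max(time_date)

print(max_as_text)                      # "25-09-2020"
print(format(max_as_date, "%d-%m-%Y"))  # "19-01-2021"
```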
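Answer:
I have two dplyr approaches for you. I'm not entirely happy with either: when rollapply with max over the dates finds no complete window in group B, it falls back to a double, which doesn't match the character values returned for groups A and C.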
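The type clash can be reproduced in isolation. dplyr combines per-group results following vctrs rules, under which character and double have no common type, even when one side is empty. A small sketch using vctrs::vec_c; the zero-length double stands in for what group B is described as returning, which is an assumption taken from the explanation above rather than something checked against zoo:

```r
library(vctrs)

# Character results from groups with a full window combine without trouble.
same_type <- vec_c("19-01-2021", "13-01-2021")

# A double result (even a zero-length one) cannot be combined with character.
# Assumption: numeric(0) mimics the empty result described for group B.
mixed <- tryCatch(
  vec_c("19-01-2021", numeric(0)),
  error = function(e) "incompatible types"
)

print(same_type)
print(mixed)
```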
Using mutate:
test %>%
  arrange(time) %>%
  group_by(my_groups) %>%
  mutate(final = rollapply(time, 3, max, by = 3, fill = NA, align = "left", partial = FALSE),
         mean_3 = rollapply(measure, 3, mean, by = 3, fill = NA, align = "left", partial = FALSE)) %>%
  filter(!is.na(final)) %>%
  select(my_groups, final, mean_3) %>%
  arrange(my_groups)
# A tibble: 3 x 3
# Groups:   my_groups [2]
#   my_groups final      mean_3
#   <chr>     <chr>       <dbl>
# 1 A         19-01-2021  12.7
# 2 A         25-09-2020  11.7
# 3 C         13-01-2021   3.67
Using summarise (strictly speaking this isn't summarising, but the code is a bit cleaner):
test %>%
  arrange(time) %>%
  group_by(my_groups) %>%
  summarise(final = rollapply(time, 3, max, by = 3, fill = NA, align = "left", partial = FALSE),
            mean_3 = rollapply(measure, 3, mean, by = 3, fill = NA, align = "left", partial = FALSE)) %>%
  filter(!is.na(final))
# `summarise()` has grouped output by 'my_groups'. You can override using the `.groups` argument.
# A tibble: 3 x 3
# Groups:   my_groups [2]
#   my_groups final      mean_3
#   <chr>     <chr>       <dbl>
# 1 A         19-01-2021  12.7
# 2 A         25-09-2020  11.7
# 3 C         13-01-2021   3.67
Edit:
Added isa's solution from the comments. Setting partial = TRUE on the date column does the trick:
test %>%
  arrange(time) %>%
  group_by(my_groups) %>%
  summarise(mean_3 = rollapply(measure, 3, mean, by = 3, align = "left", partial = FALSE),
            final_data = rollapply(time, 3, max, by = 3, align = "left", partial = TRUE))
# `summarise()` has grouped output by 'my_groups'. You can override using the `.groups` argument.
# A tibble: 3 x 3
# Groups:   my_groups [2]
#   my_groups mean_3 final_data
#   <chr>      <dbl> <chr>
# 1 A          12.7  19-01-2021
# 2 A          11.7  25-09-2020
# 3 C           3.67 13-01-2021
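Because the windows use width 3 with by = 3, they never overlap, so the same kind of result can also be sketched without rollapply by numbering the rows inside each group and cutting them into blocks of three. This is a swapped-in technique, not the approach from the comments. It assumes the dates are day-month-year and converts them to Date first, so that sorting and max are chronological. The block column and the result name are made up for illustration.

```r
library(dplyr)

test <- data.frame(
  my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "A"),
  measure = c(10, 20, 5, 2, 62, 2, 5, 4, 6, 7, 25),
  time = c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021",
           "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021",
           "19-01-2021")
)

result <- test %>%
  mutate(time = as.Date(time, format = "%d-%m-%Y")) %>%  # assumption: dd-mm-yyyy
  arrange(my_groups, time) %>%
  group_by(my_groups) %>%
  mutate(block = (row_number() - 1) %/% 3) %>%           # 0,0,0,1,1,1,... within each group
  group_by(my_groups, block) %>%
  filter(n() == 3) %>%                                   # drop incomplete blocks, e.g. group B
  summarise(mean_3 = mean(measure),
            final_data = max(time),
            .groups = "drop")

print(result)
```

With real dates the two A blocks come out in chronological order, so the September 2020 block (mean 11.7, last date 2020-09-25) precedes the January 2021 block (mean 12.7, last date 2021-01-19).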
Answer:
Another possible solution:
library(tidyverse)
test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "A"),
                  measure = c(10, 20, 5, 2, 62, 2, 5, 4, 6, 7, 25),
                  time = c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021",
                           "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "19-01-2021"))
test %>%
  group_by(data.table::rleid(my_groups)) %>%
  filter(n() == 3) %>%
  summarise(
    groups = unique(my_groups),
    mean_3 = mean(measure), final_data = max(time), .groups = "drop") %>%
  select(-1)
#> # A tibble: 3 × 3
#>   groups mean_3 final_data
#>   <chr>   <dbl> <chr>
#> 1 A       11.7  25-09-2020
#> 2 C        3.67 13-01-2021
#> 3 A       12.7  19-01-2021
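For context, data.table::rleid assigns a new id every time the value changes from one row to the next, so this solution depends on the rows already being laid out in consecutive runs, and it keeps only the runs that are exactly three rows long. A small sketch of what it returns for the my_groups column (run_id is just an illustrative name):

```r
library(data.table)

my_groups <- c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "A")

# One id per consecutive run of identical values.
run_id <- rleid(my_groups)
print(run_id)  # 1 1 1 2 2 3 3 3 4 4 4

# Run lengths: the B run has only two rows, so filter(n() == 3) drops it.
print(table(run_id))
```

Note that a run of six consecutive rows from the same group would be dropped by filter(n() == 3) rather than split into two blocks.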
