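I'm sure I'm missing something about how grouping works. When I use my own function inside a `summarise` statement (after grouping), I get the same result for every group, which is wrong. I don't get any errors or warnings either; it just silently gives me the wrong answer.
My goal is to make this custom function play nicely with `group_by`.
Here is the code: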
```r
library(dplyr)

# data
transect <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
                       quad_id = c(1, 1, 1, 1, 2, 2, 2))
# scores
c_scores <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
                       c = c(5, 6, 6, 10))
# custom fun
my_fun <- function(data, scores){
  join <- left_join(data, scores, by = "acronym")
  mean <- mean(join$c)
  return(mean)
}
# this works
my_fun(transect, c_scores)
# this also works
transect %>% my_fun(., c_scores)
# this doesn't...
transect %>%
  group_by(quad_id) %>%
  summarise(mean_c = my_fun(., scores = c_scores))
```
Here are my results:
| quad_id | mean_c |
|---|---|
| 1 | 6.29 |
| 2 | 6.29 |
This is what I want:
| quad_id | mean_c |
|---|---|
| 1 | 6.75 |
| 2 | 5.67 |
Answer:
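We can use `cur_data()` as the input to the function instead of `.`. With the `%>%` pipe, `.` refers to the whole grouped data frame that was piped into `summarise`, so `my_fun` receives the full dataset in every group rather than just that group's subset of the data.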
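A quick way to see this, assuming the `transect` data from the question: inside `summarise()`, `nrow(.)` counts the rows of whatever `.` points to, while `n()` counts the rows of the current group. The column names `rows_via_dot` and `rows_in_group` are purely illustrative.

```r
library(dplyr)

transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)

# `.` is bound by the magrittr pipe to the whole grouped data frame,
# so nrow(.) sees all 7 rows in every group; n() reports the current group's size.
check <- transect %>%
  group_by(quad_id) %>%
  summarise(rows_via_dot = nrow(.), rows_in_group = n())

print(check)
```

Both groups report 7 rows through `.`, which is why `my_fun(.)` returns the overall mean for each group.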
```r
library(dplyr)
transect %>%
  group_by(quad_id) %>%
  summarise(mean_c = my_fun(cur_data(), scores = c_scores))
```
Output:
```
# A tibble: 2 × 2
  quad_id mean_c
    <dbl>  <dbl>
1       1   6.75
2       2   5.67
```
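Note that dplyr 1.1.0 deprecated `cur_data()` in favour of `pick()`. On dplyr 1.1.0 or later, the same fix can be sketched with `pick(everything())`, which hands the function the current group's columns (without the grouping column) as a data frame. This assumes the question's `transect`, `c_scores` and `my_fun`, reproduced here so the block runs on its own.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)
c_scores <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
                       c = c(5, 6, 6, 10))

my_fun <- function(data, scores) {
  joined <- left_join(data, scores, by = "acronym")
  mean(joined$c)
}

# pick(everything()) passes only the rows of the current group to my_fun
result <- transect %>%
  group_by(quad_id) %>%
  summarise(mean_c = my_fun(pick(everything()), scores = c_scores))

print(result)
```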
If we want a message when the data are grouped, we can use `is_grouped_df`:
```r
my_fun2 <- function(data, scores)
{
  if(dplyr::is_grouped_df(data))
  {
    message("data is grouped, so use cur_data() as data")
  }
  left_join(data, scores, by = "acronym") %>%
    pull(c) %>%
    mean
}
```
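If the real concern is the silent wrong answer, a stricter variant could stop with an error instead of printing a message. This is a minimal sketch; `my_fun_strict` is a hypothetical name, and the data are reproduced from the question.

```r
library(dplyr)

transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)
c_scores <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
                       c = c(5, 6, 6, 10))

# Hypothetical stricter variant: refuse grouped input instead of
# silently averaging over every row
my_fun_strict <- function(data, scores) {
  if (is_grouped_df(data)) {
    stop("`data` is grouped: pass the current group (e.g. cur_data()) instead of `.`")
  }
  left_join(data, scores, by = "acronym") %>%
    pull(c) %>%
    mean()
}

# Ungrouped input still returns the overall mean
overall <- my_fun_strict(transect, c_scores)

# Passing `.` inside a grouped summarise now fails loudly
grouped_call_failed <- tryCatch({
  transect %>%
    group_by(quad_id) %>%
    summarise(mean_c = my_fun_strict(., scores = c_scores))
  FALSE
}, error = function(e) TRUE)
```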
Test:
```
> transect %>%
    group_by(quad_id) %>%
    summarise(mean_c = my_fun2(., scores = c_scores))
data is grouped, so use cur_data() as data
data is grouped, so use cur_data() as data
# A tibble: 2 × 2
  quad_id mean_c
    <dbl>  <dbl>
1       1   6.29
2       2   6.29
> transect %>%
    group_by(quad_id) %>%
    summarise(mean_c = my_fun2(cur_data(), scores = c_scores))
# A tibble: 2 × 2
  quad_id mean_c
    <dbl>  <dbl>
1       1   6.75
2       2   5.67
```
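Note that the message is repeated because, inside `summarise`, the function is applied once per group (n groups). If we call it outside `summarise`, the message is printed only once:
```
> transect %>%
    group_by(quad_id) %>%
    my_fun2(., c_scores)
data is grouped, so use cur_data() as data
[1] 6.285714
```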
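To confirm that `summarise()` evaluates its expression once per group, a small call counter is enough. `calls` and `counting_mean` are hypothetical names used only for this illustration; the data come from the question.

```r
library(dplyr)

transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)
c_scores <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
                       c = c(5, 6, 6, 10))

calls <- 0
counting_mean <- function(x) {
  calls <<- calls + 1  # record every evaluation
  mean(x)
}

res <- transect %>%
  left_join(c_scores, by = "acronym") %>%
  group_by(quad_id) %>%
  summarise(mean_c = counting_mean(c))

print(calls)  # one evaluation per group
```

Here `counting_mean(c)` receives only the current group's `c` column each time, which is the behaviour `my_fun` was missing when it was given `.`.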
If we want a single function that does both the join and the grouping, we can also do this:
```r
my_fun3 <- function(data, scores, grps = NULL)
{
  data <- left_join(data, scores, by = "acronym")
  if(!missing(grps))
  {
    data <- data %>%
      group_by(across(all_of(grps)))
  }
  data %>%
    summarise(mean_c = mean(c, na.rm = TRUE))
}
```
Test:
```
> my_fun3(transect, c_scores, "quad_id")
# A tibble: 2 × 2
  quad_id mean_c
    <dbl>  <dbl>
1       1   6.75
2       2   5.67
>
> my_fun3(transect, c_scores)
    mean_c
1 6.285714
```
Or simplify it by using `any_of` inside `group_by`, which removes the need for the `if` condition with `missing`:
```r
my_fun3 <- function(data, scores, grps = NULL)
{
  left_join(data, scores, by = "acronym") %>%
    group_by(across(any_of(grps))) %>%
    summarise(mean_c = mean(c, na.rm = TRUE))
}
```
Test:
```
> my_fun3(transect, c_scores, "quad_id")
# A tibble: 2 × 2
  quad_id mean_c
    <dbl>  <dbl>
1       1   6.75
2       2   5.67
> my_fun3(transect, c_scores)
# A tibble: 1 × 1
  mean_c
   <dbl>
1   6.29
```
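On dplyr 1.1.0 or later, `summarise()` also accepts a `.by` argument for per-operation grouping, which removes the separate `group_by()` step. A sketch of the same function using it; `my_fun4` is a hypothetical name, and the data are reproduced from the question.

```r
library(dplyr)  # the .by argument requires dplyr >= 1.1.0

transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)
c_scores <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
                       c = c(5, 6, 6, 10))

my_fun4 <- function(data, scores, grps = NULL) {
  left_join(data, scores, by = "acronym") %>%
    # any_of(NULL) selects no columns, so grps = NULL yields one overall row
    summarise(mean_c = mean(c, na.rm = TRUE), .by = any_of(grps))
}

by_quad <- my_fun4(transect, c_scores, "quad_id")
overall <- my_fun4(transect, c_scores)
```

Unlike `group_by()`, `.by` returns an ungrouped result, with groups in order of first appearance, so nothing needs to be ungrouped afterwards.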
