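I need to sum the values of about 40 variables within the same groups.
Here is an example dataset. I want to sum the values of score1-score5 by region and department.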
region <- rep(c("south", "east", "west", "north"), times = 10)
department <- rep(c("A", "B", "C", "D", "E"), times = 8)
score1 <- rnorm(n = 40, mean = 0, sd = 1)
score2 <- rnorm(n = 40, mean = 3, sd = 1.5)
score3 <- rnorm(n = 40, mean = 2, sd = 1)
score4 <- rnorm(n = 40, mean = 1, sd = 1.5)
score5 <- rnorm(n = 40, mean = 5, sd = 1.5)
df <- data.frame(region, department, score1, score2, score3, score4, score5)
Here is code that produces the result I want, but is there a simpler way to do this?
library(dplyr)
df %>% group_by(region, department) %>%
  summarise(score1 = sum(score1),
            score2 = sum(score2),
            score3 = sum(score3),
            score4 = sum(score4),
            score5 = sum(score5))
I tried using a loop, but it doesn't work:
vlist <- c("score1", "score2", "score3", "score4", "score5")
for (var in vlist) {
  df <- df %>% group_by(region, department) %>%
    summarise(var = sum(.[[var]]))
}
Is there another way to do this, or what is wrong with my loop? Thanks!
Answer:
Use across() to loop over the columns selected by starts_with('score') and take the sum of each:
library(dplyr)
out1 <- df %>%
  group_by(region, department) %>%
  # apply sum() to every column whose name starts with "score"
  summarise(across(starts_with('score'), sum), .groups = 'drop')
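With about 40 real variables the names may not share a prefix, so here is a sketch of other tidyselect selectors that can go inside across() in the same way. It assumes a small stand-in for the question's data (same grouping columns, standard-normal scores) and an arbitrary seed chosen only for reproducibility; the object names by_range, by_type, by_names and vars are hypothetical.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the example is reproducible
df <- data.frame(
  region     = rep(c("south", "east", "west", "north"), times = 10),
  department = rep(c("A", "B", "C", "D", "E"), times = 8),
  score1 = rnorm(40), score2 = rnorm(40), score3 = rnorm(40),
  score4 = rnorm(40), score5 = rnorm(40)
)

# Select a contiguous range of columns by position
by_range <- df %>%
  group_by(region, department) %>%
  summarise(across(score1:score5, sum), .groups = "drop")

# Select every numeric column; across() skips the grouping columns anyway
by_type <- df %>%
  group_by(region, department) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

# Select from a character vector of names built elsewhere
vars <- paste0("score", 1:5)
by_names <- df %>%
  group_by(region, department) %>%
  summarise(across(all_of(vars), sum), .groups = "drop")

identical(by_range, by_type)
identical(by_range, by_names)
```

all_of() is the natural choice when the 40 names come from a vector such as vlist, because it fails loudly if any name is missing from the data.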
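The problem with the for loop is that df is overwritten on every iteration (df <- ...), and summarise returns only the group_by columns plus the summarised output. So after the first iteration, df no longer has any 'score' columns at all. There are two further issues: summarise(var = ...) creates a column literally named var rather than score1, and inside the pipe . refers to the whole data frame, so .[[var]] sums the entire column instead of the current group. If you want to use a loop, store each result in a list and then join the pieces with reduce.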
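To see these failure modes concretely, here is a sketch that runs just the first iteration of the original loop body and stores it under a new name (step1, a hypothetical name) instead of overwriting df. It assumes a reduced stand-in dataset with only score1 and score2 and an arbitrary seed.

```r
library(dplyr)

set.seed(42)  # arbitrary seed for reproducibility
df <- data.frame(
  region     = rep(c("south", "east", "west", "north"), times = 10),
  department = rep(c("A", "B", "C", "D", "E"), times = 8),
  score1 = rnorm(40), score2 = rnorm(40)
)

var <- "score1"  # the value the loop variable holds on the first iteration

# The original loop body, assigned to step1 so df stays intact
step1 <- df %>%
  group_by(region, department) %>%
  summarise(var = sum(.[[var]]), .groups = "drop")

names(step1)                # the new column is literally named "var"
"score2" %in% names(step1)  # score2 is gone, so a later iteration has nothing to sum
step1$var                   # `.` is the whole data, so every group gets the grand total
sum(df$score1)
```

The corrected loop below avoids all three problems.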
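library(purrr)
out_list <- vector('list', length(vlist))
names(out_list) <- vlist
for (var in vlist) {
  # !!var := names the new column after the string in var;
  # .data[[var]] takes that column from the current group only
  out_list[[var]] <- df %>%
    group_by(region, department) %>%
    summarise(!!var := sum(.data[[var]]), .groups = 'drop')
}
# join the per-variable summaries side by side on the grouping columns
out2 <- reduce(out_list, full_join, by = c('region', 'department'))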
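The preallocated list plus for loop can also be written with purrr::map(), which returns the list directly. This is a sketch under the same assumptions as before (stand-in data, arbitrary seed); the helper argument v and the name out3 are hypothetical.

```r
library(dplyr)
library(purrr)

set.seed(42)  # arbitrary seed for reproducibility
df <- data.frame(
  region     = rep(c("south", "east", "west", "north"), times = 10),
  department = rep(c("A", "B", "C", "D", "E"), times = 8),
  score1 = rnorm(40), score2 = rnorm(40), score3 = rnorm(40),
  score4 = rnorm(40), score5 = rnorm(40)
)
vlist <- c("score1", "score2", "score3", "score4", "score5")

# One grouped summary per variable, collected into a list by map()
out_list <- map(vlist, function(v) {
  df %>%
    group_by(region, department) %>%
    # !!v := uses the string in v as the output column name
    summarise(!!v := sum(.data[[v]]), .groups = "drop")
})

# Combine the one-column summaries on the grouping keys
out3 <- reduce(out_list, full_join, by = c("region", "department"))

# Compare with the across() version
out1 <- df %>%
  group_by(region, department) %>%
  summarise(across(starts_with("score"), sum), .groups = "drop")

all.equal(as.data.frame(out1), as.data.frame(out3))
```

Either way, the across() call is shorter and avoids the joins entirely; the loop form is mainly useful when each variable needs different handling.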
- Check the output:
> identical(out1, out2)
[1] TRUE
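For comparison, base R can do the same grouped sum without dplyr via aggregate(). This is not part of the answer above, just a sketch assuming the same stand-in data; out_base is a hypothetical name.

```r
set.seed(42)  # arbitrary seed for reproducibility
df <- data.frame(
  region     = rep(c("south", "east", "west", "north"), times = 10),
  department = rep(c("A", "B", "C", "D", "E"), times = 8),
  score1 = rnorm(40), score2 = rnorm(40), score3 = rnorm(40),
  score4 = rnorm(40), score5 = rnorm(40)
)

# `.` on the left of ~ means "every column not on the right-hand side",
# so each score column is summed within each region/department pair
out_base <- aggregate(. ~ region + department, data = df, FUN = sum)

head(out_base)
```

Two caveats: the rows come out in a different order than the dplyr result, and the formula method drops any row containing an NA by default (na.action = na.omit), so sums can differ from dplyr's when data are missing.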
