rowsum across multiple columns in R

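I can get the sum of the target column for each level of the categorical columns listed in catVariables. However, I don't want to do this in a for loop; I would like to apply it to all categorical columns at once. A for loop makes the code run longer, and I expect a vectorized approach to be faster.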

# Data
col1 <- c("L", "R", "R", "L", "R", "L", "R", "L")
col2 <- c("R", "R", "R", "L", "L", "R", "L", "R")
col3 <- c("L", "-", "L", "R", "-", "L", "R", "-")
target <- c(1, 0, 0, 1, 1, 0, 1, 0)



dat <- data.frame("col1" = col1, "col2" = col2, "col3" = col3, "target" = target)

dat[sapply(dat, is.character)] <- lapply(dat[sapply(dat, is.character)], as.factor)
catVariables <- names(Filter(is.factor, dat))



# test
col1 <- c("L", "R", "R", "L", "R", "L", "R", "L")
col2 <- c("R", "R", "R", "L", "L", "R", "L", "R")
col3 <- c("L", "-", "L", "R", "-", "L", "R", "-")
target <- c(1, 0, 0, 1, 1, 0, 1, 0)

test_dat <- data.frame("col1" = col1, "col2" = col2, "col3" = col3, "target" = target)



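# Replace each categorical column with its level's share of the total target
for (col in catVariables) {
  # one-column matrix: per-level sum of target divided by the overall sum
  ratios <- rowsum(dat[["target"]], dat[[col]]) / sum(dat[["target"]])
  print(ratios)
  # look up each row's level among the row names of ratios
  dat[[col]] <- ratios[match(dat[[col]], rownames(ratios))]
  test_dat[[col]] <- ratios[match(test_dat[[col]], rownames(ratios))]
}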

Answer from a uj5u.com user:

We can use across() from dplyr to apply rowsum() to multiple columns:

library(dplyr)

dat %>%
  mutate(across(all_of(catVariables), ~ {
    tmp <- rowsum(target, .x) / sum(target)
    tmp[match(.x, row.names(tmp))]
  }))

Output:

   col1 col2 col3 target
1  0.5 0.25 0.25      1
2  0.5 0.25 0.25      0
3  0.5 0.25 0.25      0
4  0.5 0.75 0.50      1
5  0.5 0.75 0.25      1
6  0.5 0.25 0.25      0
7  0.5 0.75 0.50      1
8  0.5 0.25 0.25      0
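
For readers who prefer to avoid the dplyr dependency, the same per-level share can be sketched in base R with lapply(). This is only an illustration under stated assumptions: target_share is a hypothetical helper name that does not appear in the original post, and the data are rebuilt from the question so the block runs on its own.

```r
# Self-contained copy of the example data from the question
dat <- data.frame(
  col1   = c("L", "R", "R", "L", "R", "L", "R", "L"),
  col2   = c("R", "R", "R", "L", "L", "R", "L", "R"),
  col3   = c("L", "-", "L", "R", "-", "L", "R", "-"),
  target = c(1, 0, 0, 1, 1, 0, 1, 0),
  stringsAsFactors = TRUE
)
catVariables <- names(Filter(is.factor, dat))

# Hypothetical helper (not from the original post): for each row, return the
# share of the total target that falls in that row's level of x
target_share <- function(x, target) {
  # one-column matrix whose row names are the levels of x
  ratios <- rowsum(target, x) / sum(target)
  # expand back to one value per row by matching each level to a row name
  unname(ratios[match(x, rownames(ratios)), 1])
}

encoded <- dat
encoded[catVariables] <- lapply(dat[catVariables], target_share,
                                target = dat$target)
print(encoded)
```

The block rebuilds dat on purpose: the for loop in the question overwrites the categorical columns of dat with numbers, so any of these snippets should be run on a fresh copy of the data.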

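Alternatively, when working with test_dat and the training data ('dat'), one option is to loop across the columns of test_dat, use the column name (cur_column()) to extract the corresponding column from 'dat' and compute the grouped rowsum on it, and then match() the 'test_dat' column values against the row names of that result to expand it to one value per row.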

test_dat %>%
  mutate(across(all_of(catVariables), ~ {
    tmp <- rowsum(dat[["target"]], dat[[cur_column()]]) / sum(dat[["target"]])
    tmp[match(.x, row.names(tmp))]
  }))

Output:

   col1 col2 col3 target
1  0.5 0.25 0.25      1
2  0.5 0.25 0.25      0
3  0.5 0.25 0.25      0
4  0.5 0.75 0.50      1
5  0.5 0.75 0.25      1
6  0.5 0.25 0.25      0
7  0.5 0.75 0.50      1
8  0.5 0.25 0.25      0
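
One consequence of the match() lookup is worth illustrating: a level that occurs in the test data but not in the training data has no row in the rowsum result, so it maps to NA. The sketch below assumes a hypothetical test frame, new_dat, with an extra level "M" that is not in the original post; the training values are col2 and target from the question.

```r
# Training data: col2 and target from the question
train <- data.frame(
  col2   = c("R", "R", "R", "L", "L", "R", "L", "R"),
  target = c(1, 0, 0, 1, 1, 0, 1, 0)
)

# Hypothetical test data; the level "M" never occurs in the training column
new_dat <- data.frame(col2 = c("R", "L", "M"))

# Per-level share of the total target, computed on the training data only
ratios <- rowsum(train$target, train$col2) / sum(train$target)

# match() returns NA for "M", and indexing with NA yields NA
encoded <- unname(ratios[match(new_dat$col2, rownames(ratios)), 1])
print(encoded)  # 0.25 0.75 NA
```

Whether NA is acceptable depends on the downstream model; replacing it with a default value would be a separate design decision that the original answer does not address.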


Tags: r, dataframe, for-loop, rowsum
