In an R data.table, I want to compute row-wise sums over a selected set of columns. Example:
iris = data.table(iris[,-5])
cols = c("Petal.Length","Petal.Width")
This is how I do it at the moment, but I would rather not use the rowSums function:
iris[, newSum := rowSums(.SD), by = .I, .SDcols = c("Petal.Length","Petal.Width")]
Does anyone have a trick for easily summing the selected columns row by row?
Thanks
uj5u.com user reply:
What's wrong with rowSums? It is the best approach here. By the way, plain base R may work even better:
iris$newSum <- rowSums(iris[, c("Petal.Length", "Petal.Width")])
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width newSum
1: 5.1 3.5 1.4 0.2 1.6
2: 4.9 3.0 1.4 0.2 1.6
3: 4.7 3.2 1.3 0.2 1.5
4: 4.6 3.1 1.5 0.2 1.7
5: 5.0 3.6 1.4 0.2 1.6
---
146: 6.7 3.0 5.2 2.3 7.5
147: 6.3 2.5 5.0 1.9 6.9
148: 6.5 3.0 5.2 2.0 7.2
149: 6.2 3.4 5.4 2.3 7.7
150: 5.9 3.0 5.1 1.8 6.9
>
Or, if you really dislike rowSums:
iris$newSum <- apply(iris[, c("Petal.Length", "Petal.Width")], 1, sum)
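As a quick sanity check of the two base R variants above, the following minimal sketch compares them on the built-in iris data frame. The variable names `df`, `sum_rowsums`, and `sum_apply` are illustrative only and do not come from the answer.

```r
# Base R only; uses the built-in iris data frame without the Species column.
df <- iris[, -5]
cols <- c("Petal.Length", "Petal.Width")

# Vectorised row sums over the selected columns.
sum_rowsums <- rowSums(df[, cols])

# The same sums computed by calling sum() once per row.
sum_apply <- apply(df[, cols], 1, sum)

# Compare the values only; unname() drops any row names the results may carry.
stopifnot(isTRUE(all.equal(unname(sum_rowsums), unname(sum_apply))))
```

Both calls return one value per row; `apply` just gets there by looping over rows in R rather than in compiled code.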
uj5u.com user reply:
These do not use rowSums:
irisdt[, newSum := Reduce(`+`, .SD), .SDcols = cols]
irisdt[, newSum := as.matrix(.SD) %*% rep(1, ncol(.SD)), .SDcols = cols]
irisdt[, newSum := eval(parse(text = paste(cols, collapse = "+")))]
irisdt[, newSum := apply(.SD, 1, sum), .SDcols = cols]
irisdt[, newSum := sum(.SD), by = 1:nrow(irisdt), .SDcols = cols]
irisdt[, newSum := c(rep(1, ncol(.SD)) %*% t(.SD)), .SDcols = cols]
library(purrr)
irisdt[, newSum := pmap_dbl(.SD, sum), .SDcols = cols]
irisdt[, newSum := do.call("mapply", c(sum, .SD)), .SDcols = cols]
irisdt[, newSum := tapply(as.matrix(.SD), row(.SD), sum), .SDcols = cols]
Note
library(data.table)
irisdt <- data.table(iris)
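To see that such alternatives agree with rowSums, here is a minimal sketch that checks two of them against a reference result. It assumes the data.table package is installed. The column names `sum_reduce` and `sum_matmul` are made up for this illustration, and `drop()` is added here only to turn the one-column matrix into a plain vector.

```r
library(data.table)

irisdt <- data.table(iris[, -5])
cols <- c("Petal.Length", "Petal.Width")

# Reference result computed with rowSums on the original data frame.
expected <- unname(rowSums(as.matrix(iris[, cols])))

# Element-wise addition of the selected columns, one column at a time.
irisdt[, sum_reduce := Reduce(`+`, .SD), .SDcols = cols]

# Matrix product with a vector of ones; drop() converts the n x 1 matrix to a vector.
irisdt[, sum_matmul := drop(as.matrix(.SD) %*% rep(1, length(cols))), .SDcols = cols]

stopifnot(
  isTRUE(all.equal(irisdt$sum_reduce, expected)),
  isTRUE(all.equal(irisdt$sum_matmul, expected))
)
```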
uj5u.com user reply:
This is not an answer in its own right, just a comparison of the answers provided so far.
bench::mark(
nimliug = iris[, newSum := rowSums(.SD), by = .I, .SDcols = c("Petal.Length","Petal.Width")],
`nimliug mod` = iris[, newSum := rowSums(.SD), .SDcols = c("Petal.Length","Petal.Width")],
`U12-Forward 1` = { iris$newSum <- rowSums(iris[, c("Petal.Length", "Petal.Width")]); iris; },
`U12-Forward 2` = { iris$newSum <- apply(iris[, c("Petal.Length", "Petal.Width")], 1, sum); iris; },
`G.G 1` = iris[, newSum := Reduce(`+`, .SD), .SDcols = cols],
`G.G 2` = iris[, newSum := as.matrix(.SD) %*% rep(1, ncol(.SD)), .SDcols = cols],
`G.G 3` = iris[, newSum := eval(parse(text = paste(cols, collapse="+")))],
`G.G 4` = iris[, newSum := apply(.SD, 1, sum), .SDcols = cols],
`G.G 5` = iris[, newSum := sum(.SD), by = 1:nrow(iris), .SDcols = cols],
`G.G 6` = iris[, newSum := c(rep(1, ncol(.SD)) %*% t(.SD)), .SDcols = cols],
`G.G 7` = iris[, newSum := purrr::pmap_dbl(.SD, sum), .SDcols = cols],
`G.G 7 mod` = iris[, newSum := do.call(mapply, c(list(sum), .SD)), .SDcols = cols],
min_iterations = 1000
)
# # A tibble: 12 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 nimliug 425.4us 541.5us 1662. 52.4KB 0 1000 0 601.83ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [9 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 2 nimliug mod 387.2us 481.3us 1964. 52.4KB 0 1000 0 509.12ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [9 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 3 U12-Forward 1 169.8us 221.2us 4050. 45.7KB 3.25 1248 1 308.14ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [14 x 3]> <bch:tm [1,249]> <tibble [1,249 x 3]>
# 4 U12-Forward 2 377us 503us 1837. 50.5KB 0 1000 0 544.43ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [18 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 5 G.G 1 320.6us 508.5us 1889. 66.2KB 1.89 999 1 528.86ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [10 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 6 G.G 2 360.1us 392.4us 2275. 52.4KB 0 1138 0 500.21ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [9 x 3]> <bch:tm [1,138]> <tibble [1,138 x 3]>
# 7 G.G 3 373.7us 443.4us 2148. 34.3KB 0 1074 0 499.96ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [8 x 3]> <bch:tm [1,074]> <tibble [1,074 x 3]>
# 8 G.G 4 540.3us 598.7us 1472. 57.3KB 1.47 999 1 678.56ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [13 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 9 G.G 5 4.99ms 5.5ms 177. 51.2KB 1.43 992 8 5.61s <data.table[,5] [150 x 5]> <Rprofmem[,3] [11 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 10 G.G 6 377.5us 492.2us 1991. 56KB 0 1000 0 502.26ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [11 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 11 G.G 7 707.7us 866.9us 1127. 66.2KB 1.13 999 1 886.81ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [10 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
# 12 G.G 7 mod 460.1us 586.1us 1669. 54.5KB 1.67 999 1 598.62ms <data.table[,5] [150 x 5]> <Rprofmem[,3] [12 x 3]> <bch:tm [1,000]> <tibble [1,000 x 3]>
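Benchmarks are certainly evil, especially when the data being benchmarked is not representative of real data (in class or in size/dimensions). Still, the results make it fairly clear that plain rowSums in base R (`U12-Forward 1`) is the fastest (highest `itr/sec`) and close to the most memory-efficient (low mem_alloc).
Since they all produce the same output (bench::mark defaults to check=TRUE, which ensures every result is identical), I believe this is a reasonable comparison of their relative strengths. From here, which one makes the most sense? Code quality is not only about correct output but also about readability and maintainability, especially when your future self may not remember all the context behind choosing some obscure, less readable code over more straightforward, declarative code.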
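Because the caveat above is that a 150-row table may not be representative, one way to probe it is to repeat a comparison on a larger synthetic table. The following is only a rough sketch under stated assumptions: the row count, the random data, and the names `big`, `sum_cols`, `s1`, and `s2` are all invented for illustration, and `system.time` gives a single coarse measurement rather than the repeated timings that bench::mark reports.

```r
library(data.table)

set.seed(1)   # arbitrary seed, only so the synthetic data is reproducible
n <- 1e5      # assumed row count; adjust to resemble your own data

# Synthetic table with five numeric columns named V1..V5.
big <- as.data.table(matrix(runif(n * 5), ncol = 5))
sum_cols <- c("V1", "V2", "V3")

# Time the rowSums approach and the Reduce approach once each.
t_rowsums <- system.time(big[, s1 := rowSums(.SD), .SDcols = sum_cols])
t_reduce  <- system.time(big[, s2 := Reduce(`+`, .SD), .SDcols = sum_cols])

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(big$s1, big$s2)))

# Elapsed seconds for each approach; the values depend on the machine.
print(c(rowSums = t_rowsums[["elapsed"]], Reduce = t_reduce[["elapsed"]]))
```

Timings from a single run like this are noisy, so treat them as a hint about how the ranking might shift with size, not as a replacement for the benchmark above.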
