Suppose I have a list of lists, where each sublist is a movie:
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
list("Avatar", "James Cameron", "Action"),
list("Schindler's List", "Steven Spielberg", "Biography")
)
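What is the best/fastest way to subset this list based on an element of the sublists (preferably without dependencies, but tidyverse would be fine)? That is, if the director is always the second element of each sublist, what is the fastest way to get a vector of the titles of the movies directed by Spielberg?
I expect to do this many times on very large lists.
Thanks in advance!!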
A uj5u.com user replied:
sapply(movies, `[[`, 2)
# [1] "Steven Spielberg" "James Cameron" "Steven Spielberg"
Benchmark: of the three variants compared below, sapply with `[[` is the fastest.
bench::mark(purrr = map_chr(movies, pluck, 2),
getElement = sapply(movies, getElement, 2),
`[[` = sapply(movies, `[[`, 2))
#   expression      min   median itr/s…1 mem_a…2 gc/se…3 n_itr  n_gc
# 1 purrr        21.7μs   28.2μs  31773.      0B    6.36  9998     2
# 2 getElement   16.6μs   18.6μs  45652.      0B    4.57  9999     1
# 3 [[           14.9μs   17.2μs  47417.      0B    4.74  9999     1
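The call above only returns the directors, while the question asks for the titles of Spielberg's movies. A minimal sketch of the remaining step, assuming the title is always element 1 and the director element 2 (the names `directors`, `titles` and `spielberg_titles` are just for illustration, and `vapply` is swapped in for `sapply` so the result type is fixed):

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# Pull out one character value per sublist; vapply errors if a sublist
# does not yield exactly one string at that position.
directors <- vapply(movies, `[[`, character(1), 2)
titles    <- vapply(movies, `[[`, character(1), 1)

# Keep only the titles whose director matches.
spielberg_titles <- titles[directors == "Steven Spielberg"]
spielberg_titles
```

If the same list is queried many times, it is probably worth computing `directors` and `titles` once and reusing them.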
A uj5u.com user replied:
Dependency-free and readable:
sapply(movies, getElement, 2)
# [1] "Steven Spielberg" "James Cameron" "Steven Spielberg"
Fast but hard to read, and it assumes every sublist has length 3:
unlist(movies)[-1L:(length(movies) * 3L-2L) %% 3L == 0L]
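The one-liner above works because `:` binds tighter than `%%`, so it builds the sequence -1, 0, 1, ... and keeps the positions whose value is a multiple of 3, which are positions 2, 5, 8, ... of the flattened vector. A possibly easier-to-read equivalent, under the same assumption that every sublist has exactly 3 elements (`k`, `flat` and `idx` are illustrative names):

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

k    <- 3L              # assumed fixed sublist length
flat <- unlist(movies)  # titles, directors and genres interleaved

# Position of the 2nd element of each sublist in the flattened vector.
idx <- seq(2L, by = k, length.out = length(movies))

directors <- flat[idx]
directors
```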
Playing with Rcpp (perhaps not optimal):
Rcpp::cppFunction('
Rcpp::CharacterVector foo(Rcpp::List lst, int idx, int n) {
idx--;
Rcpp::CharacterVector res(n);
for (int i = 0; i < n; i++) {
Rcpp::List tmp = lst(i);
res(i) = Rcpp::as<String>(tmp(idx));
}
return res;
}
')
Benchmark with 100k sublists:
bench::mark(purrr = map_chr(movies, pluck, 2),
getElement = sapply(movies, getElement, 2),
`[[` = sapply(movies, `[[`, 2),
unlist = unlist(movies)[-1L:(length(movies) * 3L-2L) %% 3L == 0L],
Rcpp = foo(movies, 2L, length(movies))
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 purrr 234.6ms 236.78ms 4.15 781.3KB 13.8 3 10 723ms <chr [100,000]> <Rprofmem [42 × 3]> <bench_tm [3]> <tibble>
# 2 getElement 73.06ms 77.53ms 12.8 3.29MB 14.6 7 8 548ms <chr [100,000]> <Rprofmem [5 × 3]> <bench_tm [7]> <tibble>
# 3 [[ 28.09ms 30.43ms 27.1 3.29MB 10.2 16 6 590ms <chr [100,000]> <Rprofmem [5 × 3]> <bench_tm [16]> <tibble>
# 4 unlist 7.81ms 8.16ms 105. 8.01MB 17.8 53 9 506ms <chr [100,000]> <Rprofmem [7 × 3]> <bench_tm [53]> <tibble>
# 5 Rcpp 11.91ms 12.86ms 70.8 783.79KB 15.7 36 8 508ms <chr [100,000]> <Rprofmem [2 × 3]> <bench_tm [36]> <tibble>
A small function, following the comments below, that also does the filtering:
return_movies <- function(lst, title_position, comparison_position, comparison_string) {
  # Use the lst argument rather than the global movies object
  sapply(lst, getElement, title_position)[
    sapply(lst, getElement, comparison_position) == comparison_string
  ]
}
return_movies(movies, 1, 2, "Steven Spielberg")
# [1] "Jurassic Park"    "Schindler's List"
A uj5u.com user replied:
library(purrr)
map_chr(movies, pluck, 2)
#> [1] "Steven Spielberg" "James Cameron" "Steven Spielberg"
A uj5u.com user replied:
A vectorized base R approach that makes no assumption about the lengths of the sublists:
bigmovies <- rep(movies, 1e4)
fsub1 <- function(x, i) unlist(x)[c(i, cumsum(lengths(x)[-length(x)]) + i)]
fsub2 <- function(x, i) sapply(x, `[[`, i)
microbenchmark::microbenchmark(unlist = fsub1(bigmovies, 2),
sapply = fsub2(bigmovies, 2),
check = "identical")
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> unlist 1.8197 1.86370 2.113037 1.90760 2.15050 11.7461 100
#> sapply 7.5128 7.68515 8.309795 7.90885 9.01275 13.5639 100
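To see why the cumsum indexing handles sublists of different lengths, here is a small worked example on a made-up ragged list (`ragged`, `lens`, `starts` and `idx` are illustrative names). It assumes every sublist has at least `i` elements and that each element is a single atomic value:

```r
ragged <- list(list("a", "b"),
               list("c", "d", "e"),
               list("f", "g", "h", "i"))
i <- 2

lens   <- lengths(ragged)                    # 2 3 4
# Number of flattened elements that come before each sublist.
starts <- c(0, cumsum(lens)[-length(lens)])  # 0 2 5
# Offset of the i-th element inside each sublist.
idx    <- starts + i                         # 2 4 7

second <- unlist(ragged)[idx]
second
```

This matches `sapply(ragged, `[[`, 2)` on the same input, but does a single vectorized subset of the flattened vector.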
A uj5u.com user replied:
You can try the following base R option:
> (d <- do.call(cbind, movies))[1, colSums(d == "Steven Spielberg") > 0]
[[1]]
[1] "Jurassic Park"
[[2]]
[1] "Schindler's List"
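Note that the result above is a list rather than a character vector. If the list is going to be queried many times, one option is to convert it once into a data frame and filter that instead. A rough sketch, assuming every sublist has the same three string fields (the column names `title`, `director` and `genre` are my own labels, not from the question):

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# One row per movie: flatten each sublist, then bind the rows together.
m  <- do.call(rbind, lapply(movies, unlist))
df <- as.data.frame(m, stringsAsFactors = FALSE)
names(df) <- c("title", "director", "genre")  # assumed column labels

# Plain vectorized filter; returns a character vector.
spielberg <- df$title[df$director == "Steven Spielberg"]
spielberg
```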
