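Good morning, Stack Overflow,
I'm writing a function that searches a single column (ColumnOfDatasetToSearch) of a matrix (Dataset) for multiple search terms (SearchFeatures). It works fine for matrices with 10^4 rows, but it slows down considerably once the row count exceeds 10^6 or SearchFeatures contains more than 100 terms. I thought vectorizing ColumnOfDatasetToSearch would speed things up, but the performance gain was small.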
library(dplyr)  # provides pull()

ListSearcher <- function(SearchFeatures, Dataset, ColumnOfDatasetToSearch){
  RowNumber <- NA
  ColumnOfInterest <- pull(Dataset, ColumnOfDatasetToSearch)
  LengthOfSearchTerms <- length(SearchFeatures)
  # One grep() pass over the whole column per search term
  for (j in 1:LengthOfSearchTerms){
    if(length(i <- grep(SearchFeatures[j], ColumnOfInterest)))
      RowNumber <- append(RowNumber, i)
  }
  IdentifiersWithThoseSerchTerms <- unique(na.omit((Dataset$Identifiers[RowNumber])))
  return(IdentifiersWithThoseSerchTerms)
}
Thanks in advance for your suggestions.
NewToCoding
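Answer from a uj5u.com user:
Assume you are working with the iris dataset and want to return the column Petal.Length instead of Identifiers.
Does this work for you? It should be considerably faster, because all search terms are combined into a single regular expression and the column is scanned only once: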
ListSearcher <- function(SearchFeatures, Dataset, ColumnOfDatasetToSearch){
  searchstring <- paste0(SearchFeatures, collapse = "|")
  selection <- grepl(searchstring, Dataset[[ColumnOfDatasetToSearch]])
  Dataset[selection, ]$Petal.Length
}
# try with a subset of iris
iris[c(1,2,51,52), ]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1           5.1         3.5          1.4         0.2     setosa
#> 2           4.9         3.0          1.4         0.2     setosa
#> 51          7.0         3.2          4.7         1.4 versicolor
#> 52          6.4         3.2          4.5         1.5 versicolor
ListSearcher(c("ver", "se") , iris[c(1,2,51,52), ], "Species")
#> [1] 1.4 1.4 4.7 4.5
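The question asks for the Identifiers column, which iris does not have. A minimal sketch of how the same idea might be generalized is given here; the ReturnColumn argument, the function name ListSearcherGeneral and the Identifiers values are made up for illustration and are not part of the answer above.

```r
# Sketch: same alternation idea, but the returned column is a parameter.
# `ReturnColumn` is a hypothetical argument introduced for this example.
ListSearcherGeneral <- function(SearchFeatures, Dataset, ColumnToSearch, ReturnColumn) {
  # Combine all search terms into one regular expression, e.g. "ver|se"
  searchstring <- paste0(SearchFeatures, collapse = "|")
  # One logical value per row: TRUE where any term matches
  selection <- grepl(searchstring, Dataset[[ColumnToSearch]])
  # Subset only the requested column, then drop NAs and duplicates
  unique(na.omit(Dataset[[ReturnColumn]][selection]))
}

# Toy data: four iris rows plus a made-up Identifiers column
small <- iris[c(1, 2, 51, 52), ]
small$Identifiers <- c("id1", "id2", "id3", "id4")

ListSearcherGeneral(c("ver", "se"), small, "Species", "Identifiers")
#> [1] "id1" "id2" "id3" "id4"
```

Note that the search terms are treated as regular expressions, so a term containing characters such as "." or "|" would change the meaning of the combined pattern.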
Edit: since only one column needs to be returned, this version seems to be somewhat faster. Note the last line of the function:
ListSearcher2 <- function(SearchFeatures, Dataset, ColumnToSearch){
  searchstring <- paste0(SearchFeatures, collapse = "|")
  selection <- grepl(searchstring, Dataset[[ColumnToSearch]])
  Dataset$Petal.Length[selection]
}
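The difference between the two last lines can be shown in isolation. In this small sketch (the variable names d, via_dataframe and via_vector are only for illustration), the first variant builds a subsetted copy of the whole data frame before picking one column, while the second subsets a single vector. Both give the same values:

```r
# Toy data: four rows of the built-in iris dataset
d <- iris[c(1, 2, 51, 52), ]
selection <- grepl("ver|se", d[["Species"]])

# Variant 1: subset every column of the data frame, then pick one column
via_dataframe <- d[selection, ]$Petal.Length

# Variant 2: pick the column first, then subset that single vector
via_vector <- d$Petal.Length[selection]

# Same result either way; only the amount of work differs
stopifnot(identical(via_dataframe, via_vector))
via_vector
#> [1] 1.4 1.4 4.7 4.5
```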
Microbenchmark comparison:
library(microbenchmark)
microbenchmark(
  search1 = ListSearcher(c("ver", "se") , iris[c(1,2,51,52), ], "Species"),
  search2 = ListSearcher2(c("ver", "se") , iris[c(1,2,51,52), ], "Species"),
  times = 100000
)
#> Unit: microseconds
#>     expr   min    lq     mean median    uq     max neval
#>  search1 154.5 161.5 201.3834  169.5 189.8 43474.8 1e+05
#>  search2  96.8 101.9 127.6075  106.8 118.7 41666.2 1e+05
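The benchmark above uses only four rows and two terms, so it may be worth checking how a per-term loop and the combined pattern compare on data closer to the size in the question. The following is a rough sketch with synthetic data; the helper names (random_strings, loop_rows, alternation_rows) and the data are invented for illustration, and n can be raised toward 1e6. No timings are shown because they depend on the machine, the data and the number of terms.

```r
set.seed(1)  # reproducible synthetic data

# Hypothetical helper: `count` random lowercase strings of length `len`
random_strings <- function(count, len) {
  m <- matrix(sample(letters, count * len, replace = TRUE), ncol = len)
  do.call(paste0, as.data.frame(m, stringsAsFactors = FALSE))
}

n <- 2e4  # raise toward 1e6 to mimic the question
Dataset <- data.frame(
  Identifiers = seq_len(n),
  Text = random_strings(n, 8),
  stringsAsFactors = FALSE
)
SearchFeatures <- unique(random_strings(50, 3))

# Approach from the question: one grep() pass per search term
loop_rows <- function(terms, x) {
  rows <- integer(0)
  for (term in terms) rows <- c(rows, grep(term, x))
  sort(unique(rows))
}

# Approach from the answer: a single pass with one combined pattern
alternation_rows <- function(terms, x) {
  which(grepl(paste0(terms, collapse = "|"), x))
}

system.time(r1 <- loop_rows(SearchFeatures, Dataset$Text))
system.time(r2 <- alternation_rows(SearchFeatures, Dataset$Text))

# Both approaches should select exactly the same rows
stopifnot(identical(r1, r2))
```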
