Is there a faster way to do the following? In the real application, df has many rows (so list_of_colnames has the same number of elements):
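library(purrr)

list_of_colnames <- list(c("A", "B"), c("A"))
some_vector <- c("fish", "cat")
# Split df into one-row data frames and pair each with its own column names
map2(split(df, seq(nrow(df))), list_of_colnames, function(row, colnames) {
  # 1 if any of this row's listed columns holds a value from some_vector
  row$indicator <- ifelse(any(row[, colnames] %in% some_vector), 1, 0)
  return(row)
})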
The current implementation works, but it is slow for a large df. I suspect split() is a major bottleneck.
Thanks!
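One way to check the suspicion about split() is to time the split step separately from the per-row work. The following is only a rough sketch on made-up data: the size n, the word list, and the column names are assumptions chosen for illustration, and timings will differ from machine to machine.

```r
set.seed(1)
n <- 2000  # hypothetical size; raise it to make the difference more visible
words <- c("fish", "cat", "dog", "hello")
df <- data.frame(A = sample(words, n, replace = TRUE),
                 B = sample(words, n, replace = TRUE),
                 stringsAsFactors = FALSE)
some_vector <- c("fish", "cat")

# Time only the step that turns df into a list of one-row data frames
t_split <- system.time(rows <- split(df, seq(nrow(df))))

# Time a per-row membership test over the already split rows
t_loop <- system.time(
  flags <- vapply(rows, function(row) any(unlist(row) %in% some_vector),
                  logical(1))
)

print(t_split[["elapsed"]])
print(t_loop[["elapsed"]])
```

If the first number dominates, the cost lies mostly in building the one-row data frames rather than in the membership test.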
Answer from a uj5u.com user:
One option might be to use row/column indexing:
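# Pair each row index with the columns listed for that row
rowind <- rep(seq_len(nrow(df)), lengths(list_of_colnames))
colind <- match(unlist(list_of_colnames), names(df))
# Matrix indexing picks one cell per (row, column) pair; any() reduces per row
df$indicator <- as.integer(tapply(as.matrix(df)[cbind(rowind, colind)] %in% some_vector,
                                  rowind, FUN = any))
-output
> df
      A   B indicator
1  fish   A         1
2 hello cat         0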
Data
df <- data.frame(A = c('fish', 'hello'), B = c('A', 'cat'))
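The row/column indexing idea can be wrapped in a small function and checked against a plain row-by-row loop. This is a sketch, not a definitive implementation: row_indicator is a hypothetical helper name, the random test data is made up, and it assumes, as in the question, that element i of the column list belongs to row i of the data frame.

```r
# Hypothetical helper: one 0/1 flag per row, where row i is checked only
# against the columns named in cols[[i]]
row_indicator <- function(df, cols, values) {
  stopifnot(length(cols) == nrow(df))
  rowind <- rep(seq_len(nrow(df)), lengths(cols))  # row index per listed column
  colind <- match(unlist(cols), names(df))         # column position in df
  # A two-column index matrix selects exactly one cell per (row, column) pair
  hit <- as.matrix(df)[cbind(rowind, colind)] %in% values
  out <- integer(nrow(df))
  out[rowind[hit]] <- 1L                           # flag rows with at least one hit
  out
}

# The small example from this thread
small <- data.frame(A = c("fish", "hello"), B = c("A", "cat"),
                    stringsAsFactors = FALSE)
small_result <- row_indicator(small, list(c("A", "B"), "A"), c("fish", "cat"))
print(small_result)  # 1 0

# Check against an explicit loop on made-up random data
set.seed(42)
n <- 200
words <- c("fish", "cat", "dog", "hello", "bird")
big <- data.frame(A = sample(words, n, replace = TRUE),
                  B = sample(words, n, replace = TRUE),
                  C = sample(words, n, replace = TRUE),
                  stringsAsFactors = FALSE)
cols <- lapply(seq_len(n), function(i) sample(names(big), sample(1:3, 1)))
expected <- vapply(seq_len(n), function(i) {
  as.integer(any(unlist(big[i, cols[[i]]]) %in% c("fish", "cat")))
}, integer(1))
stopifnot(identical(row_indicator(big, cols, c("fish", "cat")), expected))
```

Assigning through out[rowind[hit]] instead of tapply() also gives 0 for a row whose column list is empty, which tapply() would silently drop.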
Answer from a uj5u.com user:
You can avoid splitting your data frame into a list and instead use rowwise and c_across from dplyr to apply your condition within each row:
library(dplyr)
library(purrr)
list_of_colnames <- list(c("A", "B"), c("A"))
some_vector <- c("fish", "cat")
map(list_of_colnames, ~
  df %>%
    rowwise() %>%
    mutate(indicator = as.numeric(any(c_across(all_of(.x)) %in% some_vector))) %>%
    ungroup()
)
Output
Mapping over list_of_colnames still returns a list as output, with one tibble per set of column names:
[[1]]
# A tibble: 3 x 4
  A     B     C     indicator
  <chr> <chr> <chr>     <dbl>
1 fish  dog   bird          1
2 dog   cat   bird          1
3 bird  lion  cat           0
[[2]]
# A tibble: 3 x 4
  A     B     C     indicator
  <chr> <chr> <chr>     <dbl>
1 fish  dog   bird          1
2 dog   cat   bird          0
3 bird  lion  cat           0
Data
df <- structure(list(A = c("fish", "dog", "bird"), B = c("dog", "cat",
"lion"), C = c("bird", "bird", "cat")), class = "data.frame", row.names = c(NA,
-3L))
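If what is needed is a single indicator per row, with row i checked only against its own entry in the column list as in the question, one possible base R sketch is to compute the membership test once for the whole data frame and then look up each row's columns. The three-element list_of_colnames below is a made-up example sized to match this three-row df, not something taken from the thread.

```r
df <- data.frame(A = c("fish", "dog", "bird"),
                 B = c("dog", "cat", "lion"),
                 C = c("bird", "bird", "cat"),
                 stringsAsFactors = FALSE)
# Hypothetical per-row column lists: element i applies to row i
list_of_colnames <- list(c("A", "B"), c("A"), c("B", "C"))
some_vector <- c("fish", "cat")

# Logical matrix with the same shape as df: TRUE where the cell is in some_vector
hits <- matrix(as.matrix(df) %in% some_vector,
               nrow = nrow(df),
               dimnames = list(NULL, names(df)))

# For each row, look only at the columns listed for that row
df$indicator <- vapply(seq_len(nrow(df)), function(i) {
  as.integer(any(hits[i, list_of_colnames[[i]]]))
}, integer(1))

print(df$indicator)  # 1 0 1
```

This still loops over the rows, but each iteration only indexes a logical matrix, so no one-row data frames are built along the way.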
