Filling a matrix faster

I have a data frame that looks like this:

  Examples Type
1 example1    a
2 example1    b
3 example1    c
4 example1    c
5 example2    c

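I want a matrix whose rows and columns both correspond to the examples. Each cell should hold the number of Types the two examples have in common, that is, the size of the intersection of their Types.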
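
Note that intersect() treats its inputs as sets, so each cell counts distinct shared types. A quick check with the sample values, where example1's repeated "c" is counted only once:

```r
# Types for each example, taken from the sample data
types1 <- c("a", "b", "c", "c")  # example1
types2 <- c("c")                 # example2

# intersect() drops duplicates, so the repeated "c" counts once
length(intersect(types1, types1))  # 3, not 4
length(intersect(types1, types2))  # 1
```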

my_mat <- matrix(0, nrow=length(unique(df$Examples)), ncol=length(unique(df$Examples))) 
rownames(my_mat) <- unique(df$Examples)
colnames(my_mat) <- unique(df$Examples)

My current code uses a double for loop, which becomes much slower on larger data.

get_intersection <- function(example1, example2) {
  return(length(dplyr::intersect(example1, example2)))
}

for (i in 1:nrow(my_mat)) {
  curr_row <- rownames(my_mat)[i]
  for (j in 1:ncol(my_mat)) {
    curr_col <- colnames(my_mat)[j]
    my_mat[i, j] <- get_intersection(df$Type[which(df$Examples %in% curr_row)], 
                                     df$Type[which(df$Examples %in% curr_col)])
  }
}
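
Much of the loop's cost is probably the repeated subsetting of df for every (i, j) pair. Below is a minimal sketch, not from the original post, that keeps the double loop but precomputes each example's types once with split() and fills only half of the symmetric matrix. One caveat: split() orders groups by sorted factor levels, which may differ from the unique() order used above.

```r
df <- data.frame(Examples = c("example1", "example1", "example1", "example1", "example2"),
                 Type = c("a", "b", "c", "c", "c"))

# Subset df once per example instead of once per (i, j) pair
types_by_example <- split(df$Type, df$Examples)
examples <- names(types_by_example)
n <- length(examples)

my_mat <- matrix(0, nrow = n, ncol = n, dimnames = list(examples, examples))
for (i in seq_len(n)) {
  # The matrix is symmetric, so compute each pair once (j >= i) and mirror it
  for (j in i:n) {
    k <- length(intersect(types_by_example[[i]], types_by_example[[j]]))
    my_mat[i, j] <- k
    my_mat[j, i] <- k
  }
}
my_mat
```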

How can I use an apply-family approach to speed up filling this matrix?

Data

df <- structure(list(Examples = c("example1", "example1", "example1", 
"example1", "example2"), Type = c("a", "b", "c", "c", "c")), class = "data.frame", row.names = c(NA, 
-5L))

Answer:

I'm not sure exactly what matrix you need, but you can use outer to apply a function f across every pair of unique values in the "Examples" column:

f <- \(x, y) length(intersect(df[df$Examples == x, 'Type'], df[df$Examples == y, 'Type']))
u <- unique(df$Examples)
outer(u, u, Vectorize(f)) |> `dimnames<-`(list(u, u))
#          example1 example2
# example1        3        1
# example2        1        1
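
Vectorize(f) is needed because outer(X, Y, FUN) calls FUN only once, passing two long vectors that together hold every (x, y) combination. The sketch below is an assumption-labeled variant, not part of the answer: it makes that element-wise step explicit with mapply() and iterates over indices into a precomputed split() list, so df is not re-subset for each pair. The helper name shared_types is hypothetical.

```r
df <- data.frame(Examples = c("example1", "example1", "example1", "example1", "example2"),
                 Type = c("a", "b", "c", "c", "c"))

# Precompute each example's types once
types_by_example <- split(df$Type, df$Examples)
u <- names(types_by_example)

# outer() hands FUN two equal-length index vectors covering every pair,
# so FUN must work element by element; mapply() does that here
shared_types <- function(i, j) {
  mapply(function(a, b) length(intersect(types_by_example[[a]], types_by_example[[b]])),
         i, j)
}

idx <- seq_along(u)
res <- outer(idx, idx, shared_types)
dimnames(res) <- list(u, u)
res
```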

Answer:

If we pivot the data to wide format, we can use matrix multiplication:

library(dplyr)  
library(tidyr)
dfw = df %>%
  unique %>% 
  mutate(n = 1) %>%
  pivot_wider(names_from = Type, values_from = n, values_fill = 0) %>%
  as.data.frame

row.names(dfw) = dfw$Examples
dfm = as.matrix(dfw[-1])
result = dfm %*% t(dfm)
result
#          example1 example2
# example1        3        1
# example2        1        1
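
The multiplication works because dfm is a 0/1 incidence matrix (one row per example, one column per type), so entry [i, j] of dfm %*% t(dfm) adds 1 × 1 for each type both examples share. A base-R sketch of the same idea, assuming you would rather not depend on dplyr/tidyr; tcrossprod(x) computes x %*% t(x) in one call:

```r
df <- data.frame(Examples = c("example1", "example1", "example1", "example1", "example2"),
                 Type = c("a", "b", "c", "c", "c"))

d <- unique(df)                      # drop repeated (Examples, Type) rows
ex <- unique(d$Examples)
ty <- unique(d$Type)

# 0/1 incidence matrix: one row per example, one column per type
inc <- matrix(0, nrow = length(ex), ncol = length(ty), dimnames = list(ex, ty))
inc[cbind(match(d$Examples, ex), match(d$Type, ty))] <- 1

# Entry [i, j] sums inc[i, t] * inc[j, t] over types t, i.e. counts shared types
res <- inc %*% t(inc)
stopifnot(isTRUE(all.equal(res, tcrossprod(inc))))  # same product, one call
res
```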

Answer:

I haven't benchmarked it, but this version should be a bit faster:

df <- data.frame(Examples = c('example1', 'example1', 'example1', 'example1', 'example2'), 
                 Type = c('a', 'b', 'c', 'c', 'c'), 
                 stringsAsFactors = FALSE)
examples <- unique(df$Examples)
my_mat <- matrix(0, nrow = length(examples), ncol = length(examples)) 
rownames(my_mat) <- examples 
colnames(my_mat) <- examples
perms <- gtools::permutations(v = examples, 
                              n = length(examples), 
                              r = 2, 
                              repeats.allowed = TRUE)
apply(perms, 1, function(x) {
  result <- intersect(df[ df$Examples == x[ 1 ], 'Type' ], 
                      df[ df$Examples == x[ 2 ], 'Type' ]) |>
    length()
  my_mat[ x[ 1 ], x[ 2 ] ] <<- result
}) |> invisible()
print(df)
print(my_mat)
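
The <<- inside apply() works, but it relies on a side effect from inside the anonymous function. A sketch of an alternative, not from the original answer: compute every count first, then write them all in one step with character matrix indexing. Here expand.grid() stands in for gtools::permutations(..., repeats.allowed = TRUE); both list all ordered pairs, though in a different row order. Like the answer, this still subsets df once per pair.

```r
df <- data.frame(Examples = c("example1", "example1", "example1", "example1", "example2"),
                 Type = c("a", "b", "c", "c", "c"))
examples <- unique(df$Examples)

# All ordered pairs with repeats (like gtools::permutations(..., repeats.allowed = TRUE))
pairs <- expand.grid(row = examples, col = examples, stringsAsFactors = FALSE)

# Compute every count first; nothing is assigned from inside the function
counts <- mapply(function(a, b) {
  length(intersect(df$Type[df$Examples == a], df$Type[df$Examples == b]))
}, pairs$row, pairs$col)

my_mat <- matrix(0, nrow = length(examples), ncol = length(examples),
                 dimnames = list(examples, examples))
# A two-column character matrix indexes cells by row and column names
my_mat[cbind(pairs$row, pairs$col)] <- counts
my_mat
```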

Answer:

We can use tcrossprod on a table:

> tcrossprod(table(unique(df)))
          Examples
Examples   example1 example2
  example1        3        1
  example2        1        1

Or:

> tcrossprod(table(df) > 0)
          Examples
Examples   example1 example2
  example1        3        1
  example2        1        1
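
Both forms deduplicate the (Examples, Type) pairs, either with unique(df) before tabulating or with > 0 afterwards. Without that step, table(df) holds counts rather than 0/1 indicators, and tcrossprod() would weight repeated types. A quick check with the sample data:

```r
df <- data.frame(Examples = c("example1", "example1", "example1", "example1", "example2"),
                 Type = c("a", "b", "c", "c", "c"))

raw <- table(df)               # counts: example1 has "c" twice, so that cell is 2
tcrossprod(raw)                # example1/example1 becomes 1*1 + 1*1 + 2*2 = 6
tcrossprod(table(unique(df)))  # 0/1 cells, so the diagonal for example1 is 3
```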
