Naming clusters in R

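I am working with the iris dataset in R. I clustered the data with k-means; the output is the variable km.out. However, I cannot find an easy way to assign the cluster numbers (1-3) to a species (versicolor, setosa, virginica). I wrote a manual method, but I have to set the seed and it is very manual. There must be a better way to do this. Any ideas?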

for (i in 1:length(km.out$cluster)) {
  if (km.out$cluster[i] == 1) {
    km.out$cluster[i] = "versicolor"
  }
}
for (i in 1:length(km.out$cluster)) {
  if (km.out$cluster[i] == 2) {
    km.out$cluster[i] = "setosa"
  }
}
for (i in 1:length(km.out$cluster)) {
  if (km.out$cluster[i] == 3) {
    km.out$cluster[i] = "virginica"
  }
}

A uj5u.com user replied:

R is a vectorized language; the following one-liner is equivalent to the code in the question.

km.out$cluster <- c("versicolor", "setosa", "virginica")[km.out$cluster]
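The one-liner works because indexing a character vector with an integer vector returns one element per index, so each cluster number pulls out the label stored at that position. A minimal sketch on a toy vector (the names `cluster`, `labels` and `named` are made up for illustration and stand in for km.out$cluster):

```r
# Hypothetical cluster assignments, standing in for km.out$cluster
cluster <- c(1L, 3L, 2L, 2L, 1L)

# Position i of this vector holds the label chosen for cluster i
labels <- c("versicolor", "setosa", "virginica")

# Integer indexing looks up every element in one step, no loop needed
named <- labels[cluster]
print(named)
```

Note that the label order is still hard-coded here, so the mapping is only right for a run in which cluster 1 happens to be versicolor, and so on.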

A uj5u.com user replied:

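It is not clear what you are trying to achieve. The clusters created by kmeans will not match Species exactly, and there is no guarantee that clusters 1, 2, 3 will line up with the order of the species in iris. Also, as you noticed, the result depends on the seed value. For example: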

set.seed(42)
iris.km <- kmeans(scale(iris[, -5]), 3)
table(iris.km$cluster, iris$Species)
#    
#     setosa versicolor virginica
#   1     50          0         0
#   2      0         39        14
#   3      0         11        36

Cluster 1 corresponds exactly to setosa, but cluster 2, like cluster 3, mixes versicolor and virginica.
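To see the dependence on the seed directly, one option is to repeat the same clustering under two seeds and compare the cross-tabulations. This is only a sketch: the helper name `run_once` and the seed values 1 and 2 are arbitrary choices for illustration, and no particular output is claimed.

```r
# Cluster the scaled iris measurements under a given seed and
# cross-tabulate the cluster numbers against the true species.
run_once <- function(seed) {
  set.seed(seed)
  km <- kmeans(scale(iris[, -5]), centers = 3)
  table(cluster = km$cluster, Species = iris$Species)
}

tab_a <- run_once(1)  # arbitrary seed
tab_b <- run_once(2)  # another arbitrary seed

print(tab_a)
print(tab_b)
```

Because the cluster numbers are arbitrary labels, the rows may come out in a different order, or with different counts, from one run to the next.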

A uj5u.com user replied:

You can recode the cluster numbers and add them to the original data:

library(dplyr)
mutate(iris,
       cluster = case_when(km.out$cluster == 1 ~ "versicolor",
                           km.out$cluster == 2 ~ "setosa",
                           km.out$cluster == 3 ~ "virginica"))

Alternatively, you can recode a vector with elucidate::translate(), which takes a vector-translation approach:

remotes::install_github("bcgov/elucidate") #if elucidate is not installed yet
library(dplyr)
library(elucidate)

mutate(iris,
       cluster = translate(km.out$cluster,
                           old = c(1:3),
                           new = c("versicolor",
                                   "setosa",
                                   "virginica")))
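If neither package is available, the same recoding can be sketched in base R with factor(), which attaches labels to the integer codes. The seed and the name `iris_labelled` are assumptions made only to keep the sketch self-contained; the label order is the one hard-coded in the question and is not guaranteed to match any given run.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
km.out <- kmeans(iris[, 1:4], centers = 3)

# Attach a labelled cluster column to a copy of iris.
# levels = the raw cluster numbers, labels = the names assumed in the question.
iris_labelled <- transform(
  iris,
  cluster = factor(km.out$cluster,
                   levels = 1:3,
                   labels = c("versicolor", "setosa", "virginica"))
)

head(iris_labelled)
```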

A uj5u.com user replied:

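If you want to assign the cluster numbers (1-3) to a species (versicolor, setosa, virginica), you may not get a 1:1 correspondence. However, you can assign each cluster its most frequent species, like this: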

data(iris)

# k-means clustering
set.seed(5834)
km.out <- kmeans(iris[, 1:4], centers = 3)

# associate species with clusters
(cmat <- table(Species = iris[, 5], cluster = km.out$cluster))
#>             cluster
#> Species       1  2  3
#>   setosa     33 17  0
#>   versicolor  0  4 46
#>   virginica   0  0 50

# find the most-frequent species in each cluster
setNames(rownames(cmat)[apply(cmat, 2, which.max)], colnames(cmat))
#>           1           2           3 
#>    "setosa"    "setosa" "virginica"

# find the most-frequently assigned cluster for each species
setNames(colnames(cmat)[apply(cmat, 1, which.max)], rownames(cmat))
#>     setosa versicolor  virginica 
#>        "1"        "3"        "3"

Created on 2021-09-22 by the reprex package (v2.0.1)
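The majority-species lookup can also be applied back to every observation, which removes the hard-coded label order from the question. This is a sketch under assumptions: the names `cluster_to_species`, `predicted` and `agreement` are introduced here for illustration, and the printed values depend on the R version and seed.

```r
set.seed(5834)
km.out <- kmeans(iris[, 1:4], centers = 3)

# Rows are species, columns are cluster numbers
cmat <- table(Species = iris$Species, cluster = km.out$cluster)

# For each cluster (column), pick the species (row) with the highest count
cluster_to_species <- setNames(rownames(cmat)[apply(cmat, 2, which.max)],
                               colnames(cmat))

# Look up a label for every observation; as.character() makes the lookup
# go by cluster name ("1", "2", "3") rather than by position
predicted <- unname(cluster_to_species[as.character(km.out$cluster)])

# Share of observations whose majority label equals the true species
agreement <- mean(predicted == iris$Species)

print(cluster_to_species)
print(agreement)
```

Several clusters can end up with the same majority species, so this mapping is not necessarily one-to-one.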

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/332213.html
