I have the following dataset, where the column clust is the initial cluster and lt_clust is the resulting cluster after some time:
dataset <- data.frame(Id = c(101, 102, 103, 104, 105, 106, 107, 108,
                             109, 110, 111, 112, 113, 114),
                      clust = c("k1", "k1", "k1", "k1", "k1", "k2", "k2",
                                "k2", "k2", "k2", "k3", "k3", "k3", "k3"),
                      lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3",
                                   "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
                      stringsAsFactors = FALSE)
Now I want to check how accurate my initial assignment was against the final cluster, so the expected result is:
clust lt_clust rate
<fct> <fct> <dbl>
1 k1 k1 0.8
2 k1 k2 0.2
3 k1 k3 0
4 k2 k1 0.2
5 k2 k2 0.6
6 k2 k3 0.2
7 k3 k1 0.25
8 k3 k2 0
9 k3 k3 0.75
This is my first attempt:
dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick)) %>%
  ungroup() %>%
  group_by(clust, ) %>%
  summarise(rate = total / sum(total))
But the lt_clust column is missing from the result:
clust rate
<fct> <dbl>
1 k1 0.8
2 k1 0.2
3 k1 0
4 k2 0.2
5 k2 0.6
6 k2 0.2
7 k3 0.25
8 k3 0
9 k3 0.75
When I try this instead, the result is also wrong:
dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick),
            rate = total / sum(total))
clust lt_clust total rate
<fct> <fct> <dbl> <dbl>
1 k1 k1 4 1
2 k1 k2 1 1
3 k1 k3 0 NaN
4 k2 k1 1 1
5 k2 k2 3 1
6 k2 k3 1 1
7 k3 k1 1 1
8 k3 k2 0 NaN
9 k3 k3 3 1
Could you please help me find what I am doing wrong in my code? I am trying to do this with the dplyr package.
Answer from a uj5u.com user:
Starting from your first attempt, just add lt_clust as its own argument to the final summarise():
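dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  # Count rows per (clust, lt_clust) pair; .drop = FALSE keeps empty pairs
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick)) %>%
  ungroup() %>%
  group_by(clust, ) %>%
  # Passing lt_clust through keeps it as a column next to rate
  summarise(lt_clust, rate = total / sum(total))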
# A tibble: 9 × 3
# Groups: clust [3]
clust lt_clust rate
<fct> <fct> <dbl>
1 k1 k1 0.8
2 k1 k2 0.2
3 k1 k3 0
4 k2 k1 0.2
5 k2 k2 0.6
6 k2 k3 0.2
7 k3 k1 0.25
8 k3 k2 0
9 k3 k3 0.75
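
For context on why the two attempts in the question went wrong: summarise() keeps only the grouping columns plus the columns it is asked to create, so the first attempt lost lt_clust. In the second attempt every group is a single (clust, lt_clust) pair, so sum(total) equals total and the rate is either 1 or 0/0 = NaN. Note also that dplyr 1.1.0 and later warn when summarise() returns more than one row per group and suggest reframe() instead. A possible alternative that avoids multi-row summarise() is sketched below, using count() and then mutate() within each clust. The object name rates and the final base R cross-check are illustrative additions, not part of the original answer.

```r
library(dplyr)

# Same data as in the question (Id is not used in the calculation)
dataset <- data.frame(
  Id = 101:114,
  clust = rep(c("k1", "k2", "k3"), times = c(5, 5, 4)),
  lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3",
               "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
  stringsAsFactors = FALSE
)

rates <- dataset %>%
  mutate(clust = factor(clust), lt_clust = factor(lt_clust)) %>%
  # Tally each (clust, lt_clust) pair; .drop = FALSE keeps pairs with zero rows
  count(clust, lt_clust, .drop = FALSE, name = "total") %>%
  # Divide by the size of the initial cluster, keeping one row per pair
  group_by(clust) %>%
  mutate(rate = total / sum(total)) %>%
  ungroup() %>%
  select(clust, lt_clust, rate)

rates

# Cross-check with base R: row-wise proportions of the contingency table
prop.table(table(dataset$clust, dataset$lt_clust), margin = 1)
```

Because mutate() keeps every existing column, lt_clust survives without being passed through by hand, and the result is no longer grouped after ungroup().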
