I have a data.frame that looks something like this:
df <- data.frame(names = LETTERS[1:10],
                 rep1 = sample(1:5, 10, replace = TRUE),
                 rep2 = sample(1:5, 10, replace = TRUE),
                 rep3 = sample(1:5, 10, replace = TRUE),
                 rep4 = sample(1:5, 10, replace = TRUE))
print(df)
   names rep1 rep2 rep3 rep4
1      A    2    2    5    4
2      B    5    5    5    1
3      C    3    4    2    5
4      D    5    3    5    3
5      E    2    3    2    4
6      F    5    5    2    4
7      G    1    3    1    3
8      H    2    2    3    3
9      I    1    1    4    3
10     J    3    1    3    5
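What I need to know is: are certain names ("samples") grouped together (by number) across the different reps?
The actual numbers (1 to 5) do not matter; all that matters is whether particular names fall into the same group (for example, A, E and H are in group 2 in rep1. Are they also grouped together in another rep?). I would like to know whether there is a "pattern" in the grouping, e.g. whether certain names end up together in one group more often than others.
Does anyone know how to achieve this?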
Answer from a uj5u.com user:
Maybe this can help you find a pattern:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-names) %>%
  group_by(name, value) %>%
  summarise(grouping = paste(names, collapse = ", "),
            .groups = "drop") %>%
  pivot_wider(names_from = name,
              values_from = grouping)
This returns
# A tibble: 5 x 5
  value rep1    rep2    rep3       rep4
  <int> <chr>   <chr>   <chr>      <chr>
1     1 D, E, J NA      I          A, C, E
2     2 A, B    F, H    A, C, D, F G
3     4 F, H    D, E    H          D, H, I
4     5 C, G, I A, I, J B, J       B, F
5     3 NA      B, C, G E, G       J
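where value is the original group number and each cell lists the names that fall into that group in the given rep.
Data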
structure(list(names = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), rep1 = c(2L, 2L, 5L, 1L, 1L, 4L, 5L, 4L, 5L, 1L),
rep2 = c(5L, 3L, 3L, 4L, 4L, 2L, 3L, 2L, 5L, 5L), rep3 = c(2L,
5L, 2L, 2L, 3L, 2L, 3L, 4L, 1L, 5L), rep4 = c(1L, 5L, 1L,
4L, 1L, 5L, 2L, 4L, 4L, 3L)), class = "data.frame", row.names = c(NA,
-10L))
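To follow up on the question's own example (do the names that share a group in one rep stay together in another?), a small helper can check a chosen set of names column by column. This is only a sketch: same_group() is a hypothetical helper name, not part of dplyr or tidyr, and it assumes the first column holds the names and every other column is a rep. It uses the data shown above.

```r
# Data from this answer, assigned to df so the sketch is self-contained.
df <- data.frame(
  names = LETTERS[1:10],
  rep1 = c(2L, 2L, 5L, 1L, 1L, 4L, 5L, 4L, 5L, 1L),
  rep2 = c(5L, 3L, 3L, 4L, 4L, 2L, 3L, 2L, 5L, 5L),
  rep3 = c(2L, 5L, 2L, 2L, 3L, 2L, 3L, 4L, 1L, 5L),
  rep4 = c(1L, 5L, 1L, 4L, 1L, 5L, 2L, 4L, 4L, 3L)
)

# Hypothetical helper: for each rep column, TRUE when all `members`
# carry the same group number in that rep.
# Assumes column 1 is `names` and all remaining columns are reps.
same_group <- function(data, members) {
  sub <- data[data$names %in% members, -1, drop = FALSE]
  vapply(sub, function(col) length(unique(col)) == 1L, logical(1))
}

# F and H share group 4 in rep1 and group 2 in rep2, but not in rep3/rep4.
same_group(df, c("F", "H"))
#  rep1  rep2  rep3  rep4
#  TRUE  TRUE FALSE FALSE
```

Summing the result, e.g. sum(same_group(df, c("F", "H"))), gives the number of reps in which the whole set stays together.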
Answer from a uj5u.com user:
Here is a solution that returns the largest overlap, i.e. the largest group(s) of names, per rep*.
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-names, names_to = "rep") %>%
  group_by(rep, value) %>%
  summarise(n = n(),
            names = paste(names, collapse = ", ")) %>%
  filter(n == max(n))
# `summarise()` has grouped output by 'rep'. You can override using the `.groups` argument.
# # A tibble: 7 x 4
# # Groups:   rep [4]
#   rep   value     n names
#   <chr> <int> <int> <chr>
# 1 rep1      4     4 B, C, G, I
# 2 rep2      3     3 A, D, I
# 3 rep2      4     3 B, F, J
# 4 rep3      2     3 D, G, H
# 5 rep3      3     3 E, F, J
# 6 rep3      5     3 A, B, I
# 7 rep4      1     3 B, C, J
Data
The test-data creation code is repeated from the question, but with the pseudo-RNG seed set so that the results are reproducible.
set.seed(2021)
df <- data.frame(names = LETTERS[1:10],
                 rep1 = sample(1:5, 10, replace = TRUE),
                 rep2 = sample(1:5, 10, replace = TRUE),
                 rep3 = sample(1:5, 10, replace = TRUE),
                 rep4 = sample(1:5, 10, replace = TRUE))
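The largest group per rep does not yet say which names end up together repeatedly. One possible way to get at that, sketched here in base R, is to count for every pair of names the number of reps in which both carry the same group number. The object names reps, co, idx and pairs are made up for this illustration, and the sketch assumes the first column holds the names and all other columns are reps.

```r
# Same seeded test data as above, repeated so the sketch runs on its own.
set.seed(2021)
df <- data.frame(names = LETTERS[1:10],
                 rep1 = sample(1:5, 10, replace = TRUE),
                 rep2 = sample(1:5, 10, replace = TRUE),
                 rep3 = sample(1:5, 10, replace = TRUE),
                 rep4 = sample(1:5, 10, replace = TRUE))

reps <- as.matrix(df[, -1])  # one column per rep

# For each rep, outer(x, x, "==") is TRUE where two names share a group.
# Adding the logical matrices counts the reps in which each pair co-occurs.
co <- Reduce(`+`, lapply(seq_len(ncol(reps)),
                         function(j) outer(reps[, j], reps[, j], "==")))
dimnames(co) <- list(df$names, df$names)
diag(co) <- 0  # a name trivially shares a group with itself

# Turn the upper triangle into a pair list, most frequent pairs first.
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs <- data.frame(name1 = rownames(co)[idx[, "row"]],
                    name2 = colnames(co)[idx[, "col"]],
                    shared_reps = co[idx])
pairs <- pairs[order(-pairs$shared_reps), ]
head(pairs)
```

A pair with shared_reps equal to the number of reps is grouped together every time. With random data like this, high counts are expected to be rare, so the same table computed on real data may be more informative.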
