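I have a column that contains multiple categories in a comma-separated pattern. Something like this:
| ID | Categories |
|---|---|
| 1 | A, B, C, D |
| 2 | A, F, X, G |
| 3 | B, Y, X, D |
How can I generate two columns holding every possible pair of categories for each ID, similar to this: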
| ID | Category 1 | Category 2 |
|---|---|---|
| 1 | A | B |
| 1 | A | C |
| 1 | A | D |
| 1 | B | C |
| 1 | B | D |
| 1 | C | D |
| 2 | A | F |
| 2 | A | X |
| 2 | A | G |
and so on.
Thanks in advance!
A helpful uj5u.com user replied:
You can split the strings and use combn, i.e.
do.call(rbind, lapply(strsplit(df$categories, ', '), function(i)data.frame(t(combn(i, 2)))))
# X1 X2
#1 A B
#2 A C
#3 A D
#4 B C
#5 B D
#6 C D
#7 A F
#8 A X
#9 A G
#10 F X
#11 F G
#12 X G
#13 B Y
#14 B X
#15 B D
#16 Y X
#17 Y D
#18 X D
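To see why the one-liner above works, here is a minimal sketch of what combn does to a single split row. The variable names `cats` and `pairs` are only illustrative and do not come from the answer.

```r
# Split one comma-separated string into its individual categories
cats <- strsplit("A, B, C, D", ", ")[[1]]

# combn(x, 2) returns a matrix with one pair per column,
# so transposing it gives one pair per row
pairs <- t(combn(cats, 2))

dim(pairs)   # choose(4, 2) = 6 rows, 2 columns
pairs[1, ]   # the first pair: "A" "B"
```

Note that the one-liner drops the id column; the answers that follow keep it.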
A helpful uj5u.com user replied:
Another base R option
do.call(
  rbind,
  lapply(
    split(df, df$id),
    function(x) {
      cbind(
        x$id,
        t(combn(strsplit(x$categories, ", ")[[1]], 2))
      )
    }
  )
)
[,1] [,2] [,3]
[1,] "1" "A" "B"
[2,] "1" "A" "C"
[3,] "1" "A" "D"
[4,] "1" "B" "C"
[5,] "1" "B" "D"
[6,] "1" "C" "D"
[7,] "2" "A" "F"
[8,] "2" "A" "X"
[9,] "2" "A" "G"
[10,] "2" "F" "X"
[11,] "2" "F" "G"
[12,] "2" "X" "G"
[13,] "3" "B" "Y"
[14,] "3" "B" "X"
[15,] "3" "B" "D"
[16,] "3" "Y" "X"
[17,] "3" "Y" "D"
[18,] "3" "X" "D"
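The result above is a character matrix, so the id is stored as a string and the columns have no names. If a data frame with a typed id is preferred, one possible variation is sketched below. It builds a small data frame per id instead of calling cbind on a matrix; the column names `cat_1` and `cat_2` are an assumption chosen for illustration.

```r
# Same sample data as in the question
df <- data.frame(
  id = 1:3,
  categories = c("A, B, C, D", "A, F, X, G", "B, Y, X, D")
)

res <- do.call(rbind, lapply(split(df, df$id), function(x) {
  # All pairs of categories for this id, one pair per row
  pairs <- t(combn(strsplit(x$categories, ", ")[[1]], 2))
  # The single id value is recycled to match the number of pairs
  data.frame(id = x$id, cat_1 = pairs[, 1], cat_2 = pairs[, 2])
}))

# Drop the "1.1", "1.2", ... row names that rbind derives from the list names
rownames(res) <- NULL
```

Here `res` has 18 rows and `res$id` stays an integer column.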
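A helpful uj5u.com user replied:
A solution using the tidyverse:
- Use strsplit() (or stringr::str_split()) to extract each category from the raw data.
- Split the data by id, then build a sub data frame for that id containing every possible combination.
- Join the tables back together (this can conveniently be done by the same function used for the second step, purrr::map_df()).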
library(tidyverse)
data %>%
  mutate(all = str_split(categories, ", ")) %>%
  split(.$id) %>%
  map_df(function(df) {
    combs = t(combn(unlist(df$all), m = 2))
    tibble(id = df$id, cat_1 = combs[, 1], cat_2 = combs[, 2])
  })
Output
# A tibble: 18 x 3
id cat_1 cat_2
<dbl> <chr> <chr>
1 1 A B
2 1 A C
3 1 A D
4 1 B C
5 1 B D
6 1 C D
7 2 A F
8 2 A X
9 2 A G
10 2 F X
11 2 F G
12 2 X G
13 3 B Y
14 3 B X
15 3 B D
16 3 Y X
17 3 Y D
18 3 X D
A helpful uj5u.com user replied:
A data.table option
> setDT(df)[, data.table(t(combn(scan(text = categories, what = "character", sep = ",", strip.white = TRUE), 2))), id]
Read 4 items
Read 4 items
Read 4 items
id V1 V2
1: 1 A B
2: 1 A C
3: 1 A D
4: 1 B C
5: 1 B D
6: 1 C D
7: 2 A F
8: 2 A X
9: 2 A G
10: 2 F X
11: 2 F G
12: 2 X G
13: 3 B Y
14: 3 B X
15: 3 B D
16: 3 Y X
17: 3 Y D
18: 3 X D
Alternatively, we can use a dplyr pipeline (with tidyr::unnest()) as follows
df %>%
  group_by(id) %>%
  mutate(categories = list(data.frame(t(combn(unlist(strsplit(categories, ", ")), 2))))) %>%
  unnest(categories) %>%
  ungroup()
This gives
id X1 X2
<int> <chr> <chr>
1 1 A B
2 1 A C
3 1 A D
4 1 B C
5 1 B D
6 1 C D
7 2 A F
8 2 A X
9 2 A G
10 2 F X
11 2 F G
12 2 X G
13 3 B Y
14 3 B X
15 3 B D
16 3 Y X
17 3 Y D
18 3 X D
Data
> dput(df)
structure(list(id = 1:3, categories = c("A, B, C, D", "A, F, X, G",
"B, Y, X, D")), class = "data.frame", row.names = c(NA, -3L))
A helpful uj5u.com user replied:
Using gregexpr.
z <- dat$categories
t(do.call(cbind, lapply(regmatches(z, gregexpr(z, pa='\\w')), combn, 2)))
# [,1] [,2]
# [1,] "A" "B"
# [2,] "A" "C"
# [3,] "A" "D"
# [4,] "B" "C"
# [5,] "B" "D"
# [6,] "C" "D"
# [7,] "A" "F"
# [8,] "A" "X"
# [9,] "A" "G"
# [10,] "F" "X"
# [11,] "F" "G"
# [12,] "X" "G"
# [13,] "B" "Y"
# [14,] "B" "X"
# [15,] "B" "D"
# [16,] "Y" "X"
# [17,] "Y" "D"
# [18,] "X" "D"
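One caveat worth illustrating: the pattern '\\w' matches a single word character, which is fine for the one-letter categories in the question. The sketch below assumes a hypothetical input with longer category names (not part of the original data) to show how '\\w+' would behave instead.

```r
# The second string is a made-up example with multi-character categories
z <- c("A, B, C, D", "AB, CD, EF")

# '\\w' extracts one word character per match
single <- regmatches(z, gregexpr('\\w', z))

# '\\w+' extracts each run of word characters, i.e. a whole category name
whole <- regmatches(z, gregexpr('\\w+', z))

single[[2]]  # "A" "B" "C" "D" "E" "F"
whole[[2]]   # "AB" "CD" "EF"
```

For the data in the question both patterns give the same tokens, so the answer's code is unaffected.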
By id:
do.call(rbind.data.frame, by(dat, dat$categories, \(x) {
  z <- x$categories
  cbind(id=x$id,
        t(do.call(cbind, lapply(regmatches(z, gregexpr(z, pa='\\w')), combn, 2))))
}))
# id X1 X2
# A, B, C, D.1 1 A B
# A, B, C, D.2 1 A C
# A, B, C, D.3 1 A D
# A, B, C, D.4 1 B C
# A, B, C, D.5 1 B D
# A, B, C, D.6 1 C D
# A, F, X, G.1 2 A F
# A, F, X, G.2 2 A X
# A, F, X, G.3 2 A G
# A, F, X, G.4 2 F X
# A, F, X, G.5 2 F G
# A, F, X, G.6 2 X G
# B, Y, X, D.1 3 B Y
# B, Y, X, D.2 3 B X
# B, Y, X, D.3 3 B D
# B, Y, X, D.4 3 Y X
# B, Y, X, D.5 3 Y D
# B, Y, X, D.6 3 X D
Note: "R version 4.1.2 (2021-11-01)"
Data:
dat <- structure(list(id = 1:3, categories = c("A, B, C, D", "A, F, X, G",
"B, Y, X, D")), class = "data.frame", row.names = c(NA, -3L))
