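I have the following problem:
I want to create a new column in a data frame based on the difference between two columns, where each row holds a vector of strings:
My code: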
library(dplyr) # v.1.0.7
seqs <- c("seq1","seq2","seq3","seq4","seq5")
expect_mut <- c("S:T20N,S:D614G","S:T20N,S:D614G","S:T20N,N:G204R,N:G80R", "N:G204R, S:D614G", "N:G204R, S:D614G")
observed_mut <- c("S:T20N","S:D164G","S:T20N, N:G204R","S:D614G,N:G204R","S:D164G,S:T19I")
data_frame <- data.frame(seqs, expect_mut, observed_mut)
data_frame <- data_frame %>%
  mutate(expect_mut = strsplit(as.character(expect_mut), ","),
         observed_mut = strsplit(as.character(observed_mut), ",")) %>%
  group_by(seqs) %>%
  mutate(diff_mut = setdiff(observed_mut, expect_mut))
What I expect:
| seqs | expect_mut | observed_mut | diff_mut |
| ----- | ---------------------------------- | ----------------------- | ------------ |
| seq1 | c("S:T20N", "S:D614G") | S:T20N | |
| seq2 | c("S:T20N", "S:D614G") | S:D164G | S:D164G |
| seq3 | c("S:T20N", "N:G204R", "N:G80R") | c("S:T20N", " N:G204R") | |
| seq4 | c("N:G204R", "S:D614G") | c("N:G204R", "S:D614G") | |
| seq5 | c("N:G204R", "S:D614G") | c("S:D164G", "S:T19I") | c("S:D164G", "S:T19I") |
What is returned:
| seqs | expect_mut | observed_mut | diff_mut |
| ----- | ---------------------------------- | ----------------------- | ------------ |
| seq1 | c("S:T20N", "S:D614G") | S:T20N | S:T20N |
| seq2 | c("S:T20N", "S:D614G") | S:D164G | S:D164G |
| seq3 | c("S:T20N", "N:G204R", "N:G80R") | c("S:T20N", " N:G204R") | c("S:T20N", " N:G204R") |
| seq4 | c("N:G204R", "S:D614G") | c("N:G204R", "S:D614G") | c("N:G204R", "S:D614G") |
| seq5 | c("N:G204R", "S:D614G") | c("S:D164G", "S:T19I") | c("S:D164G", "S:T19I") |
Basically, the same values from observed_mut are returned in the diff_mut column...
Answer:
Because both columns are lists after strsplit, use map2 to loop over the corresponding list elements:
library(dplyr)
library(purrr)
data_frame %>%
  mutate(expect_mut = strsplit(as.character(expect_mut), ","),
         observed_mut = strsplit(as.character(observed_mut), ",")) %>%
  mutate(diff_mut = map2(observed_mut, expect_mut, setdiff)) %>%
  as_tibble()
-output
# A tibble: 5 × 4
seqs expect_mut observed_mut diff_mut
<chr> <list> <list> <list>
1 seq1 <chr [2]> <chr [1]> <chr [0]>
2 seq2 <chr [2]> <chr [1]> <chr [1]>
3 seq3 <chr [3]> <chr [2]> <chr [1]>
4 seq4 <chr [2]> <chr [2]> <chr [1]>
5 seq5 <chr [2]> <chr [2]> <chr [2]>
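To see why the original attempt returned observed_mut unchanged, it may help to look at what setdiff does when it is handed list columns directly. The following is a minimal base R sketch using the seq1 values from the question; the variable names `observed`, `expected`, `whole` and `inner` are made up for illustration. When both arguments are lists, setdiff compares each list element as a whole, so a character vector only counts as "the same" if the entire vector matches.

```r
# Each list has one element: the character vector for a single row
observed <- list(c("S:T20N"))
expected <- list(c("S:T20N", "S:D614G"))

# Applied to the lists themselves, setdiff compares whole elements.
# The vector "S:T20N" is not equal to c("S:T20N", "S:D614G") as a unit,
# so the observed element is returned untouched.
whole <- setdiff(observed, expected)

# Applied to the vectors inside the lists, setdiff compares the strings.
inner <- setdiff(observed[[1]], expected[[1]])

print(whole)
print(inner)
```

This is what map2 takes care of: it calls setdiff once per row on the inner vectors rather than once on the list column.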
Alternatively, if we keep the group_by approach (assuming all elements of 'seqs' are distinct), extract the first list element with [[:
data_frame %>%
  mutate(expect_mut = strsplit(as.character(expect_mut), ","),
         observed_mut = strsplit(as.character(observed_mut), ",")) %>%
  group_by(seqs) %>%
  mutate(diff_mut = list(setdiff(observed_mut[[1]], expect_mut[[1]]))) %>%
  ungroup()
-output
# A tibble: 5 × 4
seqs expect_mut observed_mut diff_mut
<chr> <list> <list> <list>
1 seq1 <chr [2]> <chr [1]> <chr [0]>
2 seq2 <chr [2]> <chr [1]> <chr [1]>
3 seq3 <chr [3]> <chr [2]> <chr [1]>
4 seq4 <chr [2]> <chr [2]> <chr [1]>
5 seq5 <chr [2]> <chr [2]> <chr [2]>
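In both outputs, seq3 and seq4 still show one differing element, whereas the question expected none. A likely cause is that some input strings have a space after the comma, so splitting on "," alone leaves values such as " N:G204R" that do not match "N:G204R". The base R sketch below, using only the seq3 and seq4 strings from the question, splits on a comma plus any following whitespace; the helper name `split_mut` is hypothetical and not part of the original answer.

```r
# seq3 and seq4 from the question, including the stray spaces
expect_mut   <- c("S:T20N,N:G204R,N:G80R", "N:G204R, S:D614G")
observed_mut <- c("S:T20N, N:G204R", "S:D614G,N:G204R")

# Hypothetical helper: trim the ends, then split on a comma
# followed by zero or more whitespace characters
split_mut <- function(x) strsplit(trimws(x), ",\\s*")

# Row-by-row set difference on the cleaned vectors
diff_mut <- Map(setdiff, split_mut(observed_mut), split_mut(expect_mut))

print(lengths(diff_mut))
```

The same pattern could be passed to strsplit inside the mutate calls above if the whitespace is not meaningful in your data.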
Note: rowwise may be less error-prone than group_by (in case 'seqs' contains duplicates).
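A rough sketch of the rowwise variant follows, assuming dplyr is installed. The data frame `df` is invented for illustration and deliberately repeats "seq1" to show that the result does not depend on the ids being unique. Inside a rowwise mutate, a list column refers to the element for the current row, so no [[1]] is needed; the result is still wrapped in list() so that diff_mut becomes a list column.

```r
library(dplyr)

# Hypothetical data with a duplicated id in 'seqs'
df <- data.frame(
  seqs         = c("seq1", "seq1", "seq2"),
  expect_mut   = c("S:T20N,S:D614G", "S:T20N", "N:G204R"),
  observed_mut = c("S:T20N", "S:D164G", "N:G204R,S:T19I"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(expect_mut = strsplit(expect_mut, ","),
         observed_mut = strsplit(observed_mut, ",")) %>%
  rowwise() %>%
  # observed_mut and expect_mut are the per-row character vectors here
  mutate(diff_mut = list(setdiff(observed_mut, expect_mut))) %>%
  ungroup()

print(res$diff_mut)
```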
