I have this character vector:
protein = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA"
I want to split it into fragments at each occurrence of the letter R:
library(stringr)
peptide_fragments <- str_split(protein, "(?<=[R])")
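Now, from the resulting fragments, I want to drop the substrings that:
- do not contain the letter K
Then, from the remaining substrings, drop those:
- shorter than 6 characters.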
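For reference, the two filtering steps can be applied in exactly the order described, one after the other. This is a minimal sketch that swaps the question's `stringr::str_split()` for base R `strsplit()` with `perl = TRUE` (which also supports the `(?<=R)` lookbehind); the intermediate names `fragments`, `with_k` and `kept` are only illustrative.

```r
# Sketch: the question's two filtering steps, applied in order, using base R only.
protein <- "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA"

# Split after every "R"; the lookbehind is zero-width, so each R stays
# at the end of the piece on its left and no characters are lost.
fragments <- strsplit(protein, "(?<=R)", perl = TRUE)[[1]]

# Step 1: drop fragments that do not contain the letter "K".
with_k <- fragments[grepl("K", fragments, fixed = TRUE)]

# Step 2: from the remaining fragments, drop those shorter than 6 characters.
kept <- with_k[nchar(with_k) >= 6]

print(kept)
```

With this particular protein, every fragment that contains a K already has at least 6 characters, so the second step removes nothing extra; it only matters for other inputs.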
Answer:
Using a pure base R regex approach, we can try:
protein <- "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA"
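# split after every "R" (zero-width lookbehind, so each R stays on the left piece)
parts <- strsplit(protein, "(?<=R)", perl=TRUE)[[1]]
# keep pieces that contain a K (lookahead) and are at least 6 characters long
output <- grep("^(?=.*K).{6,}$", parts, value=TRUE, perl=TRUE)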
output
[1] "TKQTAR" "KSTGGKAPR"
[3] "KQLATKAAR" "KSAPATGGVKKPHR"
[5] "YQKSTELLIR" "KLPFQR"
[7] "EIAQDFKTDLR" "FQSSAVMALQEACEAYLVGLFEDTNLCAIHAKR"
[9] "VTIMPKDIQLAR"
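The single pattern `^(?=.*K).{6,}$` packs both conditions into one regex: the lookahead `(?=.*K)` at the start succeeds only if a K appears somewhere in the piece, and `.{6,}$` then requires the whole piece to be at least 6 characters long. A small sketch, assuming the same `protein` string, that checks this selects the same pieces as two separate tests:

```r
# Sketch: compare the combined regex with two separate vectorised conditions.
protein <- "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA"
parts <- strsplit(protein, "(?<=R)", perl = TRUE)[[1]]

# One regex: lookahead for "K", then at least 6 characters up to the end.
one_regex <- grepl("^(?=.*K).{6,}$", parts, perl = TRUE)

# The same two conditions written as separate logical tests.
two_tests <- grepl("K", parts, fixed = TRUE) & nchar(parts) >= 6

print(identical(one_regex, two_tests))
```

The regex version is compact; the two-test version is easier to adjust if the conditions change independently.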
Answer:
If you want to split after each "R":
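library(stringr)
# split after each "R" and flatten the one-element list to a character vector
temp <- unlist(str_split(protein, "(?<=R)"))
# keep pieces that contain a "K" and have at least 6 characters
res <- temp[grepl("K", temp) & nchar(temp) >= 6]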
Result:
res
[1] "TKQTAR" "KSTGGKAPR"
[3] "KQLATKAAR" "KSAPATGGVKKPHR"
[5] "YQKSTELLIR" "KLPFQR"
[7] "EIAQDFKTDLR" "FQSSAVMALQEACEAYLVGLFEDTNLCAIHAKR"
[9] "VTIMPKDIQLAR"
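Both answers do the same split-then-filter. If it needs to be repeated with other letters or lengths, it could be wrapped in a small function. The sketch below is hypothetical: the name `keep_fragments` and its arguments are made up for illustration and are not taken from either answer. It uses base R only and assumes `split_after` is a single letter with no special regex meaning, because it is pasted into the pattern unescaped.

```r
# Hypothetical helper (not from either answer): split after a letter, then
# keep pieces containing another letter and meeting a minimum length.
# Assumes `split_after` is a plain letter; it is inserted into the regex as-is.
keep_fragments <- function(seq, split_after = "R", must_contain = "K", min_len = 6) {
  # Zero-width lookbehind: split right after each occurrence of `split_after`
  pieces <- strsplit(seq, paste0("(?<=", split_after, ")"), perl = TRUE)[[1]]
  # Keep pieces that contain `must_contain` and are at least `min_len` long
  pieces[grepl(must_contain, pieces, fixed = TRUE) & nchar(pieces) >= min_len]
}

protein <- "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA"
res <- keep_fragments(protein)
print(res)
```

With the default arguments this reproduces the nine fragments listed above.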
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/365675.html
