I have the following character vector called strains:
head(strains, 10)
[1] "Lactobacillus gasseri APC678" "Lactobacillus gasseri DSM 20243"
[3] "Bifidobacterium angulatum B677" "Bifidobacterium breve Reuter S1"
[5] "Lactobacillus reuteri F275" "Lactobacillus acidophilus L917"
[7] "Lactobacillus acidophilus 4357" "Bifidobacterium pseudocatenulatum B1279"
[9] "Bifidobacterium longum subsp. infantis JCM 1210" "Clostridium difficile 43594"
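What I want is a vector in which each element contains only the third word. For example, for the element "Lactobacillus gasseri APC678", I want to keep only "APC678".
This is what I did:
library(tidyverse)
lapply(strains %>% str_split(" "), '[', 3) %>% unlist
As you can see from the output of my code, it does give the result I want: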
[1] "APC678" "DSM" "B677" "Reuter" "F275" "L917" "4357" "B1279" "subsp." "43594" "subsp." "F275" "1SL4" "JCM"
[15] "JCM" "AM63" "DSM" "L917" "61D" "Bb14" "AM63" "VPI"
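However, I am looking for a more elegant or concise way to do the same thing, perhaps using regular expressions or something similar.
Here is the dput of my data: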
strains <- c("Lactobacillus gasseri APC678", "Lactobacillus gasseri DSM 20243",
"Bifidobacterium angulatum B677", "Bifidobacterium breve Reuter S1",
"Lactobacillus reuteri F275", "Lactobacillus acidophilus L917",
"Lactobacillus acidophilus 4357", "Bifidobacterium pseudocatenulatum B1279",
"Bifidobacterium longum subsp. infantis JCM 1210", "Clostridium difficile 43594"
)
uj5u.com user reply:
The stringr package has a very simple word function, so there is no need for regular expressions.
library(stringr)
stringr::word(strains, start = 3, end = 3)
[1] "APC678" "DSM" "B677" "Reuter" "F275" "L917" "4357"
[8] "B1279" "subsp." "43594"
uj5u.com user reply:
You can use the stringr package:
stringr::str_split(strains, " ", simplify = TRUE)[,3]
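A small sketch of why the [,3] indexing works: with simplify = TRUE, str_split() returns a character matrix with one row per input string and one column per piece, and shorter rows are padded with empty strings. The three sample strings are taken from the question's data; the variable names sample_strains and pieces are only illustrative.

```r
library(stringr)

# Three elements copied from the question's data, with 3, 4 and 6 words.
sample_strains <- c("Lactobacillus gasseri APC678",
                    "Lactobacillus gasseri DSM 20243",
                    "Bifidobacterium longum subsp. infantis JCM 1210")

# simplify = TRUE gives a character matrix instead of a list.
pieces <- str_split(sample_strains, " ", simplify = TRUE)

# One row per string, one column per word position.
print(dim(pieces))

# Column 3 holds the third word of every string.
third_word <- pieces[, 3]
print(third_word)
```

Because of the padding, a string with fewer than three words would yield "" in column 3 rather than an error, provided at least one other string has three or more words.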
uj5u.com user reply:
Using base R and a regular expression:
sub("^(\\S+\\s+){2}(\\S+).*", "\\2", strains)
With data.table:
data.table::tstrsplit(strains, " ")[[3]]
# [1] "APC678" "DSM" "B677" "Reuter" "F275" "L917" "4357" "B1279" "subsp." "43594"
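As a rough sketch of how two base R options behave, the following compares a plain strsplit() approach with a sub() pattern of the form "skip two words, capture the third". The fourth, two-word element is a hypothetical input added for illustration (it is not in the original data), and vapply() is just one possible way to pick the third piece.

```r
# First three elements copied from the question's data, plus one
# hypothetical two-word entry added purely for illustration.
strains_demo <- c("Lactobacillus gasseri APC678",
                  "Lactobacillus gasseri DSM 20243",
                  "Bifidobacterium angulatum B677",
                  "Lactobacillus gasseri")  # hypothetical: no third word

# Split on runs of whitespace, then take the third piece of each element.
# Indexing past the end of a character vector yields NA.
parts <- strsplit(strains_demo, "\\s+")
third_split <- vapply(parts, function(p) p[3], character(1))

# Regex approach: skip two "word + whitespace" groups, capture the third word.
# sub() returns the input unchanged when the pattern does not match.
third_regex <- sub("^(\\S+\\s+){2}(\\S+).*", "\\2", strains_demo)

print(third_split)
print(third_regex)
```

The two approaches agree whenever a third word exists. They differ on short inputs: the split version gives NA, while the sub() version returns the whole original string, which may be worth checking for if the data can contain names with fewer than three words.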
uj5u.com user reply:
Another possible solution, based on stringr::str_match and capture groups:
library(stringr)
str_match(strains, "(\\S+\\s+){2}(\\S+).*")[,3]
#> [1] "APC678" "DSM" "B677" "Reuter" "F275" "L917" "4357" "B1279"
#> [9] "subsp." "43594"
