I have the following character vector, which contains parentheses, periods, and unnecessary descriptive words:
strings <- c("Poorly Graded Silty Sand (SP-SM).", "(Visual) Lean Clay (CL), with some sand.","Poorly Graded Silty Sand (SP-SM).","(Visual) Inorganic Silt (ML).","(Visual) Lean Clay (CL), with some sand.")
I want to extract only the letter codes that appear inside the parentheses on each line (for example, ML or SP-SM). This is the vector I need:
need <- c("SP-SM", "CL","SP-SM","ML","CL")
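Is this possible?
Reply from a uj5u.com user:
We can use str_extract with regex lookarounds to match one or more uppercase letters or hyphens that are preceded by an opening parenthesis and followed by a closing parenthesis: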
library(stringr)
str_extract(strings, "(?<=\\()[A-Z-]+(?=\\))")
[1] "SP-SM" "CL" "SP-SM" "ML" "CL"
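For anyone who would rather not load stringr, the same lookaround pattern can be sketched in base R. This is only a sketch under one assumption: every element contains a parenthesised code. regmatches() drops elements that have no match, whereas str_extract() returns NA for them. The variable name codes is just for illustration.

```r
strings <- c("Poorly Graded Silty Sand (SP-SM).",
             "(Visual) Lean Clay (CL), with some sand.",
             "Poorly Graded Silty Sand (SP-SM).",
             "(Visual) Inorganic Silt (ML).",
             "(Visual) Lean Clay (CL), with some sand.")

# perl = TRUE switches on PCRE, which is needed for the lookbehind (?<=...)
# and lookahead (?=...); regexpr() finds the first match in each element.
codes <- regmatches(strings,
                    regexpr("(?<=\\()[A-Z-]+(?=\\))", strings, perl = TRUE))
codes
# [1] "SP-SM" "CL"    "SP-SM" "ML"    "CL"
```

"(Visual)" is skipped because the V is followed by lowercase letters, so the lookahead for the closing parenthesis fails there.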
Reply from a uj5u.com user:
Here is a longer version of akrun's solution:
str_extract(strings, '\\b[A-Z]{2}\\b\\-\\b[A-Z]{2}\\b|\\b[A-Z]{2}\\b')
Output:
[1] "SP-SM" "CL" "SP-SM" "ML" "CL"
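Explanation:
\\b matches a word boundary, i.e. the position between a word character and a non-word character.
[A-Z]{2} matches exactly two uppercase letters.
\\- matches a literal hyphen.
\\b again matches a word boundary.
| defines OR (alternation): the hyphenated form is tried first, then the plain two-letter form.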
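To see why the order of the two alternatives matters, here is a small base R sketch. It assumes PCRE semantics (perl = TRUE), where the first alternative that matches at a given position wins. The shortened inputs and the names x, long_first and short_first are made up for illustration.

```r
x <- c("Silty Sand (SP-SM).", "Lean Clay (CL), with some sand.")

# Hyphenated alternative listed first, as in the answer above.
long_first <- regmatches(x, regexpr(
  "\\b[A-Z]{2}\\b-\\b[A-Z]{2}\\b|\\b[A-Z]{2}\\b", x, perl = TRUE))

# Alternatives swapped: the plain two-letter form now wins at "SP",
# so the "-SM" part is never reached.
short_first <- regmatches(x, regexpr(
  "\\b[A-Z]{2}\\b|\\b[A-Z]{2}\\b-\\b[A-Z]{2}\\b", x, perl = TRUE))

long_first
# [1] "SP-SM" "CL"
short_first
# [1] "SP" "CL"
```

Note that this pattern only covers codes built from exactly two letters per part; a code of a different length would need the {2} quantifier adjusted.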
