I want to clean up some TNM entries. Here is an example:
df <- structure(list(TNM = c("pT3 N0 (0/13)", "pT3 N2b (21/45l)", "pT3 N0 (0/32 LK)"
)), class = "data.frame", row.names = c(NA, -3L))
TNM
1 pT3 N0 (0/13)
2 pT3 N2b (21/45l)
3 pT3 N0 (0/32 LK)
So far, I have this:
library(dplyr)
library(stringr)
df %>%
  mutate(TNM = str_remove_all(TNM, '\\,|\\;|\\.'),
         TNM = str_replace_all(TNM, ' ', ''),
         TNM = str_replace_all(TNM, "x", "X")) %>%
  mutate(N_count = str_extract(TNM, '\\(\\d+\\/\\d+\\)'))
TNM N_count
1 pT3N0(0/13) (0/13)
2 pT3N2b(21/45l) <NA>
3 pT3N0(0/32LK) <NA>
This works:
library(dplyr)
library(stringr)
df %>%
  mutate(TNM = str_remove_all(TNM, '\\,|\\;|\\.'),
         TNM = str_replace_all(TNM, ' ', ''),
         TNM = str_replace_all(TNM, "x", "X")) %>%
  mutate(N_count = str_extract(TNM, '\\(\\d+\\/\\d+\\)|\\(\\d+\\/\\d+\\w\\)|\\(\\d+\\/\\d+\\w+\\)'))
TNM N_count
1 pT3N0(0/13) (0/13)
2 pT3N2b(21/45l) (21/45l)
3 pT3N0(0/32LK) (0/32LK)
Is there a way to shorten this regular expression:
'\\(\\d+\\/\\d+\\)|\\(\\d+\\/\\d+\\w\\)|\\(\\d+\\/\\d+\\w+\\)'?
Answer from a uj5u.com user:
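The three branches of the alternation match zero, exactly one, or one or more word characters before the closing parenthesis.
You can shorten the pattern by dropping the alternation and making the word characters optional and repeatable with \\w*:
\\(\\d+/\\d+\\w*\\)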
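As a quick check that the shortened pattern behaves like the three-way alternation, here is a minimal sketch on the question's three entries with spaces already removed. It uses base R (regexpr and regmatches) rather than stringr so it runs without extra packages; that swap, and the helper name extract_first, are assumptions made for illustration and are not part of the original answer.

```r
# Sample entries from the question, with spaces already removed
tnm <- c("pT3N0(0/13)", "pT3N2b(21/45l)", "pT3N0(0/32LK)")

# Original three-way alternation and the shortened pattern
long_pattern  <- "\\(\\d+\\/\\d+\\)|\\(\\d+\\/\\d+\\w\\)|\\(\\d+\\/\\d+\\w+\\)"
short_pattern <- "\\(\\d+/\\d+\\w*\\)"

# Hypothetical helper: first match per element, NA where nothing matches
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

n_long  <- extract_first(tnm, long_pattern)
n_short <- extract_first(tnm, short_pattern)

# Both patterns should extract the same parenthesised counts
print(data.frame(TNM = tnm, n_long = n_long, n_short = n_short))
```

Since \\w* covers "no word character", "one word character", and "several word characters" in a single quantifier, the separate branches are no longer needed.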
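To also match (0/32 LK) with its inner space, without accepting an entry that only has a trailing space such as (21/45 ), you can add an optional group that matches optional whitespace characters followed by one or more word characters:
\\(\\d+/\\d+(?:\\s*\\w+)?\\)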
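The whitespace-tolerant pattern can be tried directly on the raw entries, before any spaces are stripped. The sketch below again uses base R instead of stringr (an assumption made to keep it self-contained), and the fourth entry with only a trailing space is a hypothetical value added for illustration.

```r
# Raw entries from the question, plus one hypothetical entry that has
# only a trailing space inside the parentheses
raw <- c("pT3 N0 (0/13)", "pT3 N2b (21/45l)", "pT3 N0 (0/32 LK)", "pT3 N1 (21/45 )")

# Optional group: optional whitespace followed by one or more word characters
spaced_pattern <- "\\(\\d+/\\d+(?:\\s*\\w+)?\\)"

# First match per element, NA where the pattern does not match
m <- regexpr(spaced_pattern, raw, perl = TRUE)
n_count <- rep(NA_character_, length(raw))
n_count[m != -1] <- regmatches(raw, m)

print(data.frame(TNM = raw, N_count = n_count))
```

The first three entries yield (0/13), (21/45l), and (0/32 LK). The last entry yields NA, because whitespace is only allowed when at least one word character follows it.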
