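I have a string, txt, that contains the pattern John and the names of several countries. I also have vec_regex, a vector of regular expressions that match countries (not all of them appear in the text).
What I want is the text between John and the matching country closest to the left of John: France text John.
I think this needs a negative lookahead, but I can't get it to work (see here and here). Many thanks!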
library(stringr)
txt <- "Germany Russia and Germany Russia text Germany text France text John text text France and Spain"
vec_regex <- c("German\\w*", "France|French", "Spain|Spanish", "Russia\\w*")
vec_regex_or <- paste(vec_regex, collapse="|")
vec_regex_or
#> [1] "German\\w*|France|French|Spain|Spanish|Russia\\w*"
pattern_left <- paste0("(",vec_regex_or, ")",".*John")
pattern_left
#> [1] "(German\\w*|France|French|Spain|Spanish|Russia\\w*).*John"
str_extract(txt, regex(pattern_left))
#> [1] "Germany Russia and Germany Russia text Germany text France text John"
pattern_left <- paste0("(",vec_regex_or, ")","(?!(",vec_regex_or,"))",".*John") #neg. lookahead
pattern_left
#> [1] "(German\\w*|France|French|Spain|Spanish|Russia\\w*)(?!(German\\w*|France|French|Spain|Spanish|Russia\\w*)).*John"
str_extract(txt, regex(pattern_left))
#> [1] "Germany Russia and Germany Russia text Germany text France text John"
Created on 2021-12-30 by the reprex package (v2.0.1)
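Why the second attempt still returns the whole string: a negative lookahead such as (?!...) is a zero-width test evaluated at a single position, here only right after the first captured country. Nothing constrains the greedy .* that follows. A minimal sketch, reusing the question's txt and vec_regex, inspects capture group 1 with stringr's str_match() and shows that the match still starts at the very first Germany:

```r
library(stringr)

txt <- "Germany Russia and Germany Russia text Germany text France text John text text France and Spain"
vec_regex <- c("German\\w*", "France|French", "Spain|Spanish", "Russia\\w*")
vec_regex_or <- paste(vec_regex, collapse = "|")

# The question's second attempt: the lookahead sits once, between group 1 and .*
pattern_lookahead_once <- paste0("(", vec_regex_or, ")", "(?!(", vec_regex_or, "))", ".*John")

m <- str_match(txt, pattern_lookahead_once)
# After "Germany" the next character is a space, so no country starts there and
# the lookahead succeeds; .* is never checked against the country names.
m[, 2]
#> [1] "Germany"
```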
Answer:
You need to use:
pattern_left <- paste0("(",vec_regex_or, ")","(?:(?!",vec_regex_or,").)*","John")
pattern_left
# => [1] "(German\\w*|France|French|Spain|Spanish|Russia\\w*)(?:(?!German\\w*|France|French|Spain|Spanish|Russia\\w*).)*John"
str_extract(txt, regex(pattern_left))
# => [1] "France text John"
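The "(?:(?!",vec_regex_or,").)*" part builds a tempered greedy token: it consumes one character at a time, and before each character the lookahead checks that no country name starts at that position. Because the check runs at every position rather than once, the match cannot cross another country on its way to John, so it starts at the country closest to John.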
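A minimal sketch of the same pattern, assuming you also want to know which country was matched: str_match() returns the full match in column 1 and capture group 1 (the country) in column 2. The intermediate variable name tempered is only for illustration.

```r
library(stringr)

txt <- "Germany Russia and Germany Russia text Germany text France text John text text France and Spain"
vec_regex <- c("German\\w*", "France|French", "Spain|Spanish", "Russia\\w*")
vec_regex_or <- paste(vec_regex, collapse = "|")

# Tempered greedy token: "." consumes one character, but only after (?!...)
# confirms that no country pattern starts at the current position.
tempered <- paste0("(?:(?!", vec_regex_or, ").)*")
pattern_left <- paste0("(", vec_regex_or, ")", tempered, "John")

m <- str_match(txt, pattern_left)
m[, 1]  # full match: closest country through John
#> [1] "France text John"
m[, 2]  # capture group 1: the country itself
#> [1] "France"
```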
Also, if you intend to match these strings as whole words, consider adding word boundaries:
pattern_left <- paste0("\\b(",vec_regex_or, ")\\b","(?:(?!",vec_regex_or,").)*","John\\b")
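To see what the word boundaries change, here is a small sketch using a hypothetical string, txt2, in which John only appears as the start of Johnson. Without word boundaries the pattern matches the John inside Johnson; with John\\b it finds nothing. (Note that the lookahead inside the tempered token has no word boundaries, so a country name embedded in a longer word would still stop the token.)

```r
library(stringr)

vec_regex <- c("German\\w*", "France|French", "Spain|Spanish", "Russia\\w*")
vec_regex_or <- paste(vec_regex, collapse = "|")

# Without word boundaries: "John" also matches the first four letters of "Johnson".
pattern_plain <- paste0("(", vec_regex_or, ")", "(?:(?!", vec_regex_or, ").)*", "John")
# With word boundaries: the country and "John" must be whole words.
pattern_wb <- paste0("\\b(", vec_regex_or, ")\\b", "(?:(?!", vec_regex_or, ").)*", "John\\b")

txt2 <- "France text Johnson"  # hypothetical test string
str_extract(txt2, regex(pattern_plain))
#> [1] "France text John"
str_extract(txt2, regex(pattern_wb))
#> [1] NA
```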
