Nom parser that skips an escaped terminator

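I've checked the other SO answers on nom parser combinators, but this question doesn't seem to have been asked yet.

I'm trying to parse delimited regular expressions. They will always be delimited as /...../, possibly with modifiers at the end (which are out of scope for all the data I need to parse right now). But if there is an escaped \/ in the middle, my parser stops prematurely at the first /, even though it is preceded by a \.

I have this parser: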

use nom::bytes::complete::{tag, take_until};
use nom::{combinator::map_res, sequence::tuple, IResult};
use regex::Regex;

pub fn regex(input: &str) -> IResult<&str, Regex> {
    map_res(
        tuple((tag("/"), take_until("/"), tag("/"))),
        |(_, re, _)| Regex::new(re),
    )(input)
}

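Quite naturally, take_until stops as soon as it reaches the first /, without noticing that the previous character was a \. I've looked at peek and recognize, and map and a whole bunch of other things, but I keep coming up short. I feel like what I really want is for take_until("/") to be escape-aware in some way, or... In any case, I'm fine with using map_res to hand the result off to Rust's regex crate for parsing.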
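
To make the failure concrete, here is a minimal std-only sketch (no nom involved) of the difference between a plain search for / and an escape-aware scan. The helper name find_unescaped_slash is made up for illustration; it is not a nom function.

```rust
/// Hypothetical helper, not part of nom: returns the byte index of the first
/// `/` in `s` that is not escaped by a preceding backslash, if any.
fn find_unescaped_slash(s: &str) -> Option<usize> {
    let mut escaped = false;
    for (i, c) in s.char_indices() {
        if escaped {
            // This character is consumed by the preceding backslash.
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == '/' {
            return Some(i);
        }
    }
    None
}
```

On the body r"hello \/world/", a plain find('/') reports index 7 (the escaped slash), while the escape-aware scan reports index 13 (the real terminator).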

I have also tried something like this with the escaped combinator, but the examples are somewhat unclear and I couldn't get it to work:

pub fn regex(input: &str) -> IResult<&str, Regex> {
    map_res(
        tuple((
            tag("/"),
            escaped(many1(anychar), '\\', one_of(r"/")),
            tag("/"),
        )),
        |(_, re, _)| {
            println!("mapres {}", re);
            Regex::new(re)
        },
    )(input)
}

My test cases look like this (the .unwrap().as_str() is only there to keep the example small, because regex::Regex doesn't implement PartialEq):

#[cfg(test)]
mod tests {
    use super::regex;
    use super::Regex;
    #[test]
    fn test_parse_regex_simple() {
        assert_eq!(
            Regex::new(r#"hello world"#).unwrap().as_str(),
            regex("/hello world/").unwrap().1.as_str()
        );
    }
    #[test]
    fn test_parse_regex_with_escaped_forwardslash() {
        assert_eq!(
            Regex::new(r#"hello /world"#).unwrap().as_str(),
            regex(r"/hello \/world/").unwrap().1.as_str(),
        );
    }
}

Answer:

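The parser passed as the first argument to escaped() should parse a character that is not the escape character, and it should stop at the correct character. many1(anychar) satisfies neither condition.

Instead, you should call it like this: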

escaped(none_of(r"\/"), '\\', one_of(r"/"))

Or, as the whole expression:

map_res(
    tuple((
        tag("/"),
        escaped(none_of(r"\/"), '\\', one_of(r"/")),
        tag("/"),
    )),
    |(_, re, _)| Regex::new(re),
)(input)

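But this still doesn't work, because Regex's escape sequences don't include /. So you need to strip the escape character. Luckily, escaped_transform() can help you here: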

map_res(
    tuple((
        tag("/"),
        escaped_transform(none_of(r"\/"), '\\', one_of(r"/")),
        tag("/"),
    )),
    |(_, re, _)| Regex::new(&re), // The `&` is needed because `escaped_transform()` returns a `String` but `Regex::new()` wants a `&str`
)(input)
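
For intuition, what escaped_transform does to the body between the delimiters can be approximated by the std-only sketch below. strip_escaped_slashes is a hypothetical helper, not nom's implementation; it assumes that any escape other than \/ should be rejected, which roughly matches a transform parser that only accepts /.

```rust
/// Hypothetical helper (not a nom API): rewrites `\/` to `/` and returns
/// `None` for any other escape or for a trailing backslash.
fn strip_escaped_slashes(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                // Drop the backslash, keep the slash.
                Some('/') => out.push('/'),
                // Assumption: every other escape is treated as an error.
                _ => return None,
            }
        } else {
            out.push(c);
        }
    }
    Some(out)
}
```

Under that assumption, r"hello \/world" becomes hello /world, while a body such as r"\d+" is rejected.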

Answer:

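Chayim Friedman's accepted answer is correct, but I was able to extend it to handle \w, \d and other such escapes, so this is just an extension of Chayim's idea from the escaped_transform version: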


pub fn regex(input: &str) -> IResult<&str, Regex> {
    map_res(
        delimited(
            tag("/"),
            escaped_transform(
                none_of("\\/"),
                '\\',
                alt((
                    value(r"/", tag("/")),
                    value(r"\d", tag("d")),
                    value(r"\W", tag("W")),
                    value(r"\w", tag("w")),
                    value(r"\b", tag("b")),
                    value(r"\B", tag("B")),
                )),
            ),
            tag("/"),
        ),
        |re| Regex::new(&re),
    )(input)
}

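Note that this list is not complete either, but https://docs.rs/regex/1.5.6/regex/#escape-sequences lists the full set of escapes, and https://github.com/Geal/nom/blob/main/examples/string.rs explains in more detail how to handle \u{....}-style escape sequences.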
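
If enumerating every escape becomes unwieldy, one possible alternative is to unescape only \/ and copy every other backslash pair through unchanged, leaving it to Regex::new to validate them. The following is a std-only sketch of that idea, not a nom parser; split_regex_literal is a hypothetical name, and unlike the alt list above it does not reject unknown escapes itself.

```rust
/// Hypothetical std-only analogue of the parser above: splits `/body/rest`
/// into (rest, pattern), turning `\/` into `/` and copying every other
/// escape (such as `\d` or `\w`) through unchanged.
fn split_regex_literal(input: &str) -> Option<(&str, String)> {
    let body = input.strip_prefix('/')?;
    let mut pattern = String::new();
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // Unescaped slash: end of the literal, return what follows it.
            '/' => return Some((&body[i + 1..], pattern)),
            '\\' => match chars.next() {
                Some((_, '/')) => pattern.push('/'),
                Some((_, other)) => {
                    // Keep the escape intact for the regex engine.
                    pattern.push('\\');
                    pattern.push(other);
                }
                // A backslash with nothing after it cannot be closed.
                None => return None,
            },
            other => pattern.push(other),
        }
    }
    None // no closing delimiter was found
}
```

The returned pattern would then be passed to Regex::new(&pattern), and the remaining input (for example trailing modifiers) is left for the caller to handle.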
