Regex to split a string on a character, except inside quoted strings

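I have a string like this: aa | bb | "cc | dd" | 'ee | ff'. I'm looking for a way to split it to get all the values separated by the | character, except where the | is contained inside a quoted string.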

The idea is to get something like this: [aa, bb, "cc | dd", 'ee | ff']

I've already found an answer to a similar question here: https://stackoverflow.com/a/11457952/11260467

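However, I can't find a way to adapt it to a case with multiple delimiters (here, both double and single quotes). Is anyone here better at regular expressions than me?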

Answer:

This can easily be done with the (*SKIP)(*FAIL) verbs of PCRE:

(['"]).*?\1(*SKIP)(*FAIL)|\s*\|\s*

In PHP this could be:

<?php

$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

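// Quoted strings are consumed and then skipped via (*SKIP)(*FAIL);
// only a pipe outside quotes, with surrounding whitespace, is a separator.
$pattern = '~([\'"]).*?\1(*SKIP)(*FAIL)|\s*\|\s*~';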

$splitted = preg_split($pattern, $string);
print_r($splitted);
?>

This produces:

Array
(
    [0] => aa
    [1] => bb
    [2] => "cc | dd"
    [3] => 'ee | ff'
)

See the demos on regex101.com and ideone.com.
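To see why only the outer pipes cause a split, it can help to look at what the pattern itself matches. The sketch below (an illustration, not part of the original answer) runs the same pattern through preg_match_all with PREG_OFFSET_CAPTURE. Each quoted span is matched by the first branch, (*SKIP) moves the next match attempt past it, and (*FAIL) discards the attempt, so only the three " | " separators outside quotes are reported.

```php
<?php
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";
$pattern = '~([\'"]).*?\1(*SKIP)(*FAIL)|\s*\|\s*~';

// Collect every separator the pattern matches, together with its byte offset.
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] holds [matched text, offset] pairs for the whole match.
$offsets = array_column($matches[0], 1);
print_r($offsets); // offsets 2, 7 and 19: the pipes inside quotes never match
```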

Answer:

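This is easier if you match the parts instead of splitting. Patterns are greedy by default, so they consume as many characters as possible, and alternatives are tried from left to right. This lets you define more complex patterns for quoted strings before the pattern for unquoted tokens: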

$subject = '[ aa | bb | "cc | dd" | \'ee | ff\' ]';

$pattern = <<<'PATTERN'
(
    (?:[|[]|^) # after | or [ or string start
    \s*
    (?<token> # name the match
        "[^"]*" # string in double quotes
        |
        '[^']*'  # string in single quotes
        |
        [^\s|]+ # unquoted token: no whitespace, no pipe
    )
    \s*
)x
PATTERN;

preg_match_all($pattern, $subject, $matches);
var_dump($matches['token']);

Output:

array(4) {
  [0]=>
  string(2) "aa"
  [1]=>
  string(2) "bb"
  [2]=>
  string(9) ""cc | dd""
  [3]=>
  string(9) "'ee | ff'"
}

Tips:

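<<<'PATTERN' is nowdoc syntax (a heredoc with a quoted label); its content is taken literally, which reduces escaping
I use () as the pattern delimiters, as a reminder that the whole match is group 0
Named groups make the code more readable
The x modifier allows indenting and commenting the pattern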
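Because the pattern also accepts the start of the string (^) as a token boundary, the same idea works on the question's original string without the surrounding brackets. A minimal sketch, assuming a helper named split_tokens (a hypothetical name, not from the answer) and PHP 7.3 or later for the indented nowdoc closing marker:

```php
<?php
// Hypothetical helper: returns the |-separated tokens of $subject,
// keeping quoted strings, including their quotes, intact.
function split_tokens(string $subject): array
{
    $pattern = <<<'PATTERN'
    (
        (?:[|[]|^)     # after | or [ or string start
        \s*
        (?<token>
            "[^"]*"    # string in double quotes
            |
            '[^']*'    # string in single quotes
            |
            [^\s|]+    # unquoted token: no whitespace, no pipe
        )
        \s*
    )x
    PATTERN;

    preg_match_all($pattern, $subject, $matches);
    return $matches['token'];
}

$tokens = split_tokens("aa | bb | \"cc | dd\" | 'ee | ff'");
print_r($tokens);
```

Listing the quoted alternatives first matters: at a quote character the engine tries "[^"]*" and '[^']*' before the unquoted branch, so the pipe inside a quoted string is never treated as a boundary.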

Answer:

Use:

$string = "aa | bb | \"cc | dd\" | 'ee | ff'";
preg_match_all("~(?|\"([^\"]*)\"|'([^']*)'|([^|'\"]+))(?:\s*\|\s*|\z)~", $string, $matches);
print_r(array_map(function($x) {return trim($x);}, $matches[1]));

See the PHP proof.

Results:

Array
(
    [0] => aa
    [1] => bb
    [2] => cc | dd
    [3] => ee | ff
)

Explanation

--------------------------------------------------------------------------------
  (?|                      Branch reset group, does not capture:
--------------------------------------------------------------------------------
    \"                       '"'
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      [^\"]*                   any character except: '\"' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    \"                       '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    '                        '\''
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      [^']*                    any character except: ''' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    '                        '\''
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      [^|'\"]+                 any character except: '|', ''', '\"'
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \|                       '|'
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of grouping
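The branch reset group (?|...) is what lets all three alternatives report their text in $matches[1]. For comparison, here is a sketch of the same pattern written with an ordinary non-capturing group (an illustration, not from the answer): the double-quoted, single-quoted and unquoted branches then capture into groups 1, 2 and 3, and the groups have to be merged by hand. The merge relies on preg_match_all filling non-participating groups with empty strings in its default PREG_PATTERN_ORDER mode.

```php
<?php
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

// Same alternatives, but in a plain (?:...) group, so each branch
// captures into its own group: 1 = double-quoted, 2 = single-quoted, 3 = unquoted.
$pattern = "~(?:\"([^\"]*)\"|'([^']*)'|([^|'\"]+))(?:\s*\|\s*|\z)~";
preg_match_all($pattern, $string, $matches);

// Only one of the three groups participates in each match and the others
// are empty strings, so concatenating them yields the one that matched.
$values = array_map(
    function ($dq, $sq, $bare) {
        return trim($dq . $sq . $bare);
    },
    $matches[1], $matches[2], $matches[3]
);
print_r($values);
```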

Answer:

Interestingly, there are many ways to construct a regex for this problem. Here is another one, similar to @Jan's answer.

(['"]).*?\1\K| *\| *

PCRE demo

(['"]) # match a single or double quote and save to capture group 1
.*?    # match zero or more characters lazily
\1     # match the content of capture group 1
\K     # reset the starting point of the reported match and discard
       # any previously-consumed characters from the reported match
|      # or
\ *    # match zero or more spaces
\|     # match a pipe character
\ *    # match zero or more spaces

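Note that the part before the pipe character ("or") serves only to move the engine's internal string pointer to just past the closing quote of a quoted substring.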
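The demo above only highlights matches. To actually split with this pattern in PHP, one option is preg_split. Because \K makes the first branch report a zero-length match right after each closing quote, splitting on it can leave empty strings in the result, so this sketch (an illustration, not part of the original answer) passes PREG_SPLIT_NO_EMPTY to drop them.

```php
<?php
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

// First branch: consume a quoted substring, then \K resets the match start,
// so only an empty match is reported right after the closing quote.
// Second branch: a pipe with optional spaces around it, the real separator.
$pattern = '~([\'"]).*?\1\K| *\| *~';

// Limit -1 means no limit; PREG_SPLIT_NO_EMPTY drops the empty pieces
// produced by the zero-length \K matches.
$parts = preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
```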

