Perl optional capture group not working?

I have the following sample.txt file:

2021-10-07 10:32:05,767 ERROR [LAWT2] blah.blah.blah - Message processing FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time for which application threads were stopped: 0.0003858 seconds, Stopping threads took: 0.0000653 seconds
2021-10-07 10:31:32,902 ERROR [LAWT6] blah.blah.blah - Message processing FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" Account="blah" MsgType="D" BookingTypeOverride="0" Symbol="6316" OtherField1="othervalue1" Otherfield2="othervalue2"/></D></NewOrderSingle>

I only want to extract two key fields, SessionID and MsgType, and print them like this:

SessionID="kkk"|
SessionID="zkx"|MsgType="D"

In other words, if a group does not match, I just want to print an empty string in its place.

I tried the following, but without luck:

$ perl -ne '/ (SessionID=".*?")? .*(MsgType=".*?")? / and print "$1|$2\n"' sample.txt
SessionID="kkk"|
SessionID="zkx"|

Can anyone enlighten me here? Thanks a lot.

A uj5u.com user replied:

You can use

perl -ne '/\h(SessionID="[^"]*")?(?:\h+.*(MsgType="[^"]*"))?\h/ and print "$1|$2\n"'

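Details:

\h - a horizontal whitespace character
(SessionID="[^"]*")? - Group 1 (optional): SessionID=", then zero or more characters other than ", then a "
(?:\h+.*(MsgType="[^"]*"))? - an optional (but greedy) non-capturing sequence of
- \h+ - one or more horizontal whitespace characters
- .* - zero or more characters other than line breaks, as many as possible
- (MsgType="[^"]*") - Group 2: MsgType=", then zero or more characters other than ", then a "
\h - a horizontal whitespace character.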

Online demo:

s='2021-10-07 10:32:05,767 ERROR [LAWT2] blah.blah.blah - Message processing FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time for which application threads were stopped: 0.0003858 seconds, Stopping threads took: 0.0000653 seconds
2021-10-07 10:31:32,902 ERROR [LAWT6] blah.blah.blah - Message processing FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" Account="blah" MsgType="D" BookingTypeOverride="0" Symbol="6316" OtherField1="othervalue1" Otherfield2="othervalue2"/></D></NewOrderSingle>'
perl -ne '/\h(SessionID=".*?")?(?:\h+.*(MsgType=".*?"))?\h/ and print "$1|$2\n"' <<< "$s"

This prints:

SessionID="kkk"|
SessionID="zkx"|MsgType="D"

A uj5u.com user replied:

This is not as easy as it seems:

/ (SessionID=".*?")? .*(MsgType=".*?")? /
                     ~~

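The underlined part (.*) consumes MsgType even when it is present, and making it lazy by appending ? does not help. The regex engine returns the first successful match from the leftmost starting position, so once the pattern can succeed with the optional group matching nothing, it never backtracks to capture MsgType.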

But you can use a lookaround assertion:

/ (SessionID="[^"]*")? (?:(?!.*?MsgType)|.*? (MsgType=".*?")).* /

That is, either there is no MsgType after SessionID, or it is there and we capture it.

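I would not recommend putting quantifiers on capture groups. Also, the log appears to contain XML: how about extracting it and using a parser?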

A uj5u.com user replied:

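Sorry, one thing I did not mention in the question is that I plan to extract several fields and print them in a fixed order, so I ended up writing an awk script.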

I am posting it here in case anyone else wants to use it (I am processing log files with thousands of lines, so a script is a good fit).

#!/usr/bin/awk -f
function get_field(the_array, the_field, the_line){
  for (key in the_array) {
      if (the_array[key] ~ the_field){
          if (the_line == "")
              the_line = the_array[key]
          else
              the_line = the_line "|" the_array[key]
          break
      }
  }
  return the_line
}
BEGIN{
    the_line = ""
}
{
    the_line = ""
    delete the_keys
    for(f=1;f<=NF;f++){
        if (($f ~ "^(ClOrdID|Symbol|MsgType|SessionID|OrdStatus)=") && (the_keys[$f] == "")){
            if (the_line == "")
                the_line = $f
            else
                the_line = $f"|"the_line
            the_keys[$f]++
        }
    }
    arr[the_line]++
}
END{
    for(i in arr) {
        if (index(i, "|") > 0){
            the_line = ""
            split(i,aa,"|")
            # Print the fields in the correct order
            the_line = get_field(aa,"SessionID",the_line)
            the_line = get_field(aa,"ClOrdID",the_line)
            the_line = get_field(aa,"MsgType",the_line)
            the_line = get_field(aa,"OrdStatus",the_line)
            the_line = get_field(aa,"Symbol",the_line)
            print the_line
        } else {
            print(i)
        }
    }
}

Usage:

$ awk -f aa.awk sample.txt
SessionID="kkk"
SessionID="zkx"|MsgType="D"|Symbol="6316"


Tags: regex, perl
