I have the following sample.txt file:
2021-10-07 10:32:05,767 ERROR [LAWT2] blah.blah.blah - Message processing FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time for which application threads were stopped: 0.0003858 seconds, Stopping threads took: 0.0000653 seconds
2021-10-07 10:31:32,902 ERROR [LAWT6] blah.blah.blah - Message processing FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" Account="blah" MsgType="D" BookingTypeOverride="0" Symbol="6316" OtherField1="othervalue1" Otherfield2="othervalue2"/></D></NewOrderSingle>
I only want to extract two key fields, SessionID and MsgType, and print them like this:
SessionID="kkk"|
SessionID="zkx"|MsgType="D"
In other words: if a group is not present, I just want to print an empty string in its place.
I tried the following, with no luck:
$ perl -ne '/ (SessionID=".*?")? .*(MsgType=".*?")? / and print "$1|$2\n"' sample.txt
SessionID="kkk"|
SessionID="zkx"|
Can someone enlighten me here? Thanks a lot.
A uj5u.com user replied:
You can use
perl -ne '/\h(SessionID="[^"]*")?(?:\h+.*(MsgType="[^"]*"))?\h/ and print "$1|$2\n"'
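See the regex demo. Details:
\h - a horizontal whitespace character
(SessionID="[^"]*")? - Group 1 (optional): SessionID=", then zero or more characters other than ", then a "
(?:\h+.*(MsgType="[^"]*"))? - an optional (but greedy) sequence of:
    \h+ - one or more horizontal whitespace characters
    .* - zero or more characters other than line breaks, as many as possible
    (MsgType="[^"]*") - Group 2: MsgType=", then zero or more characters other than ", then a "
\h - a horizontal whitespace character.
See the online demo: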
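For readers who want to try the same idea outside Perl, here is a minimal Python sketch of the pattern described above. It is an illustration only, not part of the original answer. Python's re module has no \h, so [ \t] is assumed as a stand-in. The function name extract and the two shortened sample lines (trimmed from sample.txt) are hypothetical.

```python
import re

# Assumption: [ \t] replaces Perl's \h (horizontal whitespace).
PATTERN = re.compile(
    r'[ \t](SessionID="[^"]*")?'        # group 1: optional SessionID="..."
    r'(?:[ \t]+.*(MsgType="[^"]*"))?'   # optional tail; group 2: MsgType="..."
    r'[ \t]'
)

def extract(line):
    """Return 'SessionID=...|MsgType=...', leaving a missing field empty."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # An unmatched optional group is None in Python; render it as "".
    return "|".join(g or "" for g in m.groups())

# Shortened versions of the two lines in sample.txt.
line1 = '... FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time'
line2 = '... FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" MsgType="D" Symbol="6316"/>'

print(extract(line1))  # SessionID="kkk"|
print(extract(line2))  # SessionID="zkx"|MsgType="D"
```

The optional non-capturing group is what lets the match succeed with or without MsgType: when MsgType is absent the whole tail is skipped and only group 1 is filled.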
s='2021-10-07 10:32:05,767 ERROR [LAWT2] blah.blah.blah - Message processing FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time for which application threads were stopped: 0.0003858 seconds, Stopping threads took: 0.0000653 seconds
2021-10-07 10:31:32,902 ERROR [LAWT6] blah.blah.blah - Message processing FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" Account="blah" MsgType="D" BookingTypeOverride="0" Symbol="6316" OtherField1="othervalue1" Otherfield2="othervalue2"/></D></NewOrderSingle>'
perl -ne '/\h(SessionID=".*?")?(?:\h+.*(MsgType=".*?"))?\h/ and print "$1|$2\n"' <<< "$s"
This prints:
SessionID="kkk"|
SessionID="zkx"|MsgType="D"
A uj5u.com user replied:
It is not as easy as it seems:
/ (SessionID=".*?")? .*(MsgType=".*?")? /
                     ~~
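The underlined part consumes MsgType whenever it is present, and this stays true even if you add ? to it (making it .*?). The engine scans from the left and accepts the first way of matching that succeeds. Because the MsgType group is optional, the match can succeed without it, so the engine never backtracks to capture MsgType.
But you can use a lookaround assertion: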
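The behaviour described above can be reproduced with a short sketch. It uses Python's re module, which backtracks the same way as Perl for this particular pattern. The sample line is a shortened, hypothetical version of the second line of sample.txt.

```python
import re

line = 'FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" MsgType="D" Symbol="6316" Other="x"/>'

# The pattern from the question, unchanged.
greedy = re.compile(r' (SessionID=".*?")? .*(MsgType=".*?")? ')
# The same pattern with a lazy .*? in the middle.
lazy = re.compile(r' (SessionID=".*?")? .*?(MsgType=".*?")? ')

m = greedy.search(line)
print(m.group(1))                    # SessionID="zkx"
print(m.group(2))                    # None: the optional group was skipped
print('MsgType="D"' in m.group(0))   # True: the text was swallowed by .*

print(lazy.search(line).group(2))    # None as well
```

In both variants the overall match succeeds with group 2 left unset, so nothing forces the engine to try an alternative in which MsgType is captured.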
/ (SessionID="[^"]*")? (?:(?!.*?MsgType)|.*? (MsgType=".*?")).* /
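That is, either there is no MsgType after SessionID, or there is one and we capture it.
I do not recommend using quantifiers on capture groups. Also, the log looks like it contains XML; how about extracting it and using a parser?
A uj5u.com user replied:
Sorry, one thing I did not mention in the question is that I plan to extract several fields and print them in a fixed order, so I ended up writing an awk script.
I am posting it here in case anyone else wants to use it (I am processing log files with thousands of lines, so a script is a good fit).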
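One way to avoid quantified capture groups altogether is to look up each field with its own search. The following is a minimal Python sketch of that idea, not something taken from the answers. The function name extract_pair and the shortened sample lines are hypothetical. Unlike the single-regex versions, it does not require SessionID to appear before MsgType.

```python
import re

def extract_pair(line):
    """Search for each field separately, so no pattern needs an optional group."""
    parts = []
    for name in ("SessionID", "MsgType"):
        # (?<!\S) requires the attribute name to start a whitespace-delimited token.
        m = re.search(r'(?<!\S)' + name + r'="[^"]*"', line)
        parts.append(m.group(0) if m else "")
    return "|".join(parts)

# Shortened versions of the two lines in sample.txt.
line1 = '... FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time'
line2 = '... FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" MsgType="D" Symbol="6316"/>'

print(extract_pair(line1))  # SessionID="kkk"|
print(extract_pair(line2))  # SessionID="zkx"|MsgType="D"
```

Each search either finds its field or returns None, so a missing field becomes an empty string without any backtracking subtleties.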
#!/usr/bin/awk -f

# Append the first element of the_array that matches the_field to the_line,
# using "|" as the separator, and return the result.
function get_field(the_array, the_field, the_line,    key) {
    for (key in the_array) {
        if (the_array[key] ~ the_field) {
            if (the_line == "")
                the_line = the_array[key]
            else
                the_line = the_line "|" the_array[key]
            break
        }
    }
    return the_line
}

BEGIN {
    the_line = ""
}

{
    the_line = ""
    delete the_keys
    # Collect the wanted attributes of this record, skipping repeated tokens.
    for (f = 1; f <= NF; f++) {
        if (($f ~ "^(ClOrdID|Symbol|MsgType|SessionID|OrdStatus)=") && !($f in the_keys)) {
            if (the_line == "")
                the_line = $f
            else
                the_line = $f "|" the_line
            the_keys[$f]
        }
    }
    # Using the combined string as an array key keeps each distinct combination once.
    arr[the_line]
}

END {
    for (i in arr) {
        # index() tests for a literal "|"; the regex "|" would match every string.
        if (index(i, "|") > 0) {
            the_line = ""
            split(i, aa, "|")
            # Print the fields in the correct order
            the_line = get_field(aa, "SessionID", the_line)
            the_line = get_field(aa, "ClOrdID", the_line)
            the_line = get_field(aa, "MsgType", the_line)
            the_line = get_field(aa, "OrdStatus", the_line)
            the_line = get_field(aa, "Symbol", the_line)
            print the_line
        } else {
            print i
        }
    }
}
Usage:
$ awk -f aa.awk sample.txt
SessionID="kkk"
SessionID="zkx"|MsgType="D"|Symbol="6316"
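For comparison, the same approach (collect the wanted attributes per line, print them in a fixed order, keep each distinct combination once) could be sketched in Python roughly as follows. This is an assumption-laden illustration rather than a port of the script: summarize, FIELD_ORDER and TOKEN are hypothetical names, lines without any wanted field are skipped, results come back in first-seen order (awk's for-in order is unspecified), and trailing markup such as /> is trimmed from a token.

```python
import re

# Output order, taken from the get_field calls in the awk END block.
FIELD_ORDER = ["SessionID", "ClOrdID", "MsgType", "OrdStatus", "Symbol"]
TOKEN = re.compile("(" + "|".join(FIELD_ORDER) + r')="[^"]*"')

def summarize(lines):
    """Return one 'a|b|c' string per distinct field combination, in first-seen order."""
    seen = {}
    for line in lines:
        found = {}
        for token in line.split():
            m = TOKEN.match(token)  # anchored at the start of the token
            if m:
                # Keep only the first occurrence of each field on a line.
                found.setdefault(m.group(1), m.group(0))
        key = "|".join(found[name] for name in FIELD_ORDER if name in found)
        if key:
            seen.setdefault(key, None)
    return list(seen)

# Shortened versions of the two lines in sample.txt.
line1 = '... FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time'
line2 = '... FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" MsgType="D" Symbol="6316"/>'

for row in summarize([line1, line2, line2]):
    print(row)
# SessionID="kkk"
# SessionID="zkx"|MsgType="D"|Symbol="6316"
```

As in the awk script, attribute values containing spaces would be split across tokens and missed, so this only suits logs whose values have no embedded whitespace.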
