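I wrote a regular expression to extract a value from a templated string. The regex works fine on sites like regexr.com, but it fails when I try to run it in a shell.
For example, take these lines: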
[2022-11-11T12:07:00.789Z] "GET /check?subject=johnbegucci HTTP/1.1" 200 - "-" 0 17 3 2 "-" "-" "4e4c4fb1-a4d8-4075-8e42-b5fb9216f863" "laundry.transaction.svc.cluster.local:4466" "172.16.107.246:4466" outbound|4466||laundry.transaction.svc.cluster.local 172.16.67.246:51630 10.100.111.246:4466 172.16.67.246:48610 - default
[2022-11-11T13:31:41.189Z] "GET /v1/campaign/198237-jsd-1231 HTTP/1.1" 200 - "-" 0 674 63 63 "-" "Apache-HttpClient/4.5.10 (Java/11.0.7)" "9b3afd5b-c092-4e84-9f29-6380b7f2cafc" "mkt-extractor.mkt-extractor" "172.16.108.138:80" outbound|80||mkt-extractor.mkt-extractor.svc.cluster.local 172.16.65.24:57134 10.100.19.249:80 172.16.65.24:38816 - default
Both lines follow this pattern:
[%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%" %UPSTREAM_CLUSTER% %UPSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_REMOTE_ADDRESS% %REQUESTED_SERVER_NAME%\n
Based on this, I created the following regex to get the value from UPSTREAM_HOST, a value like outbound|4466||laundry.transaction.svc.cluster.local:
(\[.*\])\s(\".*\")\s([0-9]*)\s(.*)\s(\".*\")\s([0-9]*)\s([0-9]*)\s([0-9]*)\s([0-9]*)\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)
I tested this regex on regexr.com, and it shows the correct value as group 14 for both lines:
outbound|4466||laundry.transaction.svc.cluster.local
outbound|80||mkt-extractor.mkt-extractor.svc.cluster.local
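After that, I tried running it with awk -v FPAT, but the groups look wrong. To get the value from UPSTREAM_HOST I have to change which field is printed for each line, which is not workable because I am building an automation to process the logs: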
echo '[2022-11-11T12:07:00.789Z] "GET /check?subject=johnbegucci HTTP/1.1" 200 - "-" 0 17 3 2 "-" "-" "4e4c4fb1-a4d8-4075-8e42-b5fb9216f863" "laundry.transaction.svc.cluster.local:4466" "172.16.107.246:4466" outbound|4466||laundry.transaction.svc.cluster.local 172.16.67.246:51630 10.100.111.246:4466 172.16.67.246:48610 - default' | awk -v FPAT='(\[.*\])\s(\".*\")\s([0-9]*)\s(.*)\s(\".*\")\s([0-9]*)\s([0-9]*)\s([0-9]*)\s([0-9]*)\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)\s(.*) ' -v OFS='|' '{print $15}'
# the example above uses '{print $15}'
echo '[2022-11-11T13:31:41.189Z] "GET /v1/campaign/198237-jsd-1231 HTTP/1.1" 200 - "-" 0 674 63 63 "-" "Apache-HttpClient/4.5.10 (Java/11.0.7)" "9b3afd5b-c092-4e84-9f29-6380b7f2cafc" "mkt-extractor.mkt-extractor" "172.16.108.138:80" outbound|80||mkt-extractor.mkt-extractor.svc.cluster.local 172.16.65.24:57134 10.100.19.249:80 172.16.65.24:38816 - default' | awk -v FPAT='(\[.*\])\s(\".*\")\s([0-9]*)\s(.*)\s(\".*\")\s([0-9]*)\s([0-9]*)\s([0-9]*)\s([0-9]*)\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)\s(.*) ' -v OFS='|' '{print $18}'
# the example above uses '{print $18}'
Is there a way to make it work for both log lines with the same print position?
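uj5u.com user reply:
You don't need such a complex regex to parse the log entries in question. FPAT describes what a single field looks like, not the whole record, so it only needs to cover the three kinds of field that occur: a bracketed timestamp, a double-quoted string, or a run of non-whitespace characters.
Consider this awk solution using a simpler regex (FPAT is a GNU awk feature, so this needs gawk):
awk -v FPAT='\\[[^]]*]|"[^"]*"|\\S+' '
{
   for (i=1; i<=NF; i++) print NR ":" i "::" $i
}' file.log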
1:1::[2022-11-11T12:07:00.789Z]
1:2::"GET /check?subject=johnbegucci HTTP/1.1"
1:3::200
1:4::-
1:5::"-"
1:6::0
1:7::17
1:8::3
1:9::2
1:10::"-"
1:11::"-"
1:12::"4e4c4fb1-a4d8-4075-8e42-b5fb9216f863"
1:13::"laundry.transaction.svc.cluster.local:4466"
1:14::"172.16.107.246:4466"
1:15::outbound|4466||laundry.transaction.svc.cluster.local
1:16::172.16.67.246:51630
1:17::10.100.111.246:4466
1:18::172.16.67.246:48610
1:19::-
1:20::default
2:1::[2022-11-11T13:31:41.189Z]
2:2::"GET /v1/campaign/198237-jsd-1231 HTTP/1.1"
2:3::200
2:4::-
2:5::"-"
2:6::0
2:7::674
2:8::63
2:9::63
2:10::"-"
2:11::"Apache-HttpClient/4.5.10 (Java/11.0.7)"
2:12::"9b3afd5b-c092-4e84-9f29-6380b7f2cafc"
2:13::"mkt-extractor.mkt-extractor"
2:14::"172.16.108.138:80"
2:15::outbound|80||mkt-extractor.mkt-extractor.svc.cluster.local
2:16::172.16.65.24:57134
2:17::10.100.19.249:80
2:18::172.16.65.24:38816
2:19::-
2:20::default
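I printed it this way to show you every field in each record. The outbound|... value is field 15 in both records, so '{print $15}' works for both lines.
uj5u.com user reply:
With your shown samples, please try the following awk code. It uses the regex \[[^]]*\]|"[^"]*"|[^[:space:]]+ with the match function inside a while loop to collect every match on each line, and prints each matched value together with its position.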
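awk '
{
  count=0
  # Take one token at a time from the front of the record.
  while(match($0,/\[[^]]*\]|"[^"]*"|[^[:space:]]+/)){
    print NR ":" ++count "::" substr($0,RSTART,RLENGTH)
    # Drop everything up to the end of the match and search again.
    $0=substr($0,RSTART+RLENGTH)
  }
}
' Input_file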
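Since the question asks for one fixed position rather than a full listing, the same loop can stop at the token you want. The following is only a sketch: log_field is a hypothetical helper name, and [^ \t]+ is swapped in for [^[:space:]]+ on the assumption that some older awk builds lack POSIX character classes. It avoids FPAT, so it should not depend on gawk.

```shell
# Hypothetical helper: print the Nth token of each log line read from stdin.
# A token is a [bracketed] run, a "quoted" run, or a run of non-blank characters.
log_field() {
  awk -v want="$1" '
  {
    line = $0
    count = 0
    # Peel tokens off the front of the line, left to right.
    while (match(line, /\[[^]]*\]|"[^"]*"|[^ \t]+/)) {
      if (++count == want + 0) {
        print substr(line, RSTART, RLENGTH)
        break
      }
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

line1='[2022-11-11T12:07:00.789Z] "GET /check?subject=johnbegucci HTTP/1.1" 200 - "-" 0 17 3 2 "-" "-" "4e4c4fb1-a4d8-4075-8e42-b5fb9216f863" "laundry.transaction.svc.cluster.local:4466" "172.16.107.246:4466" outbound|4466||laundry.transaction.svc.cluster.local 172.16.67.246:51630 10.100.111.246:4466 172.16.67.246:48610 - default'

# Token 15 is the outbound|... value for this sample line.
printf '%s\n' "$line1" | log_field 15
```

Because a quoted string counts as one token even when it contains spaces, the position stays the same for the second sample line, whose user agent is "Apache-HttpClient/4.5.10 (Java/11.0.7)".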
uj5u.com user reply:
Assuming you are happy with that regex, you can use Perl to run it:
s1='[2022-11-11T12:07:00.789Z] "GET /check?subject=johnbegucci HTTP/1.1" 200 - "-" 0 17 3 2 "-" "-" "4e4c4fb1-a4d8-4075-8e42-b5fb9216f863" "laundry.transaction.svc.cluster.local:4466" "172.16.107.246:4466" outbound|4466||laundry.transaction.svc.cluster.local 172.16.67.246:51630 10.100.111.246:4466 172.16.67.246:48610 - default'
s2='[2022-11-11T13:31:41.189Z] "GET /v1/campaign/198237-jsd-1231 HTTP/1.1" 200 - "-" 0 674 63 63 "-" "Apache-HttpClient/4.5.10 (Java/11.0.7)" "9b3afd5b-c092-4e84-9f29-6380b7f2cafc" "mkt-extractor.mkt-extractor" "172.16.108.138:80" outbound|80||mkt-extractor.mkt-extractor.svc.cluster.local 172.16.65.24:57134 10.100.19.249:80 172.16.65.24:38816 - default'
echo "$s1" | perl -lnE 'say $14 if /(\[.*\])\s(\".*\")\s([0-9]*)\s(.*)\s(\".*\")\s([0-9]*)\s([0-9]*)\s([0-9]*)\s([0-9]*)\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)/'
"172.16.107.246:4466"
echo "$s2" | perl -lnE 'say $14 if /(\[.*\])\s(\".*\")\s([0-9]*)\s(.*)\s(\".*\")\s([0-9]*)\s([0-9]*)\s([0-9]*)\s([0-9]*)\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(\".*\")\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)\s(.*)/'
"172.16.108.138:80"
uj5u.com user reply:
Depending on how sure you are that there are no more double quotes (") after the desired IP address, or no pipes (|) before it, these should mostly work:
mawk NF=NF FS='^.*" "|" [^|]+[|].+$' OFS=
172.16.107.246:4466
172.16.108.138:80
gawk NF=NF FS='^.*" "|"[^"]+$' OFS=
172.16.107.246:4466
172.16.108.138:80
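If that same assumption about the double quotes holds, a simpler split may be enough. This is a sketch, not part of the answers above: upstream_fields is a hypothetical helper name, and it assumes no quoted value ever contains an embedded double quote. Splitting on " puts the text inside each pair of quotes into an even-numbered field, so the seventh quoted value lands in $14 and the unquoted tail of the line in $15.

```shell
# Hypothetical helper: print the quoted upstream address and the outbound|...
# value that follows it, for each log line read from stdin.
upstream_fields() {
  awk -F'"' '{
    # $14 is the text inside the 7th pair of double quotes.
    host = $14
    # $15 is everything after that closing quote; its first word is outbound|...
    split($15, rest, " ")
    print host, rest[1]
  }'
}

line1='[2022-11-11T12:07:00.789Z] "GET /check?subject=johnbegucci HTTP/1.1" 200 - "-" 0 17 3 2 "-" "-" "4e4c4fb1-a4d8-4075-8e42-b5fb9216f863" "laundry.transaction.svc.cluster.local:4466" "172.16.107.246:4466" outbound|4466||laundry.transaction.svc.cluster.local 172.16.67.246:51630 10.100.111.246:4466 172.16.67.246:48610 - default'

printf '%s\n' "$line1" | upstream_fields
# prints: 172.16.107.246:4466 outbound|4466||laundry.transaction.svc.cluster.local
```

The field numbers are fixed by the number of quoted values in the log format, so a format with more or fewer quoted values would need different indexes.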
