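I have a function that replaces actual values in a file according to certain patterns. What I am trying to achieve is to call a function whose result is used by gsub as the replacement, so that strings are found and replaced with values that come from another function call.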
$ cat pat-file
name 10101010
phone 10101010
code 10101010
bankaccount 1010101010101
$ cat data_sub.sh
abc()
{
awk '
# Keep the character where the mask for tag i has a 1, otherwise emit "*"
function mask(str, str_masked) {
    for (j=1; j<=length(str); j++) {
        if (substr(masks[i], j, 1)==1) {
            c = substr(str, j, 1)
        } else {
            c = "*"
        }
        str_masked = str_masked c
    }
    return str_masked
}
# First file (pat-file): remember each tag and its mask
FNR == NR {
    tags[NR-1] = $1
    masks[NR-1] = $2
}
# Second file: mask the value inside every <tag>...</tag> pair
FNR != NR {
    line = $0
    for (i in tags) {
        regex = "<"tags[i]">[^<]+</"tags[i]">"
        masked_line = ""
        l = length(tags[i])
        while (match(line, regex) > 0) {
            fulltag = substr(line, RSTART, RLENGTH)
            tagval = substr(fulltag, l+3, RLENGTH-l-l-5)
            fulltag_masked = "<"tags[i]">" mask(tagval) "</"tags[i]">"
            masked_line = masked_line substr(line, 1, RSTART-1) fulltag_masked
            line = substr(line, RSTART+RLENGTH)
        }
        line = masked_line line
    }
    print line
}' "$@" pat-file file-1 > output_file
}
abc
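The tagval variable holds the value of each XML tag that gets masked inside the XML. Because the same values also appear outside the XML, I need to mask them there as well. Here is the input file:
file-1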
This is a demo data = ABCD
This is a demo data = XYCD
This is a demo data = ABCD
This is a demo data = BLAH
This is a demo data = ABCD
This is a demo data = MEH
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD and MEH
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
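The logic is simple and direct: store every tag value that was extracted and masked, then apply the same masking algorithm to those values wherever they appear outside the XML. How can I do this?
Current output file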
This is a demo data = ABCD
This is a demo data = XYCD
This is a demo data = ABCD
This is a demo data = BLAH
This is a demo data = ABCD
This is a demo data = MEH
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD and MEH
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
Expected output file
This is a demo data = A*C*
This is a demo data = XYCD
This is a demo data = A*C*
This is a demo data = BLAH
This is a demo data = A*C*
This is a demo data = M*H
This is a demo data = A*C*
This is a demo data = A*C*
This is a demo data = A*C*
This is a demo data = A*C* and M*H
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
A uj5u.com user replied:
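Assumptions:
- if a string shows up under different tags (e.g., name=ABCD and code=ABCD), the first mask that awk finds is used to mask the string (i.e., we do not prioritize the order in which tag/mask pairs are processed)
- the strings (to be masked) can show up anywhere in a line
- when matching non-tag substrings we use awk word boundaries, the GNU awk \< and \> operators (e.g., when masking ABCD we also mask ABCD-XYZ, but we do not mask ABCDABCD nor ABCD_XYZ)
- both files, as well as the set of value/masked-value pairs, fit in memory
- if OP has supplied a mask of 111111111... (all 1's), we go ahead and perform the (effectively) no-op masking anyway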
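To make the masking rule and the all-1's assumption concrete, here is a minimal sketch of the mask step in isolation. The names mask_value and apply_mask are hypothetical (not from the question or the answer), and the mask is passed as a parameter instead of being looked up in a masks array.

```shell
# Hypothetical wrapper, for illustration only.
# Usage: mask_value VALUE MASK
mask_value() {
    awk -v str="$1" -v mask="$2" '
    # Keep the character where the mask has "1"; emit "*" everywhere else,
    # including positions beyond the end of the mask.
    function apply_mask(s, m,    j, out) {
        for (j = 1; j <= length(s); j++)
            out = out ((substr(m, j, 1) == "1") ? substr(s, j, 1) : "*")
        return out
    }
    BEGIN { print apply_mask(str, mask) }
    '
}

mask_value ABCD 10101010         # prints A*C*
mask_value Winkelstein 10101010  # prints W*n*e*s**** (value longer than mask)
mask_value ABCD 11111111         # prints ABCD (all 1's: no visible change)
```

Characters past the end of the mask are replaced by "*", which is why a value longer than its mask ends in a run of asterisks.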
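General operation:
- process the input file (e.g., file-1) looking for "tag" entries
- if we find any matching "tag" entries, we apply the proposed mask to the corresponding value
- for each masked value we keep a copy of the value and its masked form in a new array
- for duplicate values we apply the saved masked form
- all lines, with or without tag/masked data, are saved in an array
- the END processing loops over our array of lines again, looking for any (word-bounded) strings that were previously masked and, if found, replaces them with the saved masked value
- in the case of the 11111111... (all 1's) mask, this END processing also re-masks the "tag" entries (still, effectively, a no-op)
- all lines are then sent to stdout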
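The first step, finding a "tag" entry and pulling out its value, relies on match() setting RSTART and RLENGTH. The following is a rough sketch of that arithmetic only; extract_tag_value is a hypothetical helper name, and it returns just the first match on a line.

```shell
# Hypothetical helper, for illustration only.
# Usage: extract_tag_value TAG LINE
extract_tag_value() {
    awk -v tag="$1" -v line="$2" '
    BEGIN {
        regex = "<" tag ">[^<]+</" tag ">"
        len = length(tag)
        if (match(line, regex) > 0) {
            # "<tag>" is len+2 characters and "</tag>" is len+3 characters,
            # so the value starts len+2 past RSTART and is
            # RLENGTH-(len+2)-(len+3) characters long.
            print substr(line, RSTART + (len + 2), RLENGTH - (len + 2) - (len + 3))
        }
    }'
}

extract_tag_value name  'demo <name>ABCD</name><phone>98762123</phone>'   # prints ABCD
extract_tag_value phone 'demo <name>ABCD</name><phone>98762123</phone>'   # prints 98762123
```

For <name>ABCD</name> the match is 17 characters long, the opening tag takes 6 and the closing tag 7, which leaves the 4-character value.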
Adding a few lines to the sample input file:
$ cat file-1
This is a demo data = ABCD
This is a demo data = XYCD
This is a demo data = ABCD
This is a demo data = BLAH
This is a demo data = ABCD
This is a demo data = MEH
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD
This is a demo data = ABCD and MEH
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
This is a demo data <tag changed="yes"<name>ABCD</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
#####################
# some more lines ...
#####################
This is a demo data = ABCD and XYCD
This is a demo data = XYCD and MEH
This is ABCD and MEH demo data <tag changed="yes"<name>Winkelstein</name><phone>98762123</phone><code>MEH</code><bankaccount>4563728495847</bankaccount></tag>
One last line ABCD ABCD-XYZ ABCDABCD ABCD_XYZ
One idea based on OP's current awk code:
awk '
function mask(str, str_masked) {
    for (j=1; j<=length(str); j++) {
        if (substr(masks[tag], j, 1) == 1)
            c = substr(str, j, 1)
        else
            c = "*"
        str_masked = str_masked c
    }
    return str_masked
}

# 1st file (pat-file): index the masks by tag name
FNR == NR { masks[$1] = $2; next }

# 2nd file: mask tag values, remember each value/masked pair, save the line
{ line = $0
  for (tag in masks) {
      regex = "<" tag ">[^<]+</" tag ">"
      masked_line = ""
      len = length(tag)
      while (match(line, regex) > 0) {
          val = substr(line, RSTART+(len+2), RLENGTH-(len+2)-(len+3))
          masked[val] = (val in masked) ? masked[val] : mask(val)
          masked_line = masked_line substr(line, 1, RSTART-1) "<" tag ">" masked[val] "</" tag ">"
          line = substr(line, RSTART+RLENGTH)
      }
      line = masked_line line
  }
  lines[FNR]=line
}

# replace word-bounded copies of each saved value; \< and \> need GNU awk
END { for (i=1;i<=FNR;i++) {
          for (val in masked) {
              regex="\\<" val "\\>"
              gsub(regex,masked[val],lines[i])
          }
          print lines[i]
      }
}
' pat-file file-1
This generates:
This is a demo data = A*C*
This is a demo data = XYCD
This is a demo data = A*C*
This is a demo data = BLAH
This is a demo data = A*C*
This is a demo data = M*H
This is a demo data = A*C*
This is a demo data = A*C*
This is a demo data = A*C*
This is a demo data = A*C* and M*H
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
This is a demo data <tag changed="yes"<name>A*C*</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
#####################
# some more lines ...
#####################
This is a demo data = A*C* and XYCD
This is a demo data = XYCD and M*H
This is A*C* and M*H demo data <tag changed="yes"<name>W*n*e*s****</name><phone>9*7*2*2*</phone><code>M*H</code><bankaccount>4*6*7*8*9*8*7</bankaccount></tag>
One last line A*C* A*C*-XYZ ABCDABCD ABCD_XYZ
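The \< and \> word-boundary operators are GNU awk regex operators and are not part of POSIX awk, so the END step above may behave differently under other awk implementations. As a hedged alternative, the sketch below replaces whole words with index() and explicit neighbor checks. It assumes a word character is a letter, digit, or underscore; replace_word, replace_literal_word, and is_word_char are hypothetical names that do not appear in the answer.

```shell
# Hypothetical helper, not part of the original answer.
# Usage: replace_word WORD REPLACEMENT < input
# Assumption: WORD and REPLACEMENT contain no backslashes (awk -v would
# interpret them as escape sequences).
replace_word() {
    awk -v word="$1" -v repl="$2" '
    # True when c is a single letter, digit or underscore.
    function is_word_char(c) { return c ~ /^[A-Za-z0-9_]$/ }

    # Replace each occurrence of w in s that has no word character directly
    # before or after it. index() compares literally, so regex
    # metacharacters in w need no escaping.
    function replace_literal_word(s, w, r,    out, pos, n, prev, before, after) {
        n = length(w)
        if (n == 0) return s
        out = ""; prev = ""
        while ((pos = index(s, w)) > 0) {
            before = (pos == 1) ? prev : substr(s, pos - 1, 1)
            after  = substr(s, pos + n, 1)
            if (!is_word_char(before) && !is_word_char(after)) {
                out  = out substr(s, 1, pos - 1) r
                prev = substr(s, pos + n - 1, 1)
                s    = substr(s, pos + n)
            } else {
                # Not a whole word: copy through the first matched
                # character and keep searching from the next one.
                out  = out substr(s, 1, pos)
                prev = substr(s, pos, 1)
                s    = substr(s, pos + 1)
            }
        }
        return out s
    }
    { print replace_literal_word($0, word, repl) }
    '
}

printf '%s\n' 'One last line ABCD ABCD-XYZ ABCDABCD ABCD_XYZ' | replace_word 'ABCD' 'A*C*'
# prints: One last line A*C* A*C*-XYZ ABCDABCD ABCD_XYZ
```

Because the comparison is literal rather than regex-based, a stored value that contains characters such as "." or "+" would be matched as written, which the gsub-based END block does not guarantee.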
