Linux: Advanced Shell Scripting (sed/gawk)

Advanced sed commands

Hold space and pattern space

Practical sed examples

Advanced gawk

# This article is a set of study notes

Advanced sed commands

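1> The n command: move to the next line

The lowercase n command tells the sed editor to move on to the next line of the data stream.

1.1 How the n command works

Put simply, n reads the next line early and overwrites the line currently in the pattern space. The old line is not silently dropped: unless -n is given, it is printed to standard output before being replaced. If n cannot run because there is no next line (this is different from n being skipped because its address did not match), GNU sed abandons the remaining commands and exits. Otherwise the rest of the script runs on the newly read line, and the next cycle starts again from the top of the script with a fresh line.

Example:

Extract the even-numbered lines from the file aaa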

$ cat aaa
This is 1
This is 2
This is 3
This is 4
This is 5

$ sed -n 'n;p' aaa    # -n suppresses the default output
This is 2
This is 4

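Note: sed reads This is 1 and runs n, so the pattern space now holds This is 2; p prints it. sed then reads This is 3 and runs n, so the pattern space holds This is 4; p prints it. Finally sed reads This is 5 and runs n, but there is no next line, so sed exits without running p.

The net result is that only the even-numbered lines are printed.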
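Swapping the order of the two commands selects the odd-numbered lines instead. The following is a minimal sketch, assuming the same five lines are supplied through a here-document instead of the aaa file; the variable name odd_lines is only for illustration.

```shell
# p prints the current line first; n then replaces it with the next line,
# which is never printed because the script ends and -n suppresses autoprint.
odd_lines=$(sed -n 'p;n' <<'EOF'
This is 1
This is 2
This is 3
This is 4
This is 5
EOF
)
echo "$odd_lines"
```

This should print This is 1, This is 3 and This is 5, one per line.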

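2> The N command: join text lines

The uppercase N command appends the next line of text to whatever is already in the pattern space, so both lines end up in the same pattern space, still separated by a newline. This is very useful when searching a file for text that may be split across two lines.

2.1 How the N command works:

Put simply, N appends the next line to the pattern space and treats the two lines as one, although a \n newline still sits between them. If N cannot run because there is no next line (this is different from N being skipped because its address did not match), GNU sed prints the pattern space and exits without running the remaining commands. Otherwise the rest of the script runs on the combined text.

Example:

Print the odd-numbered lines of the file aaa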

$ cat aaa
This is 1
This is 2
This is 3
This is 4
This is 5

$ sed -n '$!N;P' aaa    # P prints only the first line (up to the first \n); it is covered below
This is 1
This is 3
This is 5

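In the note below, 1 stands for This is 1, 2 stands for This is 2, and so on.

Note: sed reads This is 1; it is not the last line, so N runs and the pattern space becomes 1\n2; P prints This is 1. sed reads This is 3; it is not the last line, so N runs and the pattern space becomes 3\n4; P prints This is 3. sed reads This is 5; it is the last line, so N is skipped ($!N) and P prints This is 5.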
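The same $!N guard can be combined with a substitution to join lines in pairs. This is a small sketch under the assumption that the input is the five-line sample generated by printf; the separator " | " and the variable name pairs are arbitrary choices.

```shell
# $!N appends the next line unless sed is already on the last line;
# s/\n/ | / then replaces the embedded newline with a visible separator.
pairs=$(printf 'This is %s\n' 1 2 3 4 5 | sed '$!N;s/\n/ | /')
echo "$pairs"
```

Because the input has an odd number of lines, the last line has no partner and is printed unchanged.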

2.2 Examples

Example 1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

$ sed '/first/{N;s/\n/ /}' data2.txt
this is the header line.
this is the first data line. this is the second data line.
this is the last line.

Example 2
$ cat data3.txt
on Tuesday,the linux system
administrator's group meeting will be held.
all system administrators should attend.
thank you for your attendance.

$ sed 'N;s/system.administrator/desktop user/' data3.txt
on Tuesday,the linux desktop user's group meeting will be held.
all desktop users should attend.
thank you for your attendance.

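3> The D command: delete only the first line

When the pattern space holds several lines, the D command deletes only the first of them.

3.1 How the D command works

D deletes the pattern space from its start up to and including the first \n (that text is never sent to standard output) and abandons the remaining commands. Its distinctive feature is that it then forces the sed editor back to the start of the script and reruns the commands on what is left in the pattern space, without reading a new line from the data stream. If the pattern space contains no newline, D behaves like d and a normal new cycle begins. This behavior is unusual and takes some getting used to.

Example of the D command

Print the last line of the file aaa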

$ cat aaa
This is 1
This is 2
This is 3
This is 4
This is 5

$ sed 'N;D' aaa
This is 5

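Note: sed reads This is 1 and runs N, giving This is 1\nThis is 2; D leaves This is 2. The script restarts with N (no fresh line is read at the start of the cycle), giving This is 2\nThis is 3; D leaves This is 3, and so on until only This is 5 remains. N then fails because there is no next line and sed exits. Since -n (which suppresses automatic printing) was not given, GNU sed prints the pattern space, This is 5.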
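N, P and D are often used together to slide a two-line window over the input. The sketch below is a well-known one-liner that drops consecutive duplicate lines, similar to uniq; the sample input and the variable name deduped are assumptions made for illustration.

```shell
# $!N               append the next line (except on the last line)
# /^\(.*\)\n\1$/!P  if the two lines differ, print the first one
# D                 drop the first line and rerun the script on the rest
deduped=$(printf '%s\n' a a b b b c | sed '$!N;/^\(.*\)\n\1$/!P;D')
echo "$deduped"
```

Each run of repeated lines should come out once, so this prints a, b and c.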

3.2 Examples


$ cat data5.txt

this is the header line.
this is a data line.

this is the last line.

Example 1:
$ sed '/^$/{N;/header/D}' data5.txt
this is the header line.
this is a data line.

this is the last line.

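Example 2: print the last two lines
# The idea: N keeps joining the next line and D keeps dropping the oldest one; once the last line has been joined to the second-to-last, D is skipped ($!D) and both lines are printed.
$ sed '$!N;$!D' data5.txt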

this is the last line.

4> The P command: print only the first line

The P command prints only the first line of the pattern space.

$ cat data3.txt
on Tuesday,the linux system
administrator's group meeting will be held.
all system administrators should attend.
thank you for your attendance.

$ sed -n 'N;/system\nadministrator/P' data3.txt   # uppercase P prints the first line only
on Tuesday,the linux system

$ sed -n 'N;/system\nadministrator/p' data3.txt   # lowercase p prints the whole pattern space
on Tuesday,the linux system
administrator's group meeting will be held.

$ sed -n 'N;/for/P' data3.txt               
all system administrators should attend.

5> The negation command (!)

Example 1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

$ sed -n '/header/!p' data2.txt
this is the first data line.
this is the second data line.
this is the last line.
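Negation also works with line-number ranges. As a short sketch, assuming the same four-line sample passed in through a here-document (the variable name kept is illustrative), the following deletes every line outside lines 2 to 3:

```shell
# 2,3!d applies d to every line that is NOT in the range 2..3
kept=$(sed '2,3!d' <<'EOF'
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
EOF
)
echo "$kept"
```

Only the two data lines should remain.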

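6> Branching: changing the flow of the script

Format: [address]b [label]

The address parameter determines which lines trigger the branch command, and the label parameter names the location to jump to. A label is defined on its own line as :label; if the label is omitted, b jumps to the end of the script.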
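A common use of an addressed branch is a loop that gathers the whole input into the pattern space. The following sketch joins all lines into one; the label name again, the three-word input and the variable name joined are assumptions chosen for the example.

```shell
# :again      define the label
# N           append the next input line to the pattern space
# $!b again   unless this is the last line, jump back to the label
# s/\n/ /g    finally replace every embedded newline with a space
joined=$(printf '%s\n' one two three |
  sed -e ':again' -e 'N' -e '$!b again' -e 's/\n/ /g')
echo "$joined"
```

Each command is given in its own -e so the label is not run together with the following command, which keeps the script portable beyond GNU sed.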

Example 1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed '{2,3b;s/this is/is this/;s/line./test?/}' data2.txt
is this the header test?
this is the first data line.
this is the second data line.
is this the last test?

Example 2
$ sed '{/first/b jump1;s/this is the/no jump on/
> :jump1
> s/this is the/jump here on/}' data2.txt
no jump on header line.
jump here on first data line.
no jump on second data line.
no jump on last line.

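Example 3
$ echo "this,is,a,test,to,remove,commas." | sed -n '{
> :start
> s/,//1p
> /,/b start
> }'
# without the /,/ address, b start would loop forever once no commas remain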

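7> The test command (t): also changes the flow of a sed script, but it branches only when a preceding substitution succeeded. It works much like a logical OR: the commands after t run only if the substitution before it did not succeed.

Example 1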
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed '{
> s/first/matched/
> t
> s/this is the/no match on/
> }' data2.txt
no match on header line.
this is the matched data line.
no match on second data line.
no match on last line.

Example 2
$ echo "this,is,a,test,to,remove,commas." | sed -n '{
> :start
> s/,/ /1p
> t start
> }'
# once there is nothing left to substitute, t does not jump and the rest of the script runs
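A t loop ends on its own as soon as the substitution stops matching. As an illustrative sketch (the label pad, the width of five digits and the variable name padded are assumptions), the following left-pads a number with zeros:

```shell
# s/^[0-9]\{1,4\}$/0&/ adds one leading zero while the number has 1 to 4 digits;
# t pad jumps back only if that substitution succeeded.
padded=$(echo 42 | sed -e ':pad' -e 's/^[0-9]\{1,4\}$/0&/' -e 't pad')
echo "$padded"
```

Once the number reaches five digits the pattern no longer matches, t does not branch, and sed prints 00042.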

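8> Replacement patterns and back-references (&)

The & symbol stands for the text matched by the pattern in a substitute command. To reuse only part of the matched string, define subpatterns with escaped parentheses \( \) and refer to them in the replacement as \1, \2, and so on.

Example 1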
$ echo 'the cat sleeps in his hat'|sed 's/.at/"&"/g'
the "cat" sleeps in his "hat"

Example 2
$ echo 'the system administrator manual'|sed '
>s/\(system\) administrator/\1 user/'
the system user manual

Example 3
$ echo '1234567'|sed '{
>:start
>s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
>t start
>}'
1,234,567
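Back-references can also reorder the pieces of a match. A minimal sketch, assuming a made-up "last,first" input and an illustrative variable name swapped:

```shell
# \([^,]*\) captures everything before the first comma as \1,
# \(.*\) captures the rest as \2; the replacement emits them in reverse order.
swapped=$(echo 'mullen,riley' | sed 's/\([^,]*\),\(.*\)/\2 \1/')
echo "$swapped"
```

This should print riley mullen.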

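Hold space and pattern space

Pattern space: the buffer where the sed editor keeps the text being processed. sed reads one line at a time into the pattern space and runs the editing commands on it.

Hold space: a buffer for temporarily saving lines. It is used when data in the pattern space needs further processing later instead of being output right away.

Five commands operate on the hold space:

h	copy the pattern space to the hold space (overwrites)
H	append the pattern space to the hold space
g	copy the hold space to the pattern space (overwrites)
G	append the hold space to the pattern space
x	exchange the contents of the pattern space and the hold space

Example 1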
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed -n '{1!G;h;$p}' data2.txt
this is the last line.
this is the second data line.
this is the first data line.
this is the header line.

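Explanation:

Step 1: sed reads line 1 (this is the header line.) into the pattern space; G is skipped on line 1 (1!G), and h copies the line to the hold space.

Step 2: sed reads line 2 (this is the first data line.) into the pattern space and G appends the hold space (this is the header line.) to it, giving (this is the first data line.\nthis is the header line.); h then copies that to the hold space.

Step 3: sed reads line 3 (this is the second data line.) into the pattern space and G appends the hold space (this is the first data line.\nthis is the header line.) to it, giving (this is the second data line.\nthis is the first data line.\nthis is the header line.); h copies that to the hold space.

Step 4: sed reads line 4 (this is the last line.) into the pattern space, G appends the hold space to it, and because this is the last line, $p prints the result.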
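The hold space is also handy for remembering the previous line. The sketch below prints the line that comes immediately before a match; the pattern /second/, the here-document input and the variable name before are assumptions for illustration.

```shell
# h (last command) saves every line in the hold space at the end of its cycle.
# On a matching line, g first pulls the saved previous line back into the
# pattern space and 1!p prints it (nothing precedes line 1, hence 1!).
before=$(sed -n -e '/second/{g;1!p;}' -e 'h' <<'EOF'
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
EOF
)
echo "$before"
```

With this input the output should be this is the first data line.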

Practical sed examples

Sample text:

$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.


Examples:
Example 1: double the line spacing by adding a blank line after every line
$ sed 'G' data2.txt
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.

$ sed '$!G' data2.txt      # add a blank line after every line except the last
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.
$

Example 2: number the lines of a file
$ sed '=' data2.txt | sed 'N;s/\n/ /'
1 this is the header line.
2 this is the first data line.
3 this is the second data line.
4 this is the last line.
# Blank lines are numbered too, so remove them first if they should not be counted
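The = command can also be restricted with an address. As a brief sketch (the four sample words and the variable name count are illustrative), applying it to the last line only gives a line count much like wc -l:

```shell
# -n suppresses normal output; $= prints the line number of the last line only
count=$(printf '%s\n' alpha beta gamma delta | sed -n '$=')
echo "$count"
```

For this four-line input the output should be 4.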

Example 3: print the last 10 lines
$ sed '{
> :start
> $q;N;11,$D
> b start
> }' data7.txt

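Example 4: delete blank lines
$ sed '/./,/^$/!d' data8.txt     # collapse runs of consecutive blank lines into one
$ sed '/./,$!d' data9.txt        # delete all blank lines at the start of the file
$ sed '{                         # delete blank lines at the end of the file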
> :start
> /^\n*$/{$d;N;b start}
> }'
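The first two one-liners can be tried on inline input to see what they keep. This is only a sketch; the sample text produced by printf and the variable names squeezed and trimmed are assumptions, not the data8.txt or data9.txt files.

```shell
# /./,/^$/!d keeps each range that starts at a non-empty line and ends at the
# first blank line after it; any further blank lines fall outside and are deleted.
squeezed=$(printf 'a\n\n\n\nb\n' | sed '/./,/^$/!d')
echo "$squeezed"

# /./,$!d keeps everything from the first non-empty line to the end of input,
# so only the leading blank lines are deleted.
trimmed=$(printf '\n\nfirst\n\nsecond\n' | sed '/./,$!d')
echo "$trimmed"
```

In the first case the three blank lines between a and b collapse to one; in the second the two leading blank lines disappear while the inner blank line stays.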

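Advanced gawk

1> Built-in variables

FIELDWIDTHS	a space-separated list of numbers defining the exact width of each data field
FS	input field separator; the default is a single space, which makes gawk split on any run of whitespace
RS	input record separator; the default is a newline, so each line is one record
OFS	output field separator; the default is a space
ORS	output record separator; the default is a newline

Data variables:

NF	number of fields in the current record
NR	number of records processed so far
ENVIRON	associative array holding the shell environment variables, for example print ENVIRON["HOME"]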
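NR and NF change from record to record, which a short run makes visible. A minimal sketch, assuming two made-up input lines and using plain awk so that it does not depend on gawk being installed; the variable name report is illustrative.

```shell
# NR is the running record number, NF the field count of the current record,
# and $NF therefore refers to the last field of that record.
report=$(printf 'a b c\nd e\n' |
  awk '{ print NR ": " NF " fields, last=" $NF }')
echo "$report"
```

The first record has three fields and the second has two, so the output should be "1: 3 fields, last=c" followed by "2: 2 fields, last=e".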

Example 1: replace the comma separator with "-" and print the first three fields
$ cat data1
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35

$ gawk 'BEGIN{FS=",";OFS="-"} {print $1,$2,$3}' data1
data11-data12-data13
data21-data22-data23
data31-data32-data33

Example 2: treat each line as a field and a blank line as the record separator
$ cat data2
riley mullen
123 main street
chicago,il 60601
555-1234

frank williams
456 oak street
indianapolis,in 46201
555-9876

haley snell
4231 elm street
detroit,mi 48201
555-4938

$ gawk 'BEGIN{FS="\n";RS=""} {print $1,$4}' data2
riley mullen 555-1234
frank williams 555-9876
haley snell 555-4938

Example 3: split fields by fixed width
$ cat data1b
1005.3247596.37
115-2.349194.00
05810.1298100.1

$ gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1b
100 5.324 75 96.37
115 -2.34 91 94.00
058 10.12 98 100.1

Example 4: assigning variables inside a script
$ gawk '
> BEGIN{
> testing="this is a test"
> print testing
> testing=46
> print testing
> }'

this is a test
46

Iterating over an associative array
$ gawk 'BEGIN{
> var["a"]=1
> var["g"]=2
> var["u"]=3
> for (test in var)
> {
> print "index:",test," - Value:",var[test]
> }
> }'

index: u  - Value: 3
index: a  - Value: 1
index: g  - Value: 2
# the traversal order of for (... in ...) is not guaranteed

Deleting an array element:
delete array[index]
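Associative arrays are most often used for counting. The sketch below tallies how often each word appears, deletes one entry, and prints the rest; the fruit names and the variable names seen and counts are made up for the example, and the output is piped through sort because the traversal order is not guaranteed.

```shell
counts=$(printf '%s\n' apple banana apple cherry | awk '
  { seen[$1]++ }                  # count each value of the first field
  END {
    delete seen["banana"]         # remove a single element by its index
    for (fruit in seen)
      print fruit, seen[fruit]
  }' | sort)
echo "$counts"
```

After the delete, only apple (counted twice) and cherry (counted once) should be listed.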

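Restricting a match to one field
$ gawk -F: '$1 ~ /rich/{print $1,$NF}' /etc/passwd
rich /bin/bash
# This example searches the first data field for the text rich; when the pattern is found in a record, gawk prints the first and last data fields of that record.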
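The matching operator can be negated with !~. As a hedged sketch, the following lists accounts whose last field does not end in nologin; the three passwd-style lines are invented sample data, not read from a real /etc/passwd, and the variable name shells is illustrative.

```shell
# $NF !~ /nologin$/ is true when the last field does NOT end with "nologin"
shells=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'rich:x:1000:1000:Rich:/home/rich:/bin/bash' |
  awk -F: '$NF !~ /nologin$/ { print $1 }')
echo "$shells"
```

With this sample, root and rich are printed and daemon is filtered out.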

Using mathematical expressions
$ gawk -F: '$4 == 0 {print $1}' /etc/passwd
root
sync
shutdown
halt
operator

Structured commands: the if statement
if (condition)
statement1
It can also be written on a single line, like this:
if (condition) statement1

$ gawk '{if ($1 > 20) print $1}' data4
$ gawk '{
> if ($1 > 20)
> {
> x=$1 * 2
> print x
> }
> }' data4


$ gawk '{
> if ($1 > 20)
> {
> x = $1 * 2
> print x
> } else
> {
> x = $1 / 2
> print x
> }}' data4

An else clause can be used on a single line, but the if statement part must be followed by a semicolon:
if (condition) statement1; else statement2
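The single-line form can be checked quickly on inline data. A small sketch, assuming two made-up input values and an illustrative variable name result:

```shell
# values above 20 are doubled, all others are halved;
# note the semicolon between the if statement and else
result=$(printf '10\n50\n' |
  awk '{ if ($1 > 20) print $1 * 2; else print $1 / 2 }')
echo "$result"
```

For the inputs 10 and 50 this should print 5 and 100.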


Structured commands: the while statement
while (condition)
{
statements
}

$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> i++
> }
> avg = total / 3
> print "Average:",avg
> }' data5


$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> if (i == 2)
> break
> i++
> }
> avg = total / 2
> print "The average of the first two data elements is:",avg
> }' data5

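Structured commands: the do-while statement is similar to the while statement, but it runs the statements before checking the condition. The format of the do-while statement is:
do
{
statements
} while (condition)
This format guarantees that the statements run at least once before the condition is evaluated, which is handy when the statements must run before the condition can be tested.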
$ gawk '{
> total = 0
> i = 1
> do
> {
> total += $i
> i++
> } while (total < 150)
> print total }' data5



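The for statement is a common way to loop in many programming languages, and gawk supports C-style for loops:
for( variable assignment; condition; iteration process)
Combining several functions into one statement helps simplify the loop.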
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> print "Average:",avg
> }' data5
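The loop bound does not have to be hard-coded. As a sketch under the assumption that every field is numeric (the two input lines and the variable name averages are invented), NF can be used so that rows of any width are averaged:

```shell
averages=$(printf '10 20 30\n1 2\n' | awk '{
  total = 0
  for (i = 1; i <= NF; i++)   # visit every field of the current record
    total += $i
  print total / NF            # divide by the actual field count
}')
echo "$averages"
```

The first row averages to 20 and the second to 1.5.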

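Formatted output: printf works much as it does in C
c displays a number as an ASCII character
d displays an integer value
i displays an integer value (same as d)
e displays a number in scientific notation
f displays a floating-point value
g displays in either scientific notation or floating point, whichever is shorter
o displays an octal value
s displays a text string
x displays a hexadecimal value
X displays a hexadecimal value, using uppercase letters A-F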


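Note that you must add the newline yourself at the end of the printf command to start a new line; without it, printf keeps printing subsequent output on the same line.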
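Several of these format specifiers can be seen side by side in one call. This is a minimal sketch with arbitrary sample values; the | characters are there only to make the field boundaries visible, and the variable name formatted is illustrative.

```shell
# %d integer, %5.2f float in a 5-wide field with 2 decimals,
# %x hexadecimal, %o octal, %-6s string left-aligned in a 6-wide field
formatted=$(awk 'BEGIN {
  printf "%d|%5.2f|%x|%o|%-6s|\n", 42, 3.14159, 255, 8, "ab"
}')
echo "$formatted"
```

The explicit \n ends the line; the output should be 42| 3.14|ff|10|ab    | with the float padded on the left and the string padded on the right.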

printf output is right-aligned by default; a "-" in the format specifier makes it left-aligned:
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2
riley mullen     555-1234
frank williams   555-9876
haley snell      555-4938
$

Handling floating-point values:
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> printf "Average: %5.1f\n",avg
> }' data5
Average: 128.3
Average: 137.7
Average: 176.7
$

# Print lines 3 through 5 of test.txt
awk 'NR==3,NR==5 {print}' test.txt
# Print the first and last columns of lines 3 through 5 of test.txt
awk 'NR==3,NR==5 {print $1,$NF}' test.txt

# Print the line numbers of lines in test.txt longer than 80 characters
awk 'length($0)>80 {print NR}' test.txt

# Sum the first column of test.txt
awk '{sum+=$1} END{print sum}' test.txt

# Add a custom prefix
ifconfig eth0|grep 'Bcast'|awk '{print "ip_" $2}'


# Formatted output
awk -F: '{printf "%-12s %-6s %-8s\n",$1,$2,$NF}' /etc/passwd

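Note: this chapter is a set of reading notes based on Linux Command Line and Shell Scripting Bible, 3rd edition.

When reposting, please credit the source. Article link: https://www.uj5u.com/houduan/221211.html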
