The Three Musketeers of Linux Text Processing: grep

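Text processing relies heavily on regular expressions, which come in two flavors:

Basic regular expressions (BRE): grep, or egrep -G
Extended regular expressions (ERE): egrep, or grep -E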

The three classic Linux text-processing tools:

sed: stream editor, edits text as it flows through.
awk: a text-formatting tool; on Linux it is gawk.
grep: Global search REgular expression and Print out the line.
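- Commands that use basic regular expressions:
  - grep
  - egrep -G
  - fgrep -G
- Commands that use extended regular expressions:
  - grep -E
  - egrep
  - fgrep -E
- Command that uses no regular expressions at all (the pattern is a fixed string), which is usually much faster:
  - fgrep
grep is a text search tool: it scans the target text line by line against a user-specified search condition and prints every line that matches. egrep and fgrep are the traditional names for grep -E and grep -F.

The search condition is written as a regular expression.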
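
To make the difference between the three matchers concrete, here is a minimal sketch that runs the same pattern, a+b, through each of them. The sample file and its three lines are made up for the demonstration, and the results assume GNU grep.

```shell
# Hypothetical sample data, one candidate string per line.
sample=$(mktemp)
printf '%s\n' 'aab' 'a+b' 'ab' > "$sample"

# BRE: + is an ordinary character, so only the literal text a+b matches.
bre=$(grep 'a+b' "$sample")

# ERE: + means "one or more of the preceding item".
ere=$(grep -E 'a+b' "$sample")

# Fixed strings: nothing in the pattern is special.
fixed=$(grep -F 'a+b' "$sample")

printf 'BRE:\n%s\nERE:\n%s\nfixed:\n%s\n' "$bre" "$ere" "$fixed"
rm -f "$sample"
```

The BRE and fixed-string runs should print only a+b, while the ERE run should print aab and ab.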

I. Using grep

Syntax:
- grep [OPTIONS] PATTERN [FILE...]
- grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
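
The second form of the syntax can be sketched as follows. The fstab-like sample file and the pattern file are hypothetical stand-ins created in a temporary location, so the real /etc/fstab is not touched.

```shell
# Hypothetical fstab-like sample and a file that will hold the patterns.
sample=$(mktemp)
patterns=$(mktemp)
printf '%s\n' \
  '/dev/mapper/centos-root / xfs defaults 0 0' \
  'UUID=3d3b316a /boot xfs defaults 0 0' \
  '/dev/mapper/centos-swap swap swap defaults 0 0' > "$sample"

# -e may be repeated; a line is printed if it matches any of the patterns.
with_e=$(grep -e 'UUID' -e 'swap' "$sample")

# -f reads the patterns from a file, one pattern per line.
printf '%s\n' 'UUID' 'swap' > "$patterns"
with_f=$(grep -f "$patterns" "$sample")

echo "$with_e"
rm -f "$sample" "$patterns"
```

Both invocations should select the same two lines, the UUID entry and the swap entry.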

The most basic example: search for "UUID" in /etc/fstab.

# grep "UUID" /etc/fstab
UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot         xfs     defaults        0 0

By default the letters in the pattern are case-sensitive. To ignore case, use -i:

# grep "UUiD" /etc/fstab
# echo $?
1
# grep -i "UUiD" /etc/fstab
UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot         xfs     defaults        0 0

To print only the matched text itself rather than the whole matching line, use -o:
```
# grep -o "UUID" /etc/fstab
UUID
```

To print the lines that do not match, use -v:

# grep -v "UUID" /etc/fstab
/dev/mapper/centos-root /                       xfs     defaults        0 0

To suppress all output and learn only whether anything matched (through the exit status), use -q:

# grep -q "UUID" /etc/fstab
# echo $?
0
# grep -q "UUIDa" /etc/fstab
# echo $?
1
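
In scripts, -q is typically combined with if rather than with echo $?. A minimal sketch, using a hypothetical one-line sample file in place of /etc/fstab:

```shell
# Hypothetical sample file standing in for /etc/fstab.
sample=$(mktemp)
printf '%s\n' 'UUID=3d3b316a /boot xfs defaults 0 0' > "$sample"

# grep -q prints nothing; the if statement tests its exit status directly.
if grep -q 'UUID' "$sample"; then
  found_uuid=yes
else
  found_uuid=no
fi

if grep -q 'UUIDa' "$sample"; then
  found_uuida=yes
else
  found_uuida=no
fi

echo "UUID: $found_uuid, UUIDa: $found_uuida"
rm -f "$sample"
```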

To use extended regular expressions, use -E.
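
A short sketch of -E in action. The three sample lines are hypothetical; the pattern uses grouping and alternation, which need no backslashes in extended syntax.

```shell
# Hypothetical fstab-like sample with one comment line.
sample=$(mktemp)
printf '%s\n' \
  '/dev/mapper/centos-root / xfs defaults 0 0' \
  'UUID=3d3b316a /boot xfs defaults 0 0' \
  '# a comment line' > "$sample"

# With -E, ( ) group and | separates alternatives without backslashes.
matches=$(grep -E '^(UUID|/dev)' "$sample")
echo "$matches"
rm -f "$sample"
```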

To show the line number of each matching line, use -n:

# grep -n "UUID" /etc/fstab
10:UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot         xfs     defaults        0 0

To also show the lines after each match, use -A #, where # is a number:

# grep -nA1 gentoo /etc/passwd
49:gentoo:x:1004:1004::/tmp/gentoo:/bin/bash
50-fedora:x:1005:1005::/tmp/fedora:/bin/bash

To also show the lines before each match, use -B #, where # is a number:

# grep -nB2 gentoo /etc/passwd
47-za2:x:1002:1003::/home/za2:/bin/bash
48-mysql:x:1003:979::/home/mysql:/sbin/nologin
49:gentoo:x:1004:1004::/tmp/gentoo:/bin/bash

To show lines both before and after each match, use -C #, where # is a number:

# grep -nC1 gentoo /etc/passwd
48-mysql:x:1003:979::/home/mysql:/sbin/nologin
49:gentoo:x:1004:1004::/tmp/gentoo:/bin/bash
50-fedora:x:1005:1005::/tmp/fedora:/bin/bash

Character matching
- .: matches any single character
```
# grep -n "f..ora" /etc/passwd
50:fedora:x:1005:1005::/tmp/fedora:/bin/bash
# grep "f.ora" /etc/passwd
#
```
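- []: matches any single character from the listed set or range; the members are not separated by commas
- [^]: matches any single character outside the listed set or range
  
  Character classes that can be used inside the brackets: [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]
  
  Example: match lines that contain an r and a t with exactly two letters between them: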
```
# grep "r[[:alpha:]][[:alpha:]]t" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
```
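
The bracket forms above can be tried on a small scale as follows. The four sample strings are invented for the illustration.

```shell
# Hypothetical sample: the same prefix followed by different characters.
sample=$(mktemp)
printf '%s\n' 'user1' 'user2' 'userA' 'user_' > "$sample"

# A class such as [:digit:] must itself sit inside a bracket expression.
digits=$(grep 'user[[:digit:]]' "$sample")

# A leading ^ inside the brackets negates the set.
non_digits=$(grep 'user[^[:digit:]]' "$sample")

# Members are listed back to back, with no separators.
listed=$(grep 'user[1A_]' "$sample")

printf 'digits:\n%s\nnon-digits:\n%s\nlisted:\n%s\n' "$digits" "$non_digits" "$listed"
rm -f "$sample"
```
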
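Repetition: matching is greedy by default. Once a repeated element matches, it keeps consuming as many characters as it can and stops only when nothing more can be matched.

The example below uses the pattern "x*y". The line 【xxxxy】 contains several x characters; greedy matching consumes all of them rather than matching only 【xy】.
- 【*】: matches the preceding character any number of times, including zero
  
  Note: in the output below, square brackets mark the text that was matched.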
```
# cat t1
abxy
aby
xxxxy
yab
asdf
# grep "x*y" t1
ab[xy]
ab[y]
[xxxxy]
[y]ab
```
  Match r itself and everything after it on the line:
```
# grep "r.*" /etc/passwd
```
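
The square brackets in the transcript above are annotations, not grep output. One way to see the matched text directly is -o; the sketch below recreates the t1 sample in a temporary file (the variable name t1 is just a convenience) and prints only what "x*y" matched.

```shell
# Recreate the t1 sample from above in a temporary file.
t1=$(mktemp)
printf '%s\n' 'abxy' 'aby' 'xxxxy' 'yab' 'asdf' > "$t1"

# -o prints only the matched text, one match per output line.
greedy=$(grep -o 'x*y' "$t1")
echo "$greedy"
rm -f "$t1"
```

The third output line should be xxxxy rather than xy, which is the greedy behavior described above.
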
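- 【\?】: matches the preceding character zero or one time
- 【\+】: matches the preceding character one or more times
- 【\{m\}】: matches the preceding character exactly m times
- 【\{m,n\}】: matches the preceding character at least m and at most n times
- 【\{m,\}】: matches the preceding character at least m times
- 【\{0,n\}】: matches the preceding character at most n times
  
  Note: in the output below, square brackets mark the text that was matched.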
```
# cat t1
abxy
aby
xxxxy
yab
asdf
# grep "x\?y" t1
ab[xy]
ab[y]
xxx[xy]
[y]ab
# grep "x\+y" t1
ab[xy]
[xxxxy]
# grep "x\{1\}y" t1
ab[xy]
xxx[xy]
# grep "x\{2\}y" t1
xx[xxy]
# grep "x\{2,3\}y" t1
x[xxxy]
# grep "x\{1,2\}y" t1
ab[xy]
xx[xxy]
# grep "x\{1,\}y" t1
ab[xy]
[xxxxy]
# grep "x\{,2\}y" t1
ab[xy]
ab[y]
xx[xxy]
[y]ab
```

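Position anchors

【^】: anchors the match at the start of the line
【$】: anchors the match at the end of the line
【^PATTERN$】: PATTERN must match the entire line
【^$】: an empty line that contains nothing at all
【^[[:space:]]\+$】: a line that consists only of whitespace characters
Word: any run of consecutive letters, digits, and underscores counts as a word
【\< or \b】: anchors at the start of a word; used on the left side of the word pattern
【\> or \b】: anchors at the end of a word; used on the right side of the word pattern
【\<word\>】: matches a complete word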

# grep root /etc/passwd
[root]:x:0:0:[root]:/[root]:/bin/bash
operator:x:11:0:operator:/[root]:/sbin/nologin
[root]kit:x:1006:1006::/home/[root]kit:/bin/bash
user4:x:1007:1007::/home/user4:/bin/ch[root]
ch[root]er:x:1008:1008::/home/ch[root]er:/bin/bash
# grep "^root" /etc/passwd
[root]:x:0:0:root:/root:/bin/bash
[root]kit:x:1006:1006::/home/rootkit:/bin/bash
# grep "root$" /etc/passwd
user4:x:1007:1007::/home/user4:/bin/ch[root]
# grep "^root$" /etc/passwd
# echo $?
1
# cat t1
abxy
aby

xxxxy

yab
asdf
a
# grep -n "^$" t1
3:
# grep -n "^[[:space:]]*$" t1
3:
5:
# grep -n "^[[:space:]]\+$" t1
5:
# grep "\<root" /etc/passwd
[root]:x:0:0:[root]:/[root]:/bin/bash
operator:x:11:0:operator:/[root]:/sbin/nologin
[root]kit:x:1006:1006::/home/[root]kit:/bin/bash
# grep "root\>" /etc/passwd
[root]:x:0:0:[root]:/[root]:/bin/bash
operator:x:11:0:operator:/[root]:/sbin/nologin
user4:x:1007:1007::/home/user4:/bin/ch[root]
# grep "\<root\>" /etc/passwd
[root]:x:0:0:[root]:/[root]:/bin/bash
operator:x:11:0:operator:/[root]:/sbin/nologin
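
As a compact check of the word anchors, the sketch below counts matching lines in a three-line sample taken from the passwd entries shown above. It also uses -w, grep's whole-word option, for comparison; \< and \> are assumed to be available, as they are in GNU grep.

```shell
# Three passwd-style lines taken from the examples above.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'rootkit:x:1006:1006::/home/rootkit:/bin/bash' \
  'user4:x:1007:1007::/home/user4:/bin/chroot' > "$sample"

# Count lines where root appears as a complete word.
anchored=$(grep -c '\<root\>' "$sample")

# -w asks grep to match whole words only.
with_w=$(grep -cw 'root' "$sample")

# Without anchors, every line contains the substring root.
plain=$(grep -c 'root' "$sample")

echo "anchored=$anchored with_w=$with_w plain=$plain"
rm -f "$sample"
```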

Exercise 1: show the lines in /etc/passwd that do not end with /bin/bash.

# grep -nv "/bin/bash$" /etc/passwd

Exercise 2: find the two-digit or three-digit numbers that stand alone as words in /etc/passwd.

# grep -n "\<[[:digit:]]\{2,3\}\>" /etc/passwd

Exercise 3: find the lines in /etc/grub2.cfg that start with at least one whitespace character followed by a non-whitespace character.

# grep -n "^[[:space:]]\{1,\}[^[:space:]]" /etc/grub2.cfg

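Exercise 4: in the output of "netstat -tan", find the lines that end with "LISTEN" followed by zero or more whitespace characters. The trailing $ is what ties the match to the end of the line.

# netstat -tan | grep -n "LISTEN[[:space:]]*$"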

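Grouping and back-references
- Grouping 【\(\)】: binds one or more characters together with parentheses so that they are matched as a single unit
- Back-references: the text matched by each group is saved in a special variable and can be referred to later in the pattern
  - \1: the text matched by the first group
  - \2: the text matched by the second group
  - \#: the text matched by the #th group
  Exercise: match a group that is followed later on the same line by an identical string: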
```
# cat t2
He likes his lover.
He loves his lover.
She likes her liker.
She loves her liker.
# grep "l..e.*l..e" t2
He [likes his love]r.
He [loves his love]r.
She [likes her like]r.
She [loves her like]r.
# grep "\(l..e\).*\1" t2
He [loves his love]r.
She [likes her like]r.
```
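
Another common use of back-references is spotting an accidentally doubled word. This is a sketch on invented sample sentences; \{1,\} is used as the POSIX BRE way to say "one or more", and \< and \> are assumed to be supported (as in GNU grep).

```shell
# Hypothetical sentences; only the first contains a doubled word.
sample=$(mktemp)
printf '%s\n' \
  'this is is a typo' \
  'this is fine' \
  'paris is nice' > "$sample"

# \(...\) captures a word and \1 requires the same text again after one space.
# The word anchors stop "this is" from matching as th[is is].
doubled=$(grep '\<\([[:alpha:]]\{1,\}\) \1\>' "$sample")
echo "$doubled"
rm -f "$sample"
```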

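II. Using egrep

The grep options described above work the same way with egrep.

Character matching: same as grep
Repetition
- ?: matches the preceding character zero or one time
- +: matches the preceding character one or more times
- {m}: matches the preceding character exactly m times
- {m,n}: matches the preceding character at least m and at most n times
- {m,}: matches the preceding character at least m times
- {0,n}: matches the preceding character at most n times
Position anchors: same as grep
Grouping and back-references
- Grouping (): binds one or more characters together with parentheses so that they are matched as a single unit
- Back-references: same as grep
Alternation
- a|b: a or b
- C|cat: matches C or cat, not Cat or cat
- (C|c)at: matches cat or Cat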
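
The precedence of | can be checked with -o on a two-line sample (invented for this sketch), which shows exactly which text each pattern matches.

```shell
# Hypothetical two-line sample.
sample=$(mktemp)
printf '%s\n' 'Cat' 'cat' > "$sample"

# Alternation has the lowest precedence: this means "C" or "cat".
loose=$(grep -oE 'C|cat' "$sample")

# Parentheses limit the alternation to the first letter.
grouped=$(grep -oE '(C|c)at' "$sample")

printf 'C|cat:\n%s\n(C|c)at:\n%s\n' "$loose" "$grouped"
rm -f "$sample"
```

For the line Cat, the first pattern matches only the letter C, while the grouped pattern matches the whole word.
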
Exercise 1: find all lines in /proc/meminfo that start with an uppercase or lowercase s, using three different methods:
```
# egrep "^(s|S)" /proc/meminfo
```
```
# grep -ni "^s" /proc/meminfo
```
```
# grep  "^[sS]" /proc/meminfo
```
Exercise 2: find the two-digit or three-digit numbers that stand alone as words in /etc/passwd:
```
# egrep -n "\<[[:digit:]]{2,3}\>" /etc/passwd
```
Exercise 3: find the lines in /etc/grub2.cfg that start with at least one whitespace character followed by a non-whitespace character:
```
# egrep -n "^[[:space:]]{1,}[^[:space:]]" /etc/grub2.cfg
```
Exercise 4: find the lines in /etc/rc.d/init.d/functions where a word is followed by a pair of parentheses:
```
# grep "\<.*\>[[:space:]]*()" /etc/rc.d/init.d/functions
```
Exercise 5: use echo to output an absolute path, then extract its directory part with grep and its base name with egrep:
```
# echo /etc/rc.d/init.d/functions | grep -o "^/.*/"
/etc/rc.d/init.d/
# echo /etc/rc.d/init.d/functions | egrep -o "[^/]+$"
functions
```
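Exercise 6: find the numbers between 1 and 255 in the output of ifconfig. The alternatives must be grouped so that the word anchors apply to every branch: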
```
# ifconfig | grep -E "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"
```
Exercise 7: find the IP addresses in the output of ifconfig. The dots are escaped so that they match only a literal dot:
```
# ifconfig | egrep -n "\<[0-9]+\>\.\<[0-9]+\>\.\<[0-9]+\>\.\<[0-9]+\>"
```
Exercise 8: find the users whose user name is the same as the name of their shell:
```
# egrep  "^([^:]+\>).*\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
```

III. Using fgrep

fgrep treats the pattern as a fixed string. When no regular expression is needed, it performs better.
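
Besides speed, fixed-string matching avoids accidental matches when the search text contains regex metacharacters. A minimal sketch with invented sample lines, using grep -F, which is equivalent to fgrep:

```shell
# Hypothetical sample: one real address and one look-alike string.
sample=$(mktemp)
printf '%s\n' '10.0.0.1 gateway' '10a0b0c1 not-an-address' > "$sample"

# As a regex, each . matches any character, so both lines match.
regex_count=$(grep -c '10.0.0.1' "$sample")

# With -F the dots are literal, so only the real address matches.
fixed_count=$(grep -cF '10.0.0.1' "$sample")

echo "regex=$regex_count fixed=$fixed_count"
rm -f "$sample"
```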

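IV. Text viewing and processing tools

1. wc: count lines, words, bytes, and characters

-l: number of lines
-w: number of words
-c: number of bytes
-m: number of characters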

# wc /etc/fstab
 12  60 541 /etc/fstab
# wc -l /etc/fstab
12 /etc/fstab
# wc -w /etc/fstab
60 /etc/fstab
# wc -c /etc/fstab
541 /etc/fstab
# wc -m /etc/fstab
541 /etc/fstab

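2. cut: remove sections (columns) from each line of files

Text files on Linux usually have a format, meaning a recognizable delimiter. Using that delimiter, each line can be split into columns.

For example, the contents of /etc/passwd are delimited by colons.

Syntax: cut OPTION... [FILE]...
Specify a colon as the delimiter: -d:

Only a single delimiter character can be specified.
Select which columns to keep: -f1-3,5,7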

# cut -d: -f1-3,5,7 /etc/passwd
rootkit:x:1006::/bin/bash
user4:x:1007::/bin/chroot
# wc -l /etc/rc.d/init.d/functions
712 /etc/rc.d/init.d/functions
# wc -l /etc/rc.d/init.d/functions | cut -d' ' -f1
712
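
The same ideas on a self-contained sample, as a sketch: the two passwd-style lines are stand-ins, and the last step shows an alternative to piping wc into cut, namely feeding wc through standard input so that it has no file name to print.

```shell
# Two passwd-style lines used as stand-in input.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'user4:x:1007:1007::/home/user4:/bin/chroot' > "$sample"

# Keep only the login name (field 1) and the shell (field 7).
name_shell=$(cut -d: -f1,7 "$sample")

# Reading from standard input makes wc print the count without a file name.
line_count=$(wc -l < "$sample")

echo "$name_shell"
echo "lines: $line_count"
rm -f "$sample"
```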

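3. sort: sort text by a given column

sort splits each line into columns on a given delimiter and then orders the lines by a particular column, much like sorting by column in Microsoft Excel.

Syntax: sort [OPTION]... [FILE]...
Specify the delimiter: -t
Specify the number of the column to sort on: -k
Sort by numeric value rather than character by character: -n
Reverse the order: -r
Ignore case: -f
Keep only one copy of lines that compare equal: -u

Split on colons and sort by the numeric value of column 3, in descending order: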

# sort -t: -k3 -nr /etc/passwd

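Split on colons, sort alphabetically by column 7 in ascending order, and keep only one line for each distinct value of that column: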

# sort -t: -k7 -u /etc/passwd
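
The effect of -n is easiest to see side by side. This sketch uses three invented colon-separated records and writes the key as -k3,3 so that only column 3 takes part in the comparison.

```shell
# Hypothetical records: name, placeholder, numeric id, shell.
sample=$(mktemp)
printf '%s\n' \
  'alice:x:1002:/bin/bash' \
  'bob:x:5:/sbin/nologin' \
  'carol:x:40:/bin/bash' > "$sample"

# -k3,3 restricts the key to column 3; -n compares it as a number.
numeric=$(sort -t: -k3,3 -n "$sample" | cut -d: -f1)

# Without -n the same column is compared as text, so 1002 sorts before 40.
textual=$(sort -t: -k3,3 "$sample" | cut -d: -f1)

printf 'numeric order:\n%s\ntext order:\n%s\n' "$numeric" "$textual"
rm -f "$sample"
```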

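4. uniq: remove duplicate lines

Prerequisite: the input must be sorted first, because uniq only compares adjacent lines.

Syntax: uniq [OPTION]... [INPUT [OUTPUT]]
Show how many times each line occurs: -c
Show only the lines that are not repeated: -u
Show only the lines that are repeated: -d

Check which login shells are in use: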

# cut -d: -f7 /etc/passwd | sort |uniq -c
      7 /bin/bash
      1 /bin/chroot
      1 /bin/csh
      1 /bin/false
      1 /bin/sync
      1 /sbin/halt
     40 /sbin/nologin
      1 /sbin/shutdown
# cut -d: -f7 /etc/passwd | sort |uniq -u
/bin/chroot
/bin/csh
/bin/false
/bin/sync
/sbin/halt
/sbin/shutdown
# cut -d: -f7 /etc/passwd | sort |uniq -d
/bin/bash
/sbin/nologin
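
Why the sort step matters can be shown with a three-line sample (invented here) in which the duplicate lines are not adjacent.

```shell
# Hypothetical list of shells with a non-adjacent duplicate.
sample=$(mktemp)
printf '%s\n' '/bin/bash' '/sbin/nologin' '/bin/bash' > "$sample"

# Unsorted: the two /bin/bash lines are not adjacent, so uniq keeps both.
unsorted=$(uniq "$sample" | wc -l)

# Sorted first: the duplicates become adjacent and collapse into one line.
sorted=$(sort "$sample" | uniq | wc -l)

echo "unsorted=$unsorted sorted=$sorted"
rm -f "$sample"
```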

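5. diff: compare files line by line; directories can be compared as well

Syntax: diff [OPTION]... FILES
Use output redirection to save the differences to a patch file: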

# diff t1 t2
# diff t1 t2 > patch1

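6. patch: apply the difference file produced by diff to the original file

patch modifies the old file to bring it up to date (patches it). The file given after -i is the one produced by redirecting the output of diff: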

# patch -i patch1 t1

If the patch was applied by mistake, revert to the old file with -R:

# patch -R -i patch1 t1
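
The whole diff, patch, and revert cycle can be sketched end to end in a scratch directory. The file names old, new, and patch1 and their contents are placeholders, and the patch command is assumed to be installed.

```shell
# Scratch directory with two hypothetical versions of a file.
work=$(mktemp -d)
printf '%s\n' 'abxy' 'aby' > "$work/old"
printf '%s\n' 'abxy' 'abz' > "$work/new"

# diff exits with status 1 when the files differ, which is expected here.
diff "$work/old" "$work/new" > "$work/patch1"

# Apply the patch: old now has the same content as new.
patch -i "$work/patch1" "$work/old" > /dev/null
after_patch=$(cat "$work/old")

# Reverse the patch: old returns to its original content.
patch -R -i "$work/patch1" "$work/old" > /dev/null
after_revert=$(cat "$work/old")

printf 'patched:\n%s\nreverted:\n%s\n' "$after_patch" "$after_revert"
rm -rf "$work"
```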

Exercise: extract the IP address of a network interface:

# ifconfig enp0s3
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.247.236.19  netmask 255.255.254.0  broadcast 10.247.237.255
        inet6 fe80::b497:5ec:1efb:72b5  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:10:c2:53  txqueuelen 1000  (Ethernet)
        RX packets 32057  bytes 5882570 (5.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5324  bytes 1032770 (1008.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# ifconfig enp0s3 | grep "\<inet\>" | cut -d' '  -f10
10.247.236.19
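
Counting fields with -f10 depends on the exact number of leading spaces in the ifconfig output. A possible alternative, sketched here on two lines copied from the output above, is to let grep -o isolate the address first and then cut off the keyword.

```shell
# Two lines copied from the ifconfig output above, used as stand-in input.
sample=$(mktemp)
printf '%s\n' \
  '        inet 10.247.236.19  netmask 255.255.254.0  broadcast 10.247.237.255' \
  '        inet6 fe80::b497:5ec:1efb:72b5  prefixlen 64  scopeid 0x20<link>' > "$sample"

# -o keeps only "inet <address>"; cut then drops the leading keyword.
ip=$(grep -oE '\<inet [0-9]+(\.[0-9]+){3}' "$sample" | cut -d' ' -f2)
echo "$ip"
rm -f "$sample"
```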


