Regular Expressions in Python

1. What is a regular expression?

A regular expression is an expression that concisely describes a set of strings.

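2. What is it used for?

① Describing the characteristics of a type of text; ② finding or replacing a whole group of strings at once; ③ matching all or part of a string.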

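3. Common operators:

Operator	Description	Example
.	Any single character (except a newline by default)
[]	Character set: the allowed values for a single character	[abc] matches a, b, or c; [a-z] matches a single character from a to z
[^]	Negated character set: the excluded values for a single character	[^abc] matches any single character other than a, b, or c
*	Zero or more repetitions of the preceding character	abc* matches ab, abc, abcc, abccc, and so on
+	One or more repetitions of the preceding character	abc+ matches abc, abcc, abccc, and so on
?	Zero or one occurrence of the preceding character	abc? matches ab and abc
|	Either the left or the right expression	abc|def matches abc or def
{m}	Exactly m repetitions of the preceding character	ab{4}c matches abbbbc
{m,n}	From m to n repetitions of the preceding character, inclusive	ab{1,2}c matches abc and abbc
^	Start of the string	^abc matches abc at the start of the string
$	End of the string	abc$ matches abc at the end of the string
()	Grouping; a | inside a group separates alternatives	(abc) matches abc; (abc|def) matches abc or def
\d	A digit; equivalent to [0-9] for ASCII text
\w	A word character; equivalent to [A-Za-z0-9_] for ASCII text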
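The operator table can be checked with a short sketch. It uses `re.fullmatch`, which the original text does not cover but which is part of the standard `re` module, so that each pattern has to match the whole test string. The test strings and the helper name `check` are made up for illustration.

```python
import re

# Each entry: (pattern, strings that should match in full, strings that should not).
cases = [
    (r"abc*", ["ab", "abc", "abccc"], ["a", "abcabc"]),
    (r"abc+", ["abc", "abcc"], ["ab"]),
    (r"abc?", ["ab", "abc"], ["abcc"]),
    (r"abc|def", ["abc", "def"], ["abcdef"]),
    (r"ab{4}c", ["abbbbc"], ["abbbc"]),
    (r"ab{1,2}c", ["abc", "abbc"], ["ac", "abbbc"]),
    (r"[^abc]", ["d", "1"], ["a", "dd"]),
    (r"\d", ["7"], ["x"]),
]


def check(pattern, should_match, should_not_match):
    """Return True if the pattern fully matches every positive case and no negative case."""
    ok_pos = all(re.fullmatch(pattern, s) for s in should_match)
    ok_neg = not any(re.fullmatch(pattern, s) for s in should_not_match)
    return ok_pos and ok_neg


results = {pattern: check(pattern, pos, neg) for pattern, pos, neg in cases}
for pattern, ok in results.items():
    print(f"{pattern!r}: {'ok' if ok else 'unexpected'}")
```

Every line should report `ok`; an `unexpected` would mean the pattern behaves differently from the table.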

4. Syntax examples

Regular expression	Matching strings
P(Y|YT|YTH|YTHO)?N	"PN", "PYN", "PYTN", "PYTHN", "PYTHON"
PYTHON+	"PYTHON", "PYTHONN", "PYTHONNN", ...
PY[TH]ON	"PYTON", "PYHON"
PY[^TH]?ON	"PYON", "PYAON", "PYBON", "PYCON", ...
PY{0,3}N	"PN", "PYN", "PYYN", "PYYYN"
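As a rough check of the examples above, the sketch below runs each pattern against its listed strings with `re.fullmatch` (a standard `re` function not otherwise used in this article). The "at most three Y" case is written as `PY{0,3}N`, since `{m,n}` is the repetition form Python's `re` understands.

```python
import re

# Pattern -> strings the table says it should match.
examples = {
    r"P(Y|YT|YTH|YTHO)?N": ["PN", "PYN", "PYTN", "PYTHN", "PYTHON"],
    r"PYTHON+": ["PYTHON", "PYTHONN", "PYTHONNN"],
    r"PY[TH]ON": ["PYTON", "PYHON"],
    r"PY[^TH]?ON": ["PYON", "PYAON", "PYBON", "PYCON"],
    r"PY{0,3}N": ["PN", "PYN", "PYYN", "PYYYN"],
}

# True for a pattern only if it fully matches every string listed for it.
all_match = {
    pattern: all(re.fullmatch(pattern, s) for s in strings)
    for pattern, strings in examples.items()
}
print(all_match)

# A character set matches exactly one character, so "PYTHON" is rejected here.
print(re.fullmatch(r"PY[TH]ON", "PYTHON"))  # None
```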

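5. Classic regular expressions

^[A-Za-z]+$	A string made up only of the 26 letters
^[A-Za-z0-9]+$	A string made up only of the 26 letters and digits
^-?\d+$	A string in integer form
^[0-9]*[1-9][0-9]*$	A string in positive-integer form
[1-9]\d{5}	A postal code in mainland China
[\u4e00-\u9fa5]	A Chinese character
\d{3}-\d{8}|\d{4}-\d{7}	A phone number in China, for example 010-12345678
[1-9]?\d	0-99
1\d{2}	100-199
2[0-4]\d	200-249
25[0-5]	250-255
(([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])	An IP address (the dot must be escaped as \. so that it matches only a literal dot)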
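The IP address pattern is easier to read when the 0-255 part is built once and reused. The following is a minimal sketch under a few assumptions: the names `OCTET`, `IP_PATTERN`, and `is_ipv4` are invented here, non-capturing groups `(?:...)` and `re.compile` are standard `re` features that the article does not discuss, and the separator is written as `\.` so that only a literal dot is accepted.

```python
import re

# One number from 0 to 255, assembled from the four ranges listed above.
OCTET = r"(?:[1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])"

# Three "octet + literal dot" groups followed by a final octet.
IP_PATTERN = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")


def is_ipv4(text):
    """Return True if the whole string looks like a dotted IPv4 address."""
    return IP_PATTERN.fullmatch(text) is not None


print(is_ipv4("192.168.0.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1a2b3c4"))          # False: separators must be literal dots
```

With an unescaped `.` as the separator, the last string would be accepted, because `.` matches any character.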

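6. Basic use of the re library

Main functions of the re library
re.search()	Searches a string for the first position where the regular expression matches; returns a match object
re.match()	Matches the regular expression from the start of a string; returns a match object
re.findall()	Searches a string and returns all matching substrings as a list
re.split()	Splits a string at each match of the regular expression; returns a list
re.finditer()	Searches a string and returns an iterator over the matches; each element is a match object
re.sub()	Replaces every substring that matches the regular expression; returns the resulting string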

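① search(pattern, string, flags=0)

pattern: the regular expression, given as a string or a raw string
string: the string to search
flags: control flags that change how the regular expression is applied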

import re

match = re.search(r"[1-9]\d{5}", "haha 723300")
if match:
    print(match.group())

Output:
723300
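The pattern and flags parameters can be illustrated with a small sketch. `re.IGNORECASE` is a standard flag in the `re` module; the article does not name any particular flag, so it is chosen here purely as an example, and the sample strings are made up.

```python
import re

# A raw string keeps backslashes as written, so these two spellings are the same pattern.
same_pattern = r"[1-9]\d{5}" == "[1-9]\\d{5}"
print(same_pattern)  # True

text = "I like PYTHON"

# Without a flag, matching is case-sensitive, so nothing is found.
strict = re.search(r"python", text)
print(strict)  # None

# With re.IGNORECASE, letters match regardless of case.
relaxed = re.search(r"python", text, flags=re.IGNORECASE)
print(relaxed.group())  # PYTHON
```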

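② match(pattern, string, flags=0)

Note that match only tries to match at the start of the string; if the start does not match, it does not look any further. It returns a match object on success and None on failure.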

import re

# The digits are not at the start, so match() returns None.
match = re.match(r"[1-9]\d{5}", "haha 723300")
print(type(match))

# The digits are at the start, so match() succeeds.
match = re.match(r"[1-9]\d{5}", "723300 haha")
if match:
    print(match.group())

Output:
<class 'NoneType'>
723300


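So the difference between search and match is this:
match requires the matching substring to sit at the very start of the string, otherwise it finds nothing, while search has no such requirement.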

③ findall(pattern, string, flags=0)

import re

c = re.findall(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(c))
print(c)

Output:
<class 'list'>
['723300', '612203']


④ split(pattern, string, maxsplit=0, flags=0)

maxsplit: the maximum number of splits; the rest of the string is returned as the last element

import re

# Split at every six-digit code.
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(a))
print(a)

# Split at most once; the remainder stays in the last element.
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203", maxsplit=1)
print(a)

# Split on either a colon or a comma (neither needs escaping).
str1 = "name: hpl, age: 18"
b = re.split(r":|,", str1)
print(b)

Output:
<class 'list'>
['haha', ' xixi', '']
['haha', ' xixi612203']
['name', ' hpl', ' age', ' 18']


⑤ finditer(pattern, string, flags=0)

import re

# finditer yields only match objects, so no None check is needed.
for m in re.finditer(r"[1-9]\d{5}", "haha723300 xixi612203"):
    print(m.group())

Output:
723300
612203
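Because finditer yields match objects rather than plain strings, it can also report where each match was found, which findall cannot. A minimal sketch using the same sample string (the variable name `positions` is made up):

```python
import re

text = "haha723300 xixi612203"

# Collect each matched substring together with its (start, end) position.
positions = [(m.group(), m.span()) for m in re.finditer(r"[1-9]\d{5}", text)]
print(positions)  # [('723300', (4, 10)), ('612203', (15, 21))]

# findall returns only the matched text.
print(re.findall(r"[1-9]\d{5}", text))  # ['723300', '612203']
```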


⑥ sub(pattern, repl, string, count=0, flags=0)

repl: the string that replaces each match
count: the maximum number of replacements (0 means replace all matches)

import re

# sub always returns a string, so the result can be printed directly.
result = re.sub(r"[1-9]\d{5}", "love", "haha723300 xixi612203")
print(result)

Output:
hahalove xixilove
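The count parameter is described above but not exercised in the example. A short sketch of its effect, reusing the same sample string (the variable names are made up):

```python
import re

text = "haha723300 xixi612203"

# Default count=0: every match is replaced.
all_replaced = re.sub(r"[1-9]\d{5}", "love", text)
print(all_replaced)  # hahalove xixilove

# count=1: only the first match is replaced.
first_only = re.sub(r"[1-9]\d{5}", "love", text, count=1)
print(first_only)  # hahalove xixi612203
```

Passing count by keyword keeps the call readable and avoids confusing it with flags.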


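7. The match object of the re library

Attributes:
string: the text that was searched
re: the pattern object (regular expression) used for the match
pos: the position in the text where the search started
endpos: the position in the text where the search ended

Methods:
group(): returns the matched string
start(): the start position of the match in the original string
end(): the end position of the match in the original string
span(): returns the tuple (start, end)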

import re

match = re.search(r"[1-9]\d{5}", "haha723300 xixi612203")
print(match.string)
print(match.re)
print(match.pos)
print(match.endpos)
print(match.group())
print(match.start())
print(match.end())
print(match.span())

Output:
haha723300 xixi612203
re.compile('[1-9]\\d{5}')
0
21
723300
4
10
(4, 10)
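In the example above pos and endpos are simply 0 and the length of the string. One way to see them change is to search through a compiled pattern, whose `search` method accepts optional start and end positions. This is a sketch only: `re.compile` and `Pattern.search(string, pos, endpos)` are standard `re` features that the article does not cover.

```python
import re

pattern = re.compile(r"[1-9]\d{5}")
text = "haha723300 xixi612203"

# Search the whole string: pos is 0 and endpos is len(text).
first = pattern.search(text)
print(first.pos, first.endpos, first.span())  # 0 21 (4, 10)

# Start searching at index 10, just past the first match.
second = pattern.search(text, 10)
print(second.pos, second.endpos, second.span())  # 10 21 (15, 21)

# start() and end() index into the original string.
print(text[second.start():second.end()])  # 612203
```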


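8. Greedy and minimal matching in the re library

① By default the re library uses greedy matching, that is, it returns the longest substring that matches.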

import re

match = re.search(r'PY.*N', 'PYANBNCNDN')
print(match.group())

Output:
PYANBNCNDN


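② To get a minimal match, add ? after the repetition operator.

Minimal-match operators
Operator	Description
*?	Zero or more repetitions of the preceding character, minimal match
+?	One or more repetitions of the preceding character, minimal match
??	Zero or one repetition of the preceding character, minimal match
{m,n}?	From m to n repetitions of the preceding character (inclusive), minimal match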

import re

match = re.search(r'PY.*?N', 'PYANBNCNDN')
print(match.group())

Output:
PYAN
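The other minimal-match operators behave the same way. The sketch below compares greedy and minimal forms on the same sample string; the extra patterns are made up for illustration.

```python
import re

text = "PYANBNCNDN"

# +? : at least one character, but as few as possible.
lazy_plus = re.search(r"PY.+?N", text).group()
print(lazy_plus)  # PYAN

# {m,n} vs {m,n}? : the greedy form takes up to 5 characters, the minimal form only 3.
greedy_range = re.search(r"PY.{3,5}N", text).group()
lazy_range = re.search(r"PY.{3,5}?N", text).group()
print(greedy_range)  # PYANBNCN
print(lazy_range)    # PYANBN

# ? vs ?? : the greedy form takes the optional A, the minimal form skips it.
greedy_opt = re.search(r"PYA?", text).group()
lazy_opt = re.search(r"PYA??", text).group()
print(greedy_opt)  # PYA
print(lazy_opt)    # PY
```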


Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/212682.html
