Python Regex-Based Web Scraping: Quick Notes - 有解無憂

I. re.match() matches from the very beginning of the string. For a string that starts with hello, the pattern must match starting at that h.

1. re.match() with the pattern '^hello.*Demo$' matches the entire string that fits the regex.

import re

content = "hello 123 4567 World_This is a regex Demo"
result = re.match(r'^hello.*Demo$', content)
print(result.group())  # hello 123 4567 World_This is a regex Demo
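re.match() only tries position 0. A pattern that would fit later in the string therefore gives None instead of a match object, and calling .group() on None raises AttributeError. Here is a minimal sketch of checking for that first. It reuses the content string from above; the variable names at_start and later are only for illustration.

```python
import re

content = "hello 123 4567 World_This is a regex Demo"

# 'hello' sits at position 0, so re.match() succeeds
at_start = re.match(r'hello', content)
print(at_start.group())  # hello

# 'World' appears later in the string, so re.match() returns None
later = re.match(r'World', content)
if later is None:
    print("no match at the start of the string")
else:
    print(later.group())
```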

2. Parentheses () and group(1): extract part of the string, for example the digits captured by (\d+).

group() returns the entire match; group(1) returns what the first pair of () captured.

import re
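content = "hello 123 4567 World_This is a regex Demo"
# Raw string (r'...') so \s and \d reach the regex engine unchanged
result = re.match(r'^hello\s(\d+)\s(\d+)\sWorld.*Demo$', content)
print(result.group(2))  # 4567 (group(1) would be 123)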
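This small sketch shows the group accessors side by side on the same string and pattern. groups() is a standard Match method not used in the original; it returns every captured group as a tuple.

```python
import re

content = "hello 123 4567 World_This is a regex Demo"
result = re.match(r'^hello\s(\d+)\s(\d+)\sWorld.*Demo$', content)

print(result.group())    # the whole match (same as group(0))
print(result.group(1))   # 123  -> first pair of parentheses
print(result.group(2))   # 4567 -> second pair of parentheses
print(result.groups())   # ('123', '4567') -> all captured groups as a tuple
```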

3. \s matches exactly one whitespace character. If there are several spaces, as in "hello    123", use \s+ instead.
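Here is a quick check of the difference. It assumes a made-up string with four spaces between hello and 123.

```python
import re

content = "hello    123"  # several spaces between the word and the number

# \s consumes exactly one whitespace character, so (\d+) then hits a space and fails
one = re.match(r'^hello\s(\d+)', content)
print(one)  # None

# \s+ consumes one or more whitespace characters, so (\d+) lands on the digits
many = re.match(r'^hello\s+(\d+)', content)
print(many.group(1))  # 123
```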

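4. To match whitespace or any run of characters, you can use .*. However, .* is greedy: it grabs as much as it can and starves the parts of the pattern that follow it, e.g. .*(\d+). So use the non-greedy .*? instead, here in place of \s+.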

4.1 Greedy matching

import re
content = "hello 123 4567 World_This is a regex Demo"
# Greedy .* eats " 12", leaving (\d+) only the single digit "3"
result = re.match(r'^hello.*(\d+)\s(\d+)\sWorld.*Demo$', content)
print(result.group(1))

Output: 3

4.2 Non-greedy matching

import re
content = "hello 123 4567 World_This is a regex Demo"
# Non-greedy .*? gives up characters as early as possible, so (\d+) gets all of "123"
result = re.match(r'^hello.*?(\d+).*?(\d+)\sWorld.*Demo$', content)
print(result.group(1))

Output: 123
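The same greedy vs. non-greedy difference matters a lot when scraping HTML. This sketch uses re.findall(), which returns every non-overlapping match, on an invented HTML fragment.

```python
import re

# A made-up HTML fragment, as you might get from a scraped page
html = '<b>first</b> and <b>second</b>'

# Greedy: .* runs to the LAST '</b>', so both tags end up in a single match
greedy = re.findall(r'<b>(.*)</b>', html)
print(greedy)  # ['first</b> and <b>second']

# Non-greedy: .*? stops at the FIRST '</b>' after each '<b>'
lazy = re.findall(r'<b>(.*?)</b>', html)
print(lazy)  # ['first', 'second']
```

When pulling values out of a page, the non-greedy form is usually the one you want.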

5. Capturing 123 4567 as one group with (.*?)

import re
content = "hello 123 4567 World_This is a regex Demo"
result = re.match(r'^hello\s+(.*?)\s+World.*Demo$', content)
print(result.group(1))

Output: 123 4567
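(.*?) only stretches as far as the rest of the pattern forces it to. This sketch shows that the \s+World after the group is what makes it capture 123 4567. It also shows what happens when nothing follows the group.

```python
import re

content = "hello 123 4567 World_This is a regex Demo"

# With '\s+World' after the group, (.*?) has to stretch until it reaches ' World'
bounded = re.match(r'^hello\s+(.*?)\s+World', content)
print(repr(bounded.group(1)))  # '123 4567'

# With nothing after the group, (.*?) is satisfied by the empty string
unbounded = re.match(r'^hello\s+(.*?)', content)
print(repr(unbounded.group(1)))  # ''

# A greedy (.*) at the end instead takes everything to the end of the string
greedy_tail = re.match(r'^hello\s+(.*)', content)
print(repr(greedy_tail.group(1)))  # '123 4567 World_This is a regex Demo'
```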

To match a special character literally, escape it with a backslash: $5.00 becomes \$5\.00.
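You can escape by hand or with re.escape(), which backslash-escapes the regex metacharacters in a literal string. Here is a minimal sketch using an invented price string.

```python
import re

content = "The price is $5.00 today"

# Unescaped: '$' is an end-of-string anchor and '.' means "any character",
# so this pattern can never match here
print(re.search(r'$5.00', content))  # None

# Escaped by hand: \$ and \. match the literal characters
print(re.search(r'\$5\.00', content).group())  # $5.00

# re.escape() adds the backslashes for you, useful when the text comes from a variable
pattern = re.escape('$5.00')
print(pattern)  # \$5\.00
print(re.search(pattern, content).group())  # $5.00
```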

II. re.search() scans the entire string. With hello, the pattern no longer has to start at h; it can start at e, for instance.

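Many other articles online are rather muddled. They show a single re.search() example without explaining how it differs from re.match(), so beginners never really see the difference between the two.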
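Here is a side-by-side sketch of the difference, using the same content string. re.match() fails on a pattern that starts at the e, while re.search() finds it. Putting ^ in front of a search pattern anchors it to the start again.

```python
import re

content = "hello 123 4567 World_This is a regex Demo"

# re.match() only tries position 0; 'ello' starts at position 1, so no match
print(re.match(r'ello', content))  # None

# re.search() scans forward and returns the first place the pattern fits
found = re.search(r'ello', content)
print(found.group(), found.start())  # ello 1

# Adding ^ to a search pattern anchors it to the start of the string again
print(re.search(r'^ello', content))  # None
```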

1. Reusing the earlier example to match “123 4567”:

import re
content = "hello 123 4567 World_This is a regex Demo"
result = re.search(r'ello\s+(.*?)\s+World.*Demo$', content)  # starts at 'ello'; re.match() would have to start at 'h'
print(result.group(1))

Output: 123 4567