I have some address information stored in a column of a pandas DataFrame, as shown below:
df['Addr']
LT 75 CEDAR WOOD 3RD PL
LTS 22,25 & 26 MULLINS CORNER
LTS 7 & 8
PT LT 22-23 JEFFERSON HIGHLANDS EXTENSION
I want to extract the lot information into a new column, so for the examples above my expected result is:
df['Lot']
75
22,25 & 26
7 & 8
22-23
Here is my code:
df['Lot'] = df['Addr'].str.extract(r'\b(?:LOT|LT|LTS?) (\w+(?:-\d+)*)')
The result I get is:
75
22
7
22-23
If possible, how can I modify my regular expression to get the expected result? Please advise.
Answer from a uj5u.com user:
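You can use
\b(?:LOT|LTS?) (\d+(?:(?:[-,]| & )\d+)*)
Explanation
\b : a word boundary
(?:LOT|LTS?) : match LOT, LT or LTS, followed by a literal space
( : open capture group 1
\d+ : match 1 or more digits
(?:(?:[-,]| & )\d+)* : optionally repeat -, , or " & " followed by 1 or more digits
) : close group 1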
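The breakdown above can also be written as a commented pattern. This is only a sketch using Python's standard `re` module with `re.VERBOSE`; the names `LOT_PATTERN` and `extract_lot` are made up for illustration and are not part of the original answer. Note that in verbose mode a literal space has to be written as `[ ]`.

```python
import re

# Same pattern as above, spelled out with comments (re.VERBOSE ignores
# unescaped whitespace, so literal spaces are written as [ ]).
LOT_PATTERN = re.compile(
    r"""
    \b(?:LOT|LTS?)              # LOT, LT or LTS at a word boundary
    [ ]                         # one literal space
    (                           # open capture group 1
        \d+                     # first lot number: 1 or more digits
        (?:
            (?:[-,]|[ ]&[ ])    # separator: hyphen, comma, or ampersand with spaces
            \d+                 # next lot number
        )*                      # repeated zero or more times
    )                           # close capture group 1
    """,
    re.VERBOSE,
)


def extract_lot(addr):
    """Return the captured lot text, or None when the address has no lot."""
    match = LOT_PATTERN.search(addr)
    return match.group(1) if match else None


for addr in ["LT 75 CEDAR WOOD 3RD PL", "LTS 22,25 & 26 MULLINS CORNER"]:
    print(addr, "->", extract_lot(addr))
```

The original attempt stopped at `22` because `\w+` cannot cross the comma, and only `-` was allowed as a separator; listing every separator inside the repeated group is what lets the match continue.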
import pandas as pd

data = [
    "LT 75 CEDAR WOOD 3RD PL",
    "LTS 22,25 & 26 MULLINS CORNER",
    "LTS 7 & 8",
    "PT LT 22-23 JEFFERSON HIGHLANDS EXTENSION"
]
df = pd.DataFrame(data, columns=['Addr'])
# Extract the lot numbers that follow LOT, LT or LTS into a new column
df['Lot'] = df['Addr'].str.extract(r'\b(?:LO?T|LTS?) (\d+(?:(?:[-,]| & )\d+)*)')
print(df)
Output
                                        Addr         Lot
0                    LT 75 CEDAR WOOD 3RD PL          75
1              LTS 22,25 & 26 MULLINS CORNER  22,25 & 26
2                                  LTS 7 & 8       7 & 8
3  PT LT 22-23 JEFFERSON HIGHLANDS EXTENSION       22-23
If -, , and & can all be surrounded by optional whitespace, you can shorten the pattern to:
\b(?:LOT|LTS?) (\d+(?:\s*[-,&]\s*\d+)*)\b
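A quick way to see how the shortened pattern behaves is to run it on a few strings with irregular spacing. The sketch below uses the standard `re` module rather than pandas so it runs on its own; the helper name `extract_lot_loose` and the extra sample strings are assumptions for illustration. The same pattern string should also work with `df['Addr'].str.extract(...)`.

```python
import re

# Shortened pattern: any of - , & with optional whitespace on both sides,
# plus a trailing word boundary after the last number.
LOOSE_PATTERN = re.compile(r'\b(?:LOT|LTS?) (\d+(?:\s*[-,&]\s*\d+)*)\b')


def extract_lot_loose(addr):
    """Return the captured lot text, or None when nothing matches."""
    match = LOOSE_PATTERN.search(addr)
    return match.group(1) if match else None


samples = [
    "LTS 22,25 & 26 MULLINS CORNER",   # original spacing
    "LTS 22 , 25&26 MULLINS CORNER",   # irregular spacing (made-up sample)
    "LT 22 - 23 JEFFERSON",            # spaces around the hyphen (made-up sample)
    "LT 3RD STREET",                   # digit glued to letters (made-up sample)
]
for addr in samples:
    print(repr(addr), "->", extract_lot_loose(addr))
```

One difference worth knowing: because of the trailing `\b`, a number that runs straight into letters, as in `LT 3RD`, is not matched at all by this variant, whereas the first pattern would capture the `3`.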
