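I want my regex search to include the 5 characters before and after a specific word. The words are in a list that I iterate over.
See the example below for what I tried: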
import re

text = "This is an example of quality and this is true."
words = ['example', 'quality']
words_around = []
for word in words:
    neighbors = re.findall(fr'(.{0,5}{word}.{0,5})', str(text))
    words_around.append(neighbors)
print(words_around)
The output is empty. I expected a list containing ['s an example of q', 'e of quality and '].
A uj5u.com user replied:
You can use the PyPI regex module here, which allows lookbehind patterns of unbounded length:
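import regex  # third-party module from PyPI (pip install regex)
import pandas as pd

words = ['example', 'quality']
df = pd.DataFrame({'col': [
    "This is an example of quality and this is true.",
    "No matches."
]})

# Doubled braces stay literal inside the f-string.
rx = regex.compile(fr'(?<=(.{{0,5}}))({"|".join(words)})(?=(.{{0,5}}))')

def extract_regex(s):
    # Each match is a tuple (before, word, after); join it into one string.
    return ["".join(x) for x in rx.findall(s)]

df['col2'] = df['col'].apply(extract_regex)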
Output:
>>> df
                                               col                                    col2
0  This is an example of quality and this is true.  [s an example of q, e of quality and ]
1                                      No matches.                                      []
Both the pattern and the way it is used matter.
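The fr'(?<=(.{{0,5}}))({"|".join(words)})(?=(.{{0,5}}))' part defines the regex pattern. It is a "raw" f-string literal: the f prefix lets you use variables inside the string literal, but it also means every literal brace in the pattern must be doubled. Given the current words list, the pattern looks like (?<=(.{0,5}))(example|quality)(?=(.{0,5})). It captures the 0-5 characters before the word inside a positive lookbehind, then captures the word itself, then captures the next 0-5 characters inside a positive lookahead. The lookarounds do not consume the surrounding text, which ensures that overlapping matches are found as well.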
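To see why the attempt in the question returned nothing, it can help to print the pattern the f-string actually produces. The following is a minimal sketch using only the standard re module; the names broken and fixed are illustrative and not from the original post, and the re.escape call is an added precaution rather than part of the answer.

```python
import re

text = "This is an example of quality and this is true."
words = ['example', 'quality']

word = words[0]
# Single braces: the f-string evaluates {0,5} as the Python tuple (0, 5).
broken = fr'(.{0,5}{word}.{0,5})'
# Doubled braces: they come out as literal quantifier braces.
fixed = fr'(.{{0,5}}{word}.{{0,5}})'

print(broken)  # (.(0, 5)example.(0, 5))
print(fixed)   # (.{0,5}example.{0,5})

# One pattern per word, as in the question's loop; re.escape guards against
# words that contain regex metacharacters.
words_around = []
for w in words:
    words_around.extend(re.findall(fr'.{{0,5}}{re.escape(w)}.{{0,5}}', text))

print(words_around)  # ['s an example of q', 'e of quality and ']
```

Because each word is searched separately here, the two windows are allowed to overlap. A single alternation pattern without lookarounds would consume the text after the first match and miss the second word in this sample.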
The ["".join(x) for x in rx.findall(s)] part joins the groups of each match into a single string and returns the list of matches as the result.
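If installing the third-party regex module is not an option, a similar result can probably be obtained with the standard re module by locating each word and slicing the surrounding text. This is a different technique from the lookbehind pattern above, not the answer's method. The helper name context_windows and its width parameter are hypothetical.

```python
import re

def context_windows(text, words, width=5):
    """Return each matched word with up to `width` characters on either side."""
    # Escape the words so regex metacharacters in them are treated literally.
    pattern = re.compile("|".join(re.escape(w) for w in words))
    results = []
    for m in pattern.finditer(text):
        # Clamp at 0 so a match near the start does not wrap around.
        start = max(0, m.start() - width)
        # Slicing past the end of the string is safe in Python.
        results.append(text[start:m.end() + width])
    return results

text = "This is an example of quality and this is true."
print(context_windows(text, ['example', 'quality']))
# ['s an example of q', 'e of quality and ']
```

One difference to keep in mind: slicing includes newline characters in the window, whereas `.` in the regex pattern does not match a newline by default.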
