Python: replace everything starting with a backslash up to the next space

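As part of preprocessing my data, I want to replace everything that starts with a backslash, up to the next space, with an empty string. For example, \fs24 should be replaced with nothing, and so should \qc23424. Tags that start with a backslash may occur many times in the text I want to clean. I have created a list of tags to remove, and my goal is to use it in a regular expression to clean the extracted text.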

Input string: This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string.

Expected output: This is a string and it contains some texts and tags. which I want to remove from my string.

I am using a regex-based replace function in Python:

udpated = re.sub(r'/\fs\d+', '')

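But this does not give the desired result. Alternatively, I have built a list of tags to remove and replace them one by one in a top-to-bottom loop, but that is a performance killer.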
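The list-based approach mentioned in the question does not have to mean one pass per tag. A minimal sketch, assuming the list holds tag names without the leading backslash (the `TAGS` list and the `strip_listed_tags` helper are hypothetical, not from the original post), is to join the names into a single compiled alternation so the text is scanned once:

```python
import re

# Hypothetical list of tag names to remove (an assumption for illustration).
TAGS = ["fs24", "qc23424"]

# Put longer names first so a short name cannot shadow a longer one,
# and escape each name in case it contains regex metacharacters.
alternation = "|".join(
    re.escape(tag) for tag in sorted(TAGS, key=len, reverse=True)
)

# A literal backslash followed by any one of the listed names.
TAG_RE = re.compile(r"\\(?:" + alternation + r")")


def strip_listed_tags(text):
    """Remove every listed tag from text in a single pass."""
    return TAG_RE.sub("", text)


print(strip_listed_tags(r"a \fs24 b \qc23424."))
```

This removes only the tags themselves, so the spaces around them stay in place.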

A uj5u.com user replied:

Assuming a "tag" can also appear at the very start of the string, and to avoid false positives, perhaps you could use:

\s?(?<!\S)\\[a-z\d]+

and replace each match with nothing.

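\s? - optionally match one whitespace character (for a tag in the middle of the string, which is therefore preceded by a space);
(?<!\S) - assert that the position is not preceded by a non-whitespace character (this also allows a position at the start of the input);
\\ - a literal backslash;
[a-z\d]+ - one or more (greedy) characters from the given class.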
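The pattern described above can be tried on the input string from the question. This is only a sketch of how it might be used with `re.sub`; the names `TAG_PATTERN`, `text` and `cleaned` are chosen here for illustration:

```python
import re

# Optional leading whitespace, a position not preceded by a non-whitespace
# character, a literal backslash, then lowercase letters or digits.
TAG_PATTERN = re.compile(r"\s?(?<!\S)\\[a-z\d]+")

text = (
    r"This is a string \fs24 and it contains some texts and tags "
    r"\qc23424. which I want to remove from my string."
)

cleaned = TAG_PATTERN.sub("", text)
print(cleaned)
# This is a string and it contains some texts and tags. which I want to remove from my string.
```

Because the class is [a-z\d], a tag containing uppercase letters would not be matched unless the class is widened or `re.IGNORECASE` is passed.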

A uj5u.com user replied:

First, the / does not belong in the regular expression at all.

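Second, even though you are using a raw string literal, \ itself has a special meaning to the regex engine, so you still need to escape it. (Without a raw string literal, you would need '\\\\fs\\d+'.) The \ before f is meant to be taken literally; the \ before d is part of the character class that matches digits.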

Finally, sub takes three arguments: the pattern, the replacement text, and the string on which to perform the replacement.

>>> re.sub(r'\\fs\d+', '', r"This is a string \fs24 and it contains...")
'This is a string  and it contains...'
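The escaping points in this answer can be checked directly. The sketch below (variable names are illustrative) shows that the raw and non-raw spellings are the same pattern, and that the unescaped \f from the question is read by the `re` module as a form feed, so it never matches a literal backslash:

```python
import re

raw_pattern = r"\\fs\d+"         # raw string: the regex engine sees \\fs\d+
plain_pattern = "\\\\fs\\d+"     # non-raw string: every backslash is doubled

# Both literals produce exactly the same pattern text.
print(raw_pattern == plain_pattern)

text = r"This is a string \fs24 and it contains..."
print(repr(re.sub(raw_pattern, "", text)))

# With a single backslash, \f means a form feed character to the regex
# engine, so the pattern does not match the literal text \fs24.
print(re.search(r"\fs\d+", text))         # no match
print(re.search(r"\fs\d+", "\x0cs24"))    # matches form feed + "s24"
```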

A uj5u.com user replied:

Does this work for you?

re.sub(
    r"\\\w+\s*",  # a backslash followed by word characters and any trailing whitespace;
    '',           # replace it with an empty string;
    input_string  # in your input string
)

>>> re.sub(r"\\\w+\s*", "", r"\fs24 hello there")
'hello there'
>>> re.sub(r"\\\w+\s*", "", "hello there")
'hello there'
>>> re.sub(r"\\\w+\s*", "", r"\fs24hello there")
'there'
>>> re.sub(r"\\\w+\s*", "", r"\fs24hello \qc23424 there")
'there'

A uj5u.com user replied:

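'\\' matches a literal '\', and '\w+' matches the word characters that follow it, stopping at the first non-word character such as a space.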

import re
s = r"""This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."""
re.sub(r'\\\w+', '', s)

Output:

'This is a string  and it contains some texts and tags . which I want to remove from my string.'
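The output above still contains a doubled space and a space before the period, so it does not quite match the expected output in the question. One possible follow-up, sketched here with a hypothetical `remove_tags` helper and an assumed set of punctuation marks, is to tidy the whitespace after removing the tags:

```python
import re


def remove_tags(text):
    """Remove backslash tags, then tidy the whitespace they leave behind."""
    # Drop each tag: a backslash plus the word characters after it.
    without_tags = re.sub(r"\\\w+", "", text)
    # Collapse runs of two or more spaces into a single space.
    collapsed = re.sub(r" {2,}", " ", without_tags)
    # Remove a single space left in front of punctuation (assumed set).
    return re.sub(r" ([.,;:!?])", r"\1", collapsed)


s = r"""This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."""
print(remove_tags(s))
# This is a string and it contains some texts and tags. which I want to remove from my string.
```

Note that the cleanup steps also change spacing that was already in the text, which may or may not be wanted for a given data set.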

A uj5u.com user replied:

I tried this, and it works well for me:

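def remover(text, state):
    # Take the text after the first backslash, up to the next space.
    removable = text.split("\\")[1]
    removable = removable.split(" ")[0]
    # Rebuild the full tag, including its trailing space.
    removable = "\\" + removable + " "
    text = text.replace(removable, "")
    # Keep looping while any backslash remains.
    # Note: a tag with no trailing space is never removed, so the loop would not end.
    state = "\\" in text
    return text, state


text = "hello \\I'm new here \\good luck"
state = True
while state:
    text, state = remover(text, state)
print(text)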
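The same regex-free idea can also be written as a single pass over the space-separated tokens, which avoids the repeated `replace` calls. This is a sketch under the assumption that every tag is a whole token on its own; a tag glued to punctuation, such as \qc23424. in the question, would be dropped together with its punctuation. The `drop_tag_tokens` name is hypothetical:

```python
def drop_tag_tokens(text):
    """Drop every space-separated token that starts with a backslash."""
    kept = [token for token in text.split(" ") if not token.startswith("\\")]
    return " ".join(kept)


text = "hello \\I'm new here \\good luck"
print(drop_tag_tokens(text))
# hello new here luck
```

Unlike the loop above, this version also terminates when a tag is the last token in the string.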

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/396636.html

Tags: Python, regular expressions
