Regex to match words that end with or begin with a hyphen

I'm trying to create a regex that removes any word that begins or ends with a hyphen (but not both).

word1- -> remove
-word2 -> remove
sub-word -> keep
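The rule itself can also be written without a regex, as a check on each whitespace-separated token. This is a minimal sketch that is not from the question or the answers: the helper names are hypothetical, and it assumes that collapsing all whitespace (including newlines) to single spaces is acceptable.

```python
def is_hyphen_edge_word(token: str) -> bool:
    # Hypothetical helper: True if the token starts or ends with "-", but not both.
    return token.startswith("-") != token.endswith("-")


def remove_hyphen_edge_words(line: str) -> str:
    # Hypothetical helper: split on any whitespace, drop matching tokens,
    # and rejoin with single spaces (original spacing and newlines are lost).
    return " ".join(tok for tok in line.split() if not is_hyphen_edge_word(tok))


for word in ["word1-", "-word2", "sub-word"]:
    print(word, "->", "remove" if is_hyphen_edge_word(word) else "keep")
```

A token such as -some- starts and ends with a hyphen, so this check keeps it, matching the "not both" requirement.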

Here is my attempt:

import re

def begin_end_hyphen_removal(line):
    return re.sub(r"((\s+|^)(-[A-Za-z]+)(\s+|$))|((\s+|^)([A-Za-z]+-)(\s+|$))","",line)

However, when I apply it to the following lines:

here are some word sub-words -word1 word2- sub-word2 word3- -word4
-word5 example
word6-
word7-
another one -word8
-word9

I get the input back unchanged as the output.
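Two things likely keep the attempt from working, assuming its quantifiers are +. First, [A-Za-z] excludes digits, and every hyphenated word in the sample contains a digit, so nothing matches. Second, once digits are allowed, each match also consumes the whitespace on both sides, so a neighbouring candidate loses the \s+ it needs in front of it. The sketch uses a short made-up input (hypothetical, not from the question) to show both effects; WITH_DIGITS is a hypothetical tweak, not a proposed fix.

```python
import re

# The asker's pattern, assuming the quantifiers are "+".
ATTEMPT = r"((\s+|^)(-[A-Za-z]+)(\s+|$))|((\s+|^)([A-Za-z]+-)(\s+|$))"
# Hypothetical tweak: the same pattern with digits allowed in the word part.
WITH_DIGITS = ATTEMPT.replace("[A-Za-z]", "[A-Za-z0-9]")

line = "a -b1 c2- d"
# Nothing matches: "-b1" and "c2-" contain a digit.
print(re.sub(ATTEMPT, "", line))      # a -b1 c2- d
# " -b1 " is removed together with both spaces, so "c2-" no longer
# has whitespace in front of it and is kept (glued to "a").
print(re.sub(WITH_DIGITS, "", line))  # ac2- d
```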

Answer:

You can use either of these patterns:

r'\b(?<!-)[A-Za-z0-9]+-\B|\B-[A-Za-z0-9]+\b(?!-)'
r'\b(?<!-)\w+-\B|\B-\w+\b(?!-)'

See the regex demo. Details:

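\b(?<!-)\w+-\B - one or more word characters with no hyphen immediately before them, then a hyphen that is at the end of the string or followed by a non-word character
| - or
\B-\w+\b(?!-) - a hyphen at the start of the string or after a non-word character, then one or more word characters that are not followed by a hyphen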
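The demo that follows prefixes the pattern with " *". A small comparison on a made-up string (hypothetical sample, not from the thread) shows what that prefix does: without it, the spaces that separated the removed words are left behind.

```python
import re

# The answer's core pattern, without the leading " *".
CORE = r'\b(?<!-)\w+-\B|\B-\w+\b(?!-)'
# The same pattern with " *", so spaces before a matched word are removed too.
WITH_SPACE = r' *(?:' + CORE + r')'

sample = "word1- -word2 sub-word -both-"
print(repr(re.sub(CORE, '', sample)))        # '  sub-word -both-'
print(repr(re.sub(WITH_SPACE, '', sample)))  # ' sub-word -both-'
```

In both cases sub-word and -both- are kept; only the leftover spacing differs.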

See the Python demo:

import re
# " *" also removes any spaces in front of a matched word
rx = re.compile(r' *(?:\b(?<!-)\w+-\B|\B-\w+\b(?!-))')
text = 'here are -some- word sub-words -word1 word2- sub-word2 word3- -word4\n-word5 example\nword6-\nword7-\nanother one -word8\n-word9'
print( rx.sub('', text) )

Output:

here are -some- word sub-words sub-word2
 example


another one

Answer:

import re

pattern = r"(?=\S*['-])([a-zA-Z'-]+)"
test_string = '''here are some word sub-words -word1 word2- sub-word2 word3- -word4
-word5 example
word6-
word7-
another one -word8
-word9'''
result = re.findall(pattern, test_string)
print(result)

Answer:

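You can match word characters preceded or followed by a hyphen, and repeat the hyphenated parts.

This also covers words made of hyphen-separated parts that end with a hyphen, such as sugar-free-, if you want to remove those as well: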

(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)

Broken down, the pattern matches:

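(?<!\S) Whitespace boundary on the left
(?: Non-capturing group
- -\w+(?:-\w+)* Match a hyphen and word characters, optionally repeating a hyphen and word characters
- | Or
- \w+(?:-\w+)*- Match word characters, optionally repeating a hyphen and word characters, then a trailing hyphen
) Close the non-capturing group
(?!\S) Whitespace boundary on the right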
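To see what the repeated (?:-\w+)* group adds, this sketch compares the pattern with the word-boundary pattern from the first answer on a made-up string (hypothetical sample, not from the thread).

```python
import re

# This answer's pattern: whitespace-delimited tokens that start or end with "-",
# where the token may contain inner hyphens (e.g. sugar-free-).
HYPHEN_EDGE = r"(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)"
# The first answer's pattern, for comparison.
WORD_BOUNDARY = r'\b(?<!-)\w+-\B|\B-\w+\b(?!-)'

sample = "sugar-free- -pre-built well-known -both- plain"
print(re.findall(HYPHEN_EDGE, sample))    # ['sugar-free-', '-pre-built']
print(re.findall(WORD_BOUNDARY, sample))  # []
```

The word-boundary pattern only looks at one hyphen-free run of word characters, so it skips multi-part words like sugar-free- and -pre-built; both patterns keep well-known and -both-.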

See the regex demo and the Python demo.

Note that the pattern in your attempt uses \s, which also matches newlines.

If you don't want to remove newlines, you can use [^\S\n]* instead of \s*.
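A sketch of that variant on the question's sample text. The helper name remove_keep_newlines is hypothetical; the pattern is this answer's pattern with [^\S\n]* in place of \s*.

```python
import re

# [^\S\n]* matches spaces and tabs but not newlines, so the newline
# in front of a removed word is kept.
PATTERN = r"[^\S\n]*(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)"


def remove_keep_newlines(text):
    # Hypothetical helper name, not from the original answer.
    return re.sub(PATTERN, "", text)


s = ("here are some word sub-words -word1 word2- sub-word2 word3- -word4\n"
     "-word5 example\n"
     "word6-\n"
     "word7-\n"
     "another one -word8\n"
     "-word9")
print(remove_keep_newlines(s).split("\n"))
# ['here are some word sub-words sub-word2', ' example', '', '', 'another one', '']
```

With this change there is still one output line per input line, including the lines that become empty.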

Example

import re

def begin_end_hyphen_removal(line):
    # \s* also consumes newlines, so lines are joined when a word is removed
    return re.sub(r"\s*(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)", "", line)


s = ("here are some word sub-words -word1 word2- sub-word2 word3- -word4\n"
     "-word5 example\n"
     "word6-\n"
     "word7-\n"
     "another one -word8\n"
     "-word9")
print(begin_end_hyphen_removal(s))

Output

here are some word sub-words sub-word2 example
another one
