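I am trying to create a regular expression that removes any word that begins or ends with a hyphen (but not both).
word1- -> remove
-word2 -> remove
sub-word -> keep
My attempt is as follows: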
import re

def begin_end_hyphen_removal(line):
    return re.sub(r"((\s+|^)(-[A-Za-z]+)(\s+|$))|((\s+|^)([A-Za-z]+-)(\s+|$))", "", line)
However, when I try to apply it to the following lines:
here are some word sub-words -word1 word2- sub-word2 word3- -word4
-word5 example
word6-
word7-
another one -word8
-word9
I get the input back unchanged as the output.
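Answer from a uj5u.com user:
You can use
r'\b(?<!-)[A-Za-z0-9]+-\B|\B-[A-Za-z0-9]+\b(?!-)'
or, more compactly with \w:
r'\b(?<!-)\w+-\B|\B-\w+\b(?!-)'
See the regex demo. Details:
\b(?<!-)\w+-\B    One or more word characters with no - immediately before them, then a - that is at the end of the string or immediately before a non-word character
|                 Or
\B-\w+\b(?!-)     A - that is at the start of the string or immediately after a non-word character, then one or more word characters that are not immediately followed by -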
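As a quick check of the two alternatives described above, here is a minimal sketch that tests single tokens against the \w form of the pattern, assuming the quantifiers are `+`. The helper name `classify` and the token `-both-` are made up for illustration and are not part of the answer.

```python
import re

# Pattern from the answer, with the `+` quantifiers written out.
rx = re.compile(r'\b(?<!-)\w+-\B|\B-\w+\b(?!-)')

def classify(token):
    """Hypothetical helper: report whether the whole token would be removed."""
    return "remove" if rx.fullmatch(token) else "keep"

# The three cases from the question, plus a word hyphenated on both sides.
for token in ["word1-", "-word2", "sub-word", "-both-"]:
    print(token, "->", classify(token))
```

The lookbehind (?<!-) and the lookahead (?!-) are what keep a token such as -both- from matching either alternative.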
See the Python demo:
import re
# Optional leading spaces, then a word with a trailing or a leading hyphen (not both)
rx = re.compile(r' *(?:\b(?<!-)\w+-\B|\B-\w+\b(?!-))')
text = 'here are -some- word sub-words -word1 word2- sub-word2 word3- -word4\n-word5 example\nword6-\nword7-\nanother one -word8\n-word9'
print(rx.sub('', text))
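Output (this pattern does not consume newlines, so the lines that held only word6-, word7- and -word9 are left empty):
here are -some- word sub-words sub-word2
 example


another one
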
Answer from a uj5u.com user:
import re
# Find runs of letters, apostrophes and hyphens inside tokens that contain ' or -
pattern = r"(?=\S*['-])([a-zA-Z'-]+)"
test_string = '''here are some word sub-words -word1 word2- sub-word2 word3- -word4
-word5 example
word6-
word7-
another one -word8
-word9'''
result = re.findall(pattern, test_string)
print(result)
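Answer from a uj5u.com user:
You can match a run of word characters that is either preceded or followed by a -.
The pattern below also covers hyphen-separated words with a leading or trailing hyphen that you want removed as well, for example sugar-free-:
(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)
In parts, the pattern matches:
(?<!\S)            Whitespace boundary on the left
(?:                Non-capturing group
  -\w+(?:-\w+)*    Match - and word characters, then optionally repeat - and word characters
  |                Or
  \w+(?:-\w+)*-    Match word characters, optionally repeat - and word characters, then match a trailing -
)                  Close the non-capturing group
(?!\S)             Whitespace boundary on the right
See the regex demo and the Python demo.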
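The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same pattern, assuming the quantifiers are `+`. The names `HYPHEN_EDGED` and `find_hyphen_edged` and the sample sentence are invented for illustration.

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE so each part is labeled.
HYPHEN_EDGED = re.compile(r"""
    (?<!\S)              # whitespace boundary on the left
    (?:
        -\w+(?:-\w+)*    # leading hyphen: -word, -sugar-free
      |
        \w+(?:-\w+)*-    # trailing hyphen: word-, sugar-free-
    )
    (?!\S)               # whitespace boundary on the right
""", re.VERBOSE)

def find_hyphen_edged(text):
    """Hypothetical helper: list the tokens the pattern would remove."""
    return HYPHEN_EDGED.findall(text)

print(find_hyphen_edged("a sugar-free- drink -word1 sub-word -both- x"))
```

Because the pattern has no capturing groups, findall returns the full text of each match.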
Note that the pattern you tried uses \s, which can also match a newline.
If you do not want to remove newlines, you can use [^\S\n]* instead of \s*.
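To make the difference between the two prefixes concrete, here is a small side-by-side sketch, assuming the quantifiers in the core pattern are `+`. The function names, the constant `CORE`, and the sample string are invented for illustration.

```python
import re

# Core pattern from this answer, without any leading-whitespace prefix.
CORE = r"(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)"

def remove_eating_newlines(text):
    # \s* also consumes a newline that precedes a removed word.
    return re.sub(r"\s*" + CORE, "", text)

def remove_keeping_newlines(text):
    # [^\S\n]* consumes whitespace other than newlines, so line breaks survive.
    return re.sub(r"[^\S\n]*" + CORE, "", text)

sample = "keep -drop\nword- stay\n-gone\nend"
print(repr(remove_eating_newlines(sample)))
print(repr(remove_keeping_newlines(sample)))
```

With \s*, a removed word that starts a line pulls the following text up onto the previous line. With [^\S\n]*, the line structure is preserved and a line can end up empty.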
Example
import re
def begin_end_hyphen_removal(line):
    # \s* also consumes the whitespace (including newlines) before each removed word
    return re.sub(r"\s*(?<!\S)(?:-\w+(?:-\w+)*|\w+(?:-\w+)*-)(?!\S)", "", line)
s = ("here are some word sub-words -word1 word2- sub-word2 word3- -word4\n"
"-word5 example\n"
"word6-\n"
"word7-\n"
"another one -word8\n"
"-word9")
print(begin_end_hyphen_removal(s))
Output
here are some word sub-words sub-word2 example
another one
When reposting, please credit the source. Link to this article: https://www.uj5u.com/qita/323644.html
