Replace all dots except those between digits or followed by specific text

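I want to use Python to replace every dot with \n, except dots that sit between digits or that are followed by specific text.

Input: I have a meeting at 8.30. I will be at meet.com. Bye.

Output: I have a meeting at 8.30\nI will be at meet.com \nBye\n

Here is the code I tried: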

import re


def replace_dot_for_original_sentence(text):
    dot = "."
    for char in text:
        if char in dot:
            if not re.match(r'(?<=\d)\.(?=\d)', text) or not re.match(r'\.(com|org|net|co|id)', text):
                text = text.replace(char, "\n")
    return text

What it does is replace every dot with \n. I also tried re.search. I think something is wrong with the if condition. Any ideas?

Reply from a uj5u.com user:

We can try using re.sub here:

import re

inp = "I have a meeting at 8.30. I will be at meet.com. Bye."
# Replace a dot only when whitespace or the end of the string follows it
output = re.sub(r'\.(?:\s+|$)', ' \n ', inp)
print(repr(output))  # 'I have a meeting at 8.30 \n I will be at meet.com \n Bye \n '

Reply from a uj5u.com user:

Try this pattern:

import re


def replace_dot_for_original_sentence(text):
    text = re.sub(r'\.\s+|\.$', '\n', text)
    return text

print(replace_dot_for_original_sentence('I have a meeting at 8.30. I will be at meet.com. Bye.'))

Output

I have a meeting at 8.30
I will be at meet.com
Bye
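Both answers above rest on the same idea: a dot ends a sentence only when whitespace or the end of the string follows it. The sketch below shows where that holds and where it does not. The helper name split_on_sentence_dots and the sample inputs are illustrative assumptions, not part of the original answers.

```python
import re


def split_on_sentence_dots(text):
    # A dot counts as a sentence end only if whitespace
    # or the end of the string comes right after it.
    return re.sub(r'\.(?:\s+|$)', '\n', text)


# Dots inside "8.30" and "meet.com" survive: no whitespace follows them.
print(repr(split_on_sentence_dots("Meet at 8.30. See meet.com. Bye.")))

# Limitation: a sentence dot with no space after it is left untouched.
print(repr(split_on_sentence_dots("No space.Next sentence.")))
```

This approach needs no list of domain suffixes, but it depends on the text always having a space after each sentence-ending dot.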

Reply from a uj5u.com user:

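re.match(r'(?<=\d)\.(?=\d)', text) only tries to match at the very start of text, and the lookbehind (?<=\d) requires a digit before the dot, which can never exist at position 0. So the call always returns None, and not re.match(r'(?<=\d)\.(?=\d)', text) is always True.

Similarly, re.match(r'\.(com|org|net|co|id)', text) is always None unless text itself starts with something like .com, so its negation is almost always True as well.

Then you call text = text.replace(char, "\n") on the whole text. So even if the condition worked and correctly decided that one particular dot needs replacing, this call would still replace every dot in the string.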
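A quick check makes these points concrete. This is a minimal sketch using the sample sentence from the question; the variable names are illustrative.

```python
import re

text = "I have a meeting at 8.30. I will be at meet.com. Bye."

# re.match only tries position 0, where the lookbehind (?<=\d) cannot succeed.
digit_match = re.match(r'(?<=\d)\.(?=\d)', text)
print(digit_match)  # None

# The string does not start with ".com", ".org", ... so this is None as well.
tld_match = re.match(r'\.(com|org|net|co|id)', text)
print(tld_match)  # None

# re.search scans the whole string and does find the dot inside "8.30".
found = re.search(r'(?<=\d)\.(?=\d)', text)
print(text[found.start() - 1:found.end() + 2])  # 8.30

# str.replace ignores position entirely: every dot is replaced in one call.
replaced = text.replace(".", "\n")
print("." in replaced)  # False
```

Switching to re.search alone would not rescue the original loop, because the replace call still acts on the whole string rather than on the one dot that was tested.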

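If you want to replace every dot that is not immediately followed by com|org|net|co|id and not immediately followed by \d (you do want to replace the dot after 8.30, and you probably don't want to replace the . in something like '$.30' either), this works: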

import re


def replace_dot_for_original_sentence(text):
    # Replace a dot unless a digit or one of the listed suffixes follows it
    return re.sub(r"(?s)\.(?!\d)(?!com|org|net|co|id)", "\n", text)

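The whole for loop does nothing useful. It only ensures, in a very roundabout way, that your code runs only if the string contains a dot.

Note that the expression still needs some work. For example, because co is in the list, .coupons, .cooking and .courses (to name just a few) are now also matched and skipped, while something like .co.uk is still cut in the middle.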
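One possible tightening, offered as a sketch rather than a complete fix: put a word boundary \b after the suffix alternation so that co only protects a dot when it is the whole suffix. The function name replace_sentence_dots and the sample strings are made up for illustration, and .co.uk is still split at its second dot.

```python
import re


def replace_sentence_dots(text):
    # Hypothetical variant: keep a dot only if a digit follows it, or if one
    # of the listed suffixes follows it as a whole word (note the \b).
    return re.sub(r"\.(?!\d)(?!(?:com|org|net|co|id)\b)", "\n", text)


# ".com" is kept, the sentence-ending dots are replaced.
print(repr(replace_sentence_dots("See meet.com. Bye.")))

# "co" no longer shields ".coupons", so this dot is now replaced.
print(repr(replace_sentence_dots("shop.coupons")))

# ".co" is kept, but ".uk" is not on the list and is still split off.
print(repr(replace_sentence_dots("example.co.uk")))
```

With the boundary in place, only the suffixes actually listed are protected. Any other suffix, including real domain endings that are missing from the list, is treated as a sentence end.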

If it works for your data set, great. But don't treat it as a reliable way to detect the end of a URL.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/429555.html

Tags: Python, regular expressions, text, text processing
