I have the following data. From it I only want to extract the message body and remove everything related to the "Forwarded" header.
---------------------- Forwarded by Phillip K Allen/HOU/ECT on 03/21/2000
01:24 PM ---------------------------
Stephane Brodeur
03/16/2000 07:06 AM
To: Phillip K Allen/HOU/ECT@ECT
cc:
Subject: Maps
As requested by John, here's the map and the forecast...
Call me if you have any questions (403) 974-6756.
This is the regular expression I have tried so far:
matchObjj = re.search(r'(---.*?)Subject:', tmp_text, re.DOTALL)
When I print the text that follows the match with
print(tmp_text[matchObjj.span()[1]:])
I get the output below:
Maps
As requested by John, here's the map and the forecast...
Call me if you have any questions (403) 974-6756.
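So the problem is that the regex does not strip the whole "Subject:" line: only the "Subject:" label is removed, while the actual subject text, "Maps" in this case, remains. I want the regex to match through the end of the subject line so that the whole line is removed. Any ideas?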
Answer:
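The simplest fix is to change your regex to:
r'(---.*?)Subject:[^\n]*\n'
This extends the match up to and including the next newline, so the end of its span is the start of the following line.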
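A minimal sketch of this fix applied to the sample data, assuming the message is held in `tmp_text` as in the question. `match_obj.end()` returns the same index as `span()[1]`:

```python
import re

# Sample message, assumed to have the same layout as the data in the question.
tmp_text = """---------------------- Forwarded by Phillip K Allen/HOU/ECT on 03/21/2000
01:24 PM ---------------------------
Stephane Brodeur
03/16/2000 07:06 AM
To: Phillip K Allen/HOU/ECT@ECT
cc:
Subject: Maps
As requested by John, here's the map and the forecast...
Call me if you have any questions (403) 974-6756."""

# [^\n]*\n consumes the rest of the Subject line plus its newline,
# so the match ends at the first character of the body.
match_obj = re.search(r'(---.*?)Subject:[^\n]*\n', tmp_text, re.DOTALL)
if match_obj:
    body = tmp_text[match_obj.end():]
    print(body)
```

Because the match now ends after the newline, the slice starts directly at the body text and the subject "Maps" is gone.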
Answer:
You can also do this without a regex: split the text into a list of lines with splitlines and slice the list just after the Subject line:
text = '''---------------------- Forwarded by Phillip K Allen/HOU/ECT on 03/21/2000
01:24 PM ---------------------------
Stephane Brodeur
03/16/2000 07:06 AM
To: Phillip K Allen/HOU/ECT@ECT
cc:
Subject: Maps
As requested by John, here's the map and the forecast...
Call me if you have any questions'''
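data = text.splitlines()
# index of the first line that starts with "Subject: "
slice_idx = [i for i, s in enumerate(data) if s.startswith('Subject: ')][0]
# keep every line after the Subject line, joined with real newlines
body = '\n'.join(data[slice_idx + 1:])
print(body)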
Output:
As requested by John, here's the map and the forecast...
Call me if you have any questions
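A variant of the same line-slicing idea, sketched as a hypothetical helper `extract_body` (not part of the original answer). It uses `next()` with a default so that a message without a Subject line is returned unchanged instead of raising `IndexError`:

```python
def extract_body(text: str) -> str:
    """Return the lines after the first 'Subject:' line (hypothetical helper).

    If no line starts with 'Subject:', the text is returned unchanged.
    """
    lines = text.splitlines()
    # next() with a default avoids the IndexError that [...][0] raises
    # when no line starts with 'Subject:'.
    idx = next((i for i, s in enumerate(lines) if s.startswith('Subject:')), None)
    if idx is None:
        return text
    return '\n'.join(lines[idx + 1:])


sample = "To: someone\ncc:\nSubject: Maps\nAs requested by John...\nCall me."
print(extract_body(sample))
print(extract_body("No header here"))
```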
Answer:
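There may be extra whitespace after the subject line, or in your case the lines may be separated by \t. You can try matching one or more whitespace characters after the subject text, for example:
regexEquation = r"(---.*?)Subject:[^\n]*(\s)+"
You can find more help on matching multiple whitespace characters here or here.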
Output:
As requested by John, here's the map and the forecast...
Call me if you have any questions (403) 974-6756.
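To see what the trailing whitespace match buys, here is a sketch on a hypothetical sample whose Subject line ends with a space and a tab and is followed by a blank line. It uses `\s+`, which matches the same text as `(\s)+` without the capture group:

```python
import re

# Hypothetical sample: the Subject line ends with a space and a tab,
# and a blank line separates the header from the body.
tmp_text = (
    "---------------------- Forwarded by Phillip K Allen/HOU/ECT on 03/21/2000\n"
    "01:24 PM ---------------------------\n"
    "Subject: Maps \t\n"
    "\n"
    "As requested by John, here's the map and the forecast...\n"
)

# [^\n]* takes the rest of the Subject line; \s+ then swallows the newline
# plus any following blank lines or stray whitespace.
match_obj = re.search(r'(---.*?)Subject:[^\n]*\s+', tmp_text, re.DOTALL)
body = tmp_text[match_obj.end():] if match_obj else tmp_text
print(repr(body))
```

Note that `\s+` would also consume any leading indentation on the first body line, which may or may not be what you want.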
