Extract names from a string

I have a string:

s = """(2021-07-29 01:00:00 AM BST)  
---  
peter.j.matthew has joined the conversation  
  
  

(2021-07-29 01:00:00 AM BST)  
---  
john cheung has joined the conversation  
  
  


(2021-07-29 01:11:19 AM BST)  
---  
allen.p.jonas  
Hi, james  
  
  
(2021-07-30 12:51:16 AM BST)  
---  
karren wenda  
how're you ? 
  
  
  
---  
  
* * *"""

I want to extract the names as:

names_list= ['allen.p.jonas','karren Wenda']

What I have tried:

names_list=re.findall(r'--- [\S\n](\D [\S\n])',s)

Answer from a uj5u.com user:

This answer assumes you want the names on lines that do not end with the text has joined the conversation:

import re

names = re.findall(r'\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [AP]M [A-Z]{3}\)\s+---\s+\r?\n((?:(?!\bhas joined the conversation).)+?)[ ]*\r?\n', s)
print(names)  # ['allen.p.jonas', 'karren wenda']

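The key part of the regular expression is this:

((?:(?!\bhas joined the conversation).)+?)[ ]*\r?\n

This uses the tempered dot trick to capture only names that are not followed by has joined the conversation. It matches the line containing the name one character at a time, checking at every position that the has joined the conversation text does not start there, until it reaches the CR?LF at the end of the line.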
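To make the effect of the tempered dot concrete, here is a minimal sketch that compares it with a plain lazy dot. The two-entry sample string and the variable names are assumptions made for illustration, and the pattern is shortened (no timestamp part) so the difference is easier to see.

```python
import re

# Hypothetical two-entry sample, not the full string from the question.
sample = (
    "---\n"
    "peter.j.matthew has joined the conversation\n"
    "---\n"
    "allen.p.jonas\n"
)

# Tempered dot: a character is consumed only if the phrase does not start there.
tempered = re.compile(
    r"^---\n((?:(?!\bhas joined the conversation).)+?)[ ]*\n", re.M
)

# Plain lazy dot: no guard, so the join notice is captured as well.
plain = re.compile(r"^---\n(.+?)[ ]*\n", re.M)

tempered_names = tempered.findall(sample)
plain_names = plain.findall(sample)
print(tempered_names)  # ['allen.p.jonas']
print(plain_names)     # ['peter.j.matthew has joined the conversation', 'allen.p.jonas']
```

On the join-notice line the tempered dot stops in front of "has", never reaches the end of the line, and so the whole match attempt fails for that entry.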

Answer from a uj5u.com user:

If you only want to match ['allen.p.jonas', 'karren wenda'], you can require a non-whitespace character at the start of the line that follows the name:

^---[^\S\n]*\n(\S.*?)[^\S\r\n]*\n\S

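The pattern matches:

^ Start of a line (the pattern is used with re.M)
--- Match --- literally
[^\S\n]*\n Match optional whitespace other than a newline, then a newline
(\S.*?) Capture group 1 (the value returned by re.findall): match a non-whitespace character followed by as few characters as possible
[^\S\r\n]* Match optional whitespace characters other than newlines
\n\S Match a newline and a non-whitespace character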
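The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace outside character classes and allows comments. This is only a sketch: the short sample string and the names verbose_pattern and verbose_names are assumptions for illustration.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part is commented.
verbose_pattern = re.compile(
    r"""
    ^---            # a line that starts with three dashes
    [^\S\n]*\n      # optional trailing spaces or tabs, then the newline
    (\S.*?)         # group 1: the name, starting at a non-whitespace character
    [^\S\r\n]*      # optional trailing whitespace on the name line
    \n\S            # the next line must start with a non-whitespace character
    """,
    re.M | re.X,
)

# Hypothetical sample: one join notice followed by an empty line, one real message.
sample = (
    "---  \n"
    "john cheung has joined the conversation  \n"
    "  \n"
    "---  \n"
    "allen.p.jonas  \n"
    "Hi, james  \n"
)

verbose_names = verbose_pattern.findall(sample)
print(verbose_names)  # ['allen.p.jonas']
```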


For example:

print(re.findall(r"^---[^\S\n]*\n(\S.*?)[^\S\r\n]*\n\S", s, re.M))

Output:

['allen.p.jonas', 'karren wenda']

To explicitly exclude lines that contain has joined the conversation, you can use a negative lookahead:

^---[^\S\n]*\n(?!.*\bhas joined the conversation\b)(\S.*?)[^\S\r]*$


For example:

print(re.findall(r"^---[^\S\n]*\n(?!.*\bhas joined the conversation\b)(\S.*?)[^\S\r]*$", s, re.M))

Output:

['allen.p.jonas', 'karren wenda']
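The two patterns in this answer give the same result on the question's string, but they are not interchangeable. The sketch below uses a hypothetical sample in which one user's name is followed by an empty line instead of a message; the sample and the variable names are assumptions for illustration.

```python
import re

# Hypothetical sample: karren's entry has no message line after the name.
sample = (
    "---\n"
    "allen.p.jonas\n"
    "Hi, james\n"
    "---\n"
    "karren wenda\n"
    "\n"
    "---\n"
    "john cheung has joined the conversation\n"
    "\n"
)

# Variant 1: the line after the name must start with a non-whitespace character.
next_line_pattern = r"^---[^\S\n]*\n(\S.*?)[^\S\r\n]*\n\S"

# Variant 2: the name line itself must not contain the join phrase.
lookahead_pattern = (
    r"^---[^\S\n]*\n(?!.*\bhas joined the conversation\b)(\S.*?)[^\S\r]*$"
)

by_next_line = re.findall(next_line_pattern, sample, re.M)
by_lookahead = re.findall(lookahead_pattern, sample, re.M)
print(by_next_line)  # ['allen.p.jonas']
print(by_lookahead)  # ['allen.p.jonas', 'karren wenda']
```

The first variant depends on the layout of the following line, while the second depends only on the content of the name line, so the second is likely the safer choice if some entries have no message text.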

Answer from a uj5u.com user:

Assuming you want to match names that are not followed by "has joined the conversation":

name_pattern = re.compile(r'---\s*\n(\w(?:[\w\. ](?!has joined the conversation))*?)\s*\n', re.MULTILINE)
print(re.findall(name_pattern, s))

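Explanation:

---\s*\n matches the dashes, possibly followed by whitespace, and then the required newline
Then comes our capture group, which consists of:
- \w starts with a "word" character (a-z, A-Z, 0-9 or _)
- (?:[\w\. ](?!has joined the conversation))*? a non-capturing group that repeats \w, . or a space, each of which must not be followed by "has joined the conversation". The capture runs until the trailing whitespace and newline at the end of the line. (*? makes the expression lazy rather than greedy)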

Output:

['allen.p.jonas', 'karren wenda']
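For anyone who wants to compare the answers side by side, the following sketch rebuilds the question's string line by line and runs all four patterns on it. The trailing spaces on each line are an assumption read off the post, the first pattern is written with + quantifiers, which is an assumption about its intended form, and the dictionary labels are made up for illustration.

```python
import re

# The question's log rebuilt line by line; most lines are assumed to end
# with two spaces, as they appear in the post.
lines = [
    "(2021-07-29 01:00:00 AM BST)  ", "---  ",
    "peter.j.matthew has joined the conversation  ",
    "  ", "  ", "",
    "(2021-07-29 01:00:00 AM BST)  ", "---  ",
    "john cheung has joined the conversation  ",
    "  ", "  ", "", "",
    "(2021-07-29 01:11:19 AM BST)  ", "---  ",
    "allen.p.jonas  ", "Hi, james  ",
    "  ", "  ",
    "(2021-07-30 12:51:16 AM BST)  ", "---  ",
    "karren wenda  ", "how're you ? ",
    "  ", "  ", "  ",
    "---  ", "  ", "* * *",
]
s = "\n".join(lines)

# Labels are hypothetical; the patterns are the ones from the answers.
patterns = {
    "timestamp + tempered dot": (
        r"\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [AP]M [A-Z]{3}\)\s+---\s+\r?\n"
        r"((?:(?!\bhas joined the conversation).)+?)[ ]*\r?\n"
    ),
    "next line starts with text": r"^---[^\S\n]*\n(\S.*?)[^\S\r\n]*\n\S",
    "negative lookahead": (
        r"^---[^\S\n]*\n(?!.*\bhas joined the conversation\b)(\S.*?)[^\S\r]*$"
    ),
    "guarded character class": (
        r"---\s*\n(\w(?:[\w\. ](?!has joined the conversation))*?)\s*\n"
    ),
}

results = {label: re.findall(p, s, re.M) for label, p in patterns.items()}
for label, names in results.items():
    print(f"{label}: {names}")
```

Under these assumptions each pattern returns ['allen.p.jonas', 'karren wenda'].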
