How to find all usernames in a large text in Python, knowing each username comes right after or right before a specific phrase?

So I have a large text file that looks like this:

""" Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"""

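How can I get all the users' names in Python, knowing that each one always comes right after or right before one of those specific phrases?

I tried: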

import re
text =""" ....""" #text is here
before_j = re.findall(r'\bjust showed up\S*', text)
print(before_j)

Answer:

Use

(?<=Yay you made it, )\S+|\S+(?= joined the party)
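Because this pattern has no capturing groups, re.findall returns the matched names directly. A minimal sketch, assuming the sample text from the question is stored in a variable named text (the variable name is just for illustration):

```python
import re

# Sample text from the question
text = ("Yay you made it, User1 ! — 25/03/2022 --------------- "
        "User2 joined the party. — 22/03/2022 --------------- "
        "Yay you made it, User3 ! — 29/03/2022 --------------- "
        "User4 joined the party. — 28/03/2022")

# Lookbehind: the name right after the greeting.
# Lookahead: the name right before " joined the party".
# Python requires lookbehinds to be fixed-width; a literal phrase is.
pattern = r"(?<=Yay you made it, )\S+|\S+(?= joined the party)"

# No capturing groups, so findall returns the whole matches
users = re.findall(pattern, text)
print(users)  # ['User1', 'User2', 'User3', 'User4']
```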


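However, I'd assume usernames may be more complex, so let's say a username is one or more non-space characters (note that this often won't be right if punctuation is attached at the end, as in User1!?; in that case \w+ would be a better specifier). We then want to match a username that is either preceded by "you made it, " or followed by " joined the party", which gives: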

import re
s = "Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"
# Each findall item is a (group1, group2) tuple; keep whichever group matched
users = [item[0] or item[1] for item in re.findall(r'(?<=you made it, )(\S+)|(\S+)(?= joined the party)', s)]
print(users)
# ['User1', 'User2', 'User3', 'User4']
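The note about trailing punctuation can be checked directly. This sketch uses a hypothetical variant of the sample (not from the question) where the first name is written User1!? with no space before the punctuation: \S+ captures the punctuation too, while \w+ stops at it.

```python
import re

# Hypothetical variant of the sample: punctuation glued to the first name
s = "Yay you made it, User1!? — 25/03/2022 --------------- User2 joined the party."

pattern_s = r'(?<=you made it, )(\S+)|(\S+)(?= joined the party)'
pattern_w = r'(?<=you made it, )(\w+)|(\w+)(?= joined the party)'

# Each findall item is a (group1, group2) tuple; exactly one group is non-empty
with_s = [a or b for a, b in re.findall(pattern_s, s)]
with_w = [a or b for a, b in re.findall(pattern_w, s)]

print(with_s)  # ['User1!?', 'User2']
print(with_w)  # ['User1', 'User2']
```

Which specifier is right depends on what characters real usernames in the file may contain; \w+ would, for example, cut a name like Anne-Marie at the hyphen.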

Answer:

A possible solution is the following.

Advantage: the "user" names can contain any characters except whitespace.

import re

string = """ Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"""

found = re.findall(r',\s(\S+)\s!|-\s(\S+)\sj', string, re.I)

print(list(filter(None, [item for t in found for item in t])))

Prints

['User1', 'User2', 'User3', 'User4']
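To illustrate the stated advantage, here is a sketch on a hypothetical string (not from the question) whose names contain dots, underscores and hyphens. The pattern only anchors on the surrounding ", " ... " !" and "- " ... " j" context, so such names still come through. It also replaces the flatten-and-filter step with a plain a or b per tuple, which assumes exactly one of the two groups is non-empty in each match (true here, since each alternative has exactly one group).

```python
import re

# Hypothetical names containing punctuation (not part of the original data)
string = ("Yay you made it, dr.who_42 ! — 25/03/2022 --------------- "
          "Anne-Marie joined the party. — 22/03/2022")

found = re.findall(r',\s(\S+)\s!|-\s(\S+)\sj', string, re.I)
print(found)  # [('dr.who_42', ''), ('', 'Anne-Marie')]

# Each tuple has exactly one non-empty group, so `a or b` picks it
names = [a or b for a, b in found]
print(names)  # ['dr.who_42', 'Anne-Marie']
```

A name containing a space would still break this approach, since \S+ stops at the first whitespace.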


Thanks to @cards and @David542 for their valuable input on the regex pattern.

Answer:

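I set up two matching rules for the name:

it, (name_pattern) ! : "it, ", then the name, followed by " !"
-{3,} (name_pattern)\s : at least 3 "-" characters, then a space, the name and a whitespace character

where the name is any sequence of letters ending in one or more digits, ([a-zA-Z]+\d+) (the first rule in the code uses the looser [a-zA-Z\d]+).

Both patterns are matched at the same time (joined with |), so the "empty" group of each match has to be discarded in the loop.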

import re

text = """ Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"""

# list of rules
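rules = (r'it, ([a-zA-Z\d]+) !', r'-{3,} ([a-zA-Z]+\d+)\s')

# combine the rules into a single alternation: rule1|rule2
regex = '|'.join(rules)

# each match is a (g1, g2) tuple; keep the group that is not empty
matches = [g1 if g2 == '' else g2 for g1, g2 in re.findall(regex, text)]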

print(matches)

Output

['User1', 'User2', 'User3', 'User4']
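The "empty" matches mentioned above are easiest to see in the raw re.findall result. A sketch reusing the same two rules on the question's sample text:

```python
import re

text = ("Yay you made it, User1 ! — 25/03/2022 --------------- "
        "User2 joined the party. — 22/03/2022 --------------- "
        "Yay you made it, User3 ! — 29/03/2022 --------------- "
        "User4 joined the party. — 28/03/2022")

rules = (r'it, ([a-zA-Z\d]+) !', r'-{3,} ([a-zA-Z]+\d+)\s')
regex = '|'.join(rules)

# With two groups, findall returns one (group1, group2) tuple per match;
# the group belonging to the alternative that did not match is ''.
raw = re.findall(regex, text)
print(raw)
# [('User1', ''), ('', 'User2'), ('User3', ''), ('', 'User4')]
```

The list comprehension in the answer simply keeps the non-empty element of each tuple.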

Edit: to avoid filtering out the empty strings, you can use named (symbolic) groups, i.e. groups that carry a name:

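# named groups: (?P<name>...) in Python
rules = (r'it, (?P<g1>[a-zA-Z\d]+) !', r'-{3,} (?P<g2>[a-zA-Z]+\d+)\s')

regex = '|'.join(rules)

# lastgroup is the name of the group that matched; group() returns its text
matches = [m.group(m.lastgroup) for m in re.finditer(regex, text)]
print(matches)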
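Note that Match.lastgroup holds the name of the group that matched, not the matched text; the text is fetched with m.group(m.lastgroup). A small sketch on a shortened version of the sample showing both:

```python
import re

# Shortened sample from the question
text = ("Yay you made it, User1 ! — 25/03/2022 --------------- "
        "User2 joined the party. — 22/03/2022")

# (?P<name>...) is Python's named-group syntax
regex = r'it, (?P<g1>[a-zA-Z\d]+) !|-{3,} (?P<g2>[a-zA-Z]+\d+)\s'

# Pair each match's group name with the text that group captured
pairs = [(m.lastgroup, m.group(m.lastgroup)) for m in re.finditer(regex, text)]
print(pairs)  # [('g1', 'User1'), ('g2', 'User2')]
```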


Tags: Python python-3.x regex
