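Regex to match a pattern (many occurrences), but only up to a given substring

I am using a regular expression to collect all the names in a long, multi-line text file: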

import re
regex = r'Name:\s*(.*)$'
names = re.findall(regex, file_content, re.MULTILINE)  # re.MULTILINE lets $ match at the end of every line

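The file contains several sections, and I only need to collect the names that appear before a specific substring (for example, "computers:"). This is easy to do in plain Python (for example, by cutting file_content off at the substring), but for certain reasons I have to use a regular expression only.

How can I do this?

Sample text file: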

Name:     Jon
  address: 1st 
  phone: 01321231231231
Name:     Mon
  address: 1st 
  phone: 01321231231231
Name:     Gon
  address: 1st 
  phone: 01321231231231

Computers:

Name:     Jason
  address: 1st 
  phone: 01321231231231
Name:     Bason
  address: 1st 
  phone: 01321231231231

Expected output: Jon, Mon, Gon

Reply from a uj5u.com user:

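You can use

regex = r'Name:\s*(.*)(?=[\s\S]*Computers:)'

Here,

Name: - a fixed string
\s* - zero or more whitespace characters
(.*) - Group 1: zero or more characters other than line break characters, as many as possible
(?=[\s\S]*Computers:) - a positive lookahead: somewhere to the right of the current position, after zero or more characters of any kind (including line breaks), there must be the string Computers:

The match is case-sensitive, so the pattern spells the heading as it appears in the sample file (Computers:). For a lowercase computers: heading, change the pattern or pass re.IGNORECASE.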
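
As a minimal sketch of how the lookahead pattern behaves, the snippet below runs it with re.findall on an abbreviated copy of the sample text. The variable name sample is illustrative, and the heading is written as Computers: (capital C) because that is how the sample file spells it and the match is case-sensitive.

```python
import re

# Abbreviated copy of the sample file from the question (illustrative data).
sample = """Name:     Jon
  address: 1st
  phone: 01321231231231
Name:     Mon
  address: 1st
  phone: 01321231231231
Name:     Gon
  address: 1st
  phone: 01321231231231

Computers:

Name:     Jason
  address: 1st
  phone: 01321231231231
Name:     Bason
  address: 1st
  phone: 01321231231231
"""

# (.*) captures the rest of the "Name:" line; the lookahead only succeeds
# while "Computers:" still occurs somewhere later in the text.
regex = r'Name:\s*(.*)(?=[\s\S]*Computers:)'

# With exactly one capturing group, findall returns the group contents.
names = re.findall(regex, sample)
print(names)  # ['Jon', 'Mon', 'Gon']
```

Names after the heading are skipped because no Computers: follows them, so the lookahead fails. Note that the lookahead rescans the remainder of the text for every candidate match, which may become slow on very large files.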

Reply from a uj5u.com user:

import re

file_content = """
Name:     Jon
  address: 1st 
  phone: 01321231231231
Name:     Mon
  address: 1st 
  phone: 01321231231231
Name:     Gon
  address: 1st 
  phone: 01321231231231

Computers:

Name:     Jason
  address: 1st 
  phone: 01321231231231
Name:     Bason
  address: 1st 
  phone: 01321231231231
"""

#names = re.findall(r'Name:.*\n', file_content)

# To match only till some specific string in that case you
# can do slicing and use your portion of interest.
names = re.findall(r'Name:.*\n', file_content[:file_content.index("Computers")])

final_name_list = []

for name in names:
    final_name_list.append(name.replace("Name:     ", "").replace("\n", ""))

print(final_name_list)

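The re.findall call matches every line fragment that starts with "Name:" and ends with a newline.

names = re.findall(r'Name:.*\n', file_content)  # matches every "Name:" line in the file
names = re.findall(r'Name:.*\n', file_content[:file_content.index("Computers")])  # matches only up to "Computers"

The list names then holds the full matched lines, including the "Name:" prefix and the trailing newline. To keep only the name itself, iterate over the list and replace those unwanted parts with an empty string: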

final_name_list.append(name.replace("Name:     ", "").replace("\n", ""))
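
The replace-based cleanup depends on the exact number of spaces after "Name:". As a hedged alternative sketch (not part of the original answer), a capturing group can return the name directly, and str.partition can stand in for index() so that a missing heading does not raise ValueError. The sample data and the variable names head, _marker and _tail are illustrative.

```python
import re

# Shortened, hypothetical sample in the same shape as the file above.
file_content = """Name:     Jon
  phone: 01321231231231
Name:     Mon
  phone: 01321231231231

Computers:

Name:     Jason
  phone: 01321231231231
"""

# partition() returns the whole text as `head` when the marker is absent,
# whereas index() would raise ValueError in that case.
head, _marker, _tail = file_content.partition("Computers:")

# [ \t]* skips any amount of spacing after "Name:"; (.*) captures the rest
# of the line without the newline; strip() drops trailing spaces.
names = [n.strip() for n in re.findall(r'Name:[ \t]*(.*)', head)]

print(names)  # ['Jon', 'Mon']
```

This still slices the text in Python, as the answer above does, so it suits cases where the regex-only constraint from the question does not apply.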

When reposting, please credit the source. Link to this article: https://www.uj5u.com/qianduan/385039.html