I only want to keep the lines before a certain string in a txt file

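I want all the lines that come before the line containing the string 'VarList'. I don't understand why the solutions proposed elsewhere don't work for my txt files.

To simplify:

I have many .txt files that look like this: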

    text1=text
    text2=text
    (...)
    textN=text
    VarList=text
    (...)
    End

I only want this:

    text1=text
    text2=text
    (...)
    textN=text

How can I get this for all the txt files in a directory path?

First I tried this:

import os

for subdir, dirs, files in os.walk('C:\\Users\\nigel\\OneDrive\\Documents\\LAB\\lean\\.txt'):
    for file in files:
        output=[]
        with open(file, 'r') as inF:
            for line in inF:
                output.append(line)
                if 'VarList' in line: break
        f=open(file, 'w')
        blank=['']
        [f.write(x) for x in output]
        [f.write(x + '\n') for x in blank]
        f.close()

Nothing changes in the txt file, even though the file has the string 'VarList' in one of its lines. So why doesn't it work?

Then:

import re

def trim(test_string, removal_string):
    return re.sub(r'^(.*?)(' + removal_string + ')(.*)$', r'\1' + r'\2', test_string)

def cleanFile(file_path, removal_string):
    with open(file_path) as master_text:
        return trim(master_text, removal_string)

cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [2], in <cell line: 16>()
     13     with open(file_path) as master_text:
     14         return trim(master_text, removal_string)
---> 16 cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

Input In [2], in cleanFile(file_path, removal_string)
     12 def cleanFile(file_path, removal_string):
     13     with open(file_path) as master_text:
---> 14         return trim(master_text, removal_string)

Input In [2], in trim(test_string, removal_string)
      9 def trim(test_string, removal_string):
---> 10     return re.sub(r'^(.*?)(' + removal_string + ')(.*)$', r'\1' + r'\2', test_string)

File ~\Anaconda3\lib\re.py:210, in sub(pattern, repl, string, count, flags)
    203 def sub(pattern, repl, string, count=0, flags=0):
    204     """Return the string obtained by replacing the leftmost
    205     non-overlapping occurrences of the pattern in string by the
    206     replacement repl.  repl can be either a string or a callable;
    207     if a string, backslash escapes in it are processed.  If it is
    208     a callable, it's passed the Match object and must return
    209     a replacement string to be used."""
--> 210     return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object

Finally, I tried:

with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
    head, sep, tail = importFile.partition('VarList')
    exportFile = head

importFile.close()
exportFile.close()

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 #Solution 3
      3 with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
----> 4     head, sep, tail = importFile.partition('VarList')
      5     exportFile = head
      7 importFile.close()

AttributeError: '_io.TextIOWrapper' object has no attribute 'partition'

Does anyone know what is going on here?

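A uj5u.com user replied:

I think Python's pathlib makes this task easier, because it has some useful methods for reading and writing text files.

pathlib also has glob, which lets you add "**" to a pattern to mean "this directory and all subdirectories, recursively".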
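As a small illustration of the recursive "**" pattern, here is a sketch that builds a throwaway directory tree and lists the .txt files in it. The helper name `list_txt_files` and the file names are made up for the example; only `Path.glob` itself comes from pathlib.

```python
import tempfile
from pathlib import Path


def list_txt_files(root: Path) -> list[Path]:
    """Return every .txt file under root, including subdirectories."""
    # "**" makes the pattern match in root and in every directory below it.
    return sorted(root.glob("**/*.txt"))


# Build a temporary tree so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("x")
    (root / "sub" / "b.txt").write_text("y")
    (root / "sub" / "c.csv").write_text("z")  # not a .txt file, so skipped
    found = [p.relative_to(root).as_posix() for p in list_txt_files(root)]
    print(found)
```

The .csv file is left out because the pattern ends in `*.txt`, while `sub/b.txt` is picked up even though it sits one level below the starting directory.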

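To truncate the file, I chose to use a Python list comprehension to find the lines that contain the desired string, and then slice the list of lines at that point.

For example: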

from pathlib import Path


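def trim_files(dirname: Path, end_before: str) -> None:
    # "**/*.txt" matches .txt files in dirname and all of its subdirectories.
    for file in dirname.glob("**/*.txt"):
        content = file.read_text().splitlines()
        # Indices of every line that contains the search string.
        location = [index
                    for index, line in enumerate(content) if end_before in line]
        if location:
            # Keep only the lines before the first match.
            file.write_text("\n".join(content[:location[0]]))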


if __name__ == '__main__':
    search_directory = Path.home().joinpath('Documents', 'LAB')
    trim_files(search_directory, 'VarList')
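Both the TypeError and the AttributeError in the question seem to have the same cause: the open file object is passed where a string is expected. `re.sub` and `str.partition` work on text, so the file has to be read first (with `read()` or `Path.read_text()`). Below is a minimal sketch of the `partition` idea applied to a string; `keep_before` is a hypothetical helper name, not something from the question or the answer.

```python
def keep_before(text: str, marker: str) -> str:
    """Return the part of text that precedes the line containing marker."""
    head, sep, _tail = text.partition(marker)
    if not sep:
        # Marker not found: leave the text unchanged.
        return text
    # head may end with the start of the marker's own line (for example
    # leading spaces), so cut back to the last newline before the marker.
    last_newline = head.rfind("\n")
    return head[: last_newline + 1]


sample = "text1=text\ntext2=text\nVarList=text\nEnd\n"
print(keep_before(sample, "VarList"), end="")
```

To apply it to a file, the text would be read first, for instance `path.write_text(keep_before(path.read_text(), 'VarList'))` with `path` being a `pathlib.Path`.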

A uj5u.com user replied:

You are appending to output before checking for 'VarList'. The correct way is:

with open(file, 'r') as inF:
    for line in inF:      
        if 'VarList' in line:
            break
        output.append(line)
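Two other details of the loop in the question may also explain why nothing changed. `os.walk` is given a path ending in `.txt`, and if that path is not a directory, `os.walk` simply yields nothing. In addition, `open(file)` uses the bare file name, which is resolved against the current working directory rather than `subdir`. Here is a sketch of the whole loop with those points addressed; the function name `truncate_txt_files` and its return value are assumptions made for the example.

```python
import os


def truncate_txt_files(root: str, marker: str = "VarList") -> int:
    """Rewrite each .txt file under root, keeping only lines before marker.

    Returns the number of files that were rewritten.
    """
    changed = 0
    # root must be a directory; os.walk visits it and all subdirectories.
    for subdir, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            # Join with subdir so the file is found wherever it lives.
            path = os.path.join(subdir, name)
            kept = []
            found = False
            with open(path, "r") as in_file:
                for line in in_file:
                    # Check first, append afterwards, so the marker line
                    # itself is not kept.
                    if marker in line:
                        found = True
                        break
                    kept.append(line)
            # Only rewrite files that actually contain the marker.
            if found:
                with open(path, "w") as out_file:
                    out_file.writelines(kept)
                changed += 1
    return changed
```

Since this overwrites files in place, it is worth trying it on a copy of the directory first, for example `truncate_txt_files(r'C:\Users\nigel\OneDrive\Documents\LAB\lean')`.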

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/516519.html