Detecting the number of lines between specific string patterns in a text file and classifying the result in Python

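I have a text file that I am trying to classify based on the number of lines between the lines containing the words 'START' and 'END /'. Input file structure: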

  START               
  Action1
  Action2 
  Action3
  END /

  START
  Action1 
  END /

  START                  
  Action1
  Action2
  END /

  START  
  Action0              
  Action1
  Action2 
  Action3
  END /

  START
  Action1 
  END /

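The code should detect the number of lines between 'START' and 'END /' and classify each block as follows: if there is only 1 action line, then 'P1'; if there are multiple action lines, then 'P2'.

So the output for the input file shown above would be: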

['P2', 'P1', 'P2', 'P2', 'P1']

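The final goal is to export this output list to an Excel column (as shown below). I believe this can be done with the help of the pandas library, but any suggestions on this would be greatly appreciated.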

Category
P2
P1
P2
P2
P1

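Initially I was able to print every line of the file with its line number, so I also considered extracting the line numbers. However, because the number of Action lines varies from block to block, that idea has a major flaw.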

with open('filepath.txt') as f:
    for index, line in enumerate(f):
        print("Line {}: {}".format(index, line.strip()))

Output of the initial, flawed idea:

Line 0: 
Line 1: A
Line 2: Action1
Line 3: Action2
Line 4: Action3
Line 5: B
Line 6: 
Line 7: A
Line 8: Action1
Line 9: B
Line 10: 
Line 11: A
Line 12: Action1
Line 13: Action1
Line 14: B
Line 15: 
Line 16: A
Line 17: Action0
Line 18: Action1
Line 19: Action2
Line 20: Action3
Line 21: B

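I then came up with the idea of detecting the opening (START) and closing (END) patterns and counting the lines in between, so that an if/else statement can assign the P1 or P2 category. I am currently stuck on implementing a way to count the lines inside the pattern.

Any help with the code would be appreciated, thanks!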
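The counting step described here can be sketched directly from line indices: the number of lines inside a block is the index of its 'END /' line minus the index of its 'START' line, minus one. This is only a minimal sketch. The function name `count_between_markers` and the in-memory `sample` are hypothetical, and it assumes every line between the two markers is an action line (no blank lines inside a block).

```python
def count_between_markers(lines, start="START", end="END /"):
    """Return, for each START ... END / block, the number of lines strictly between the markers."""
    counts = []
    start_index = None  # index of the most recent unmatched START line
    for index, line in enumerate(lines):
        text = line.strip()
        if text == start:
            start_index = index
        elif text == end and start_index is not None:
            # Lines strictly between the two markers.
            counts.append(index - start_index - 1)
            start_index = None
    return counts


# Hypothetical in-memory sample: the first two blocks of the input file.
sample = """
START
Action1
Action2
Action3
END /

START
Action1
END /
""".splitlines()

counts = count_between_markers(sample)
categories = ["P1" if n <= 1 else "P2" for n in counts]
print(counts)      # [3, 1]
print(categories)  # ['P2', 'P1']
```

Because the count is a difference of indices, it does not matter that the blocks have different lengths.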

A uj5u.com user replied:

If the file data is exactly as you described in the question, the following code should work.

import pandas as pd

result = []
fp = 'your_file.txt'                       # change this

with open(fp) as file:
    file_content = file.read().splitlines()
    count = 0

    # this is the logic you were after:
    for item in file_content:
        if item.strip() == 'START':
            count = 0
        elif item.strip() == 'END /':
            if count <= 1:
                result.append('P1')
            else:
                result.append('P2')
        else:
            count += 1

print(result)

dataframe = pd.DataFrame(result, columns=['Category'])

# Note: Pandas module needs openpyxl module installed for this next step
dataframe.to_excel('excel.xlsx', index=False)
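If the real file is less tidy than the sample, the same logic can be wrapped in a function that ignores blank lines and any lines outside a START ... END / block. The following is a sketch under those assumptions, not a drop-in replacement. The name `classify_blocks` is hypothetical, and the input is given as an in-memory string so the example runs without a file.

```python
def classify_blocks(lines):
    """Return 'P1' for blocks with at most one action line, 'P2' otherwise."""
    categories = []
    count = None  # None means we are outside a START ... END / block
    for line in lines:
        text = line.strip()
        if text == "START":
            count = 0
        elif text == "END /":
            if count is not None:
                categories.append("P1" if count <= 1 else "P2")
            count = None
        elif text and count is not None:
            # Non-blank line inside a block: treat it as an action line.
            count += 1
    return categories


# The input file from the question, as an in-memory string.
text = """
START
Action1
Action2
Action3
END /

START
Action1
END /

START
Action1
Action2
END /

START
Action0
Action1
Action2
Action3
END /

START
Action1
END /
"""

result = classify_blocks(text.splitlines())
print(result)  # ['P2', 'P1', 'P2', 'P2', 'P1']
```

The returned list can be passed to `pd.DataFrame(result, columns=['Category'])` in the same way as the `result` list in the code above.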
