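I have a text file named "machinary.txt". Below is some sample data. I only want to capture the data starting from the first "Section" line, together with the corresponding Input, Output, BIT_MATCH, Input Obtained and Output Obtained lines found below it. Anything before the first "Section" is ignored.
Each 'BIT_MATCH' in the text file refers to the 'input->' and 'output->' below it. Section_A has two inputs and two outputs.
Any "empty" value is defined as "N/A".
'machinary.txt'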
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
The code I have written:
import pprint
with open("machinary.txt", "r") as file:
    flag = False
    headers = 'Section,Input,Output,Bit Matched'.split(',')
    sub_dict = dict.fromkeys(headers,'N/A')
    main_dict = {}
    bit_match_list = []
    input_list =[]
    output_list = []
    for eachline in file:
        if 'Section' in eachline:
            flag = True
            sub_dict['Section'] = eachline.strip().split()[-1]
        if flag:
            if 'BIT_MATCH' in eachline:
                bit_match_list.append(eachline.strip())
                bit_match_list.append(next(file).strip())
                bit_match_list.append(next(file).strip())
                sub_dict['Bit Matched'] = bit_match_list
                #sub_dict['Input bit match']=next(file).strip()
                #sub_dict['Output bit match'] = next(file).strip()
            if 'Input->' in eachline:
                input_list.append(eachline.strip())
                sub_dict['Input'] = input_list
                output_list.append (next(file).strip())
                sub_dict['Output'] = output_list
                main_dict[sub_dict['Section']] = sub_dict
                sub_dict = dict.fromkeys(headers, 'N/A')
                bit_match_list = []
                input_list = []
                output_list = []
pprint.pprint (main_dict)
Output of the code above:
{'N/A': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                         'Input Obtained: 2',
                         'Output Obtained: 2',
                         'BIT_MATCH: It matched at bit: 2',
                         'Input Obtained: 3',
                         'Output Obtained: 3'],
         'Input': ['Input-> 320'],
         'Output': ['Output-> 321'],
         'Section': 'N/A'},
 'Section_A_Machinary': {'Bit Matched': 'N/A',
                         'Input': ['Input-> 101'],
                         'Output': ['Output-> 010'],
                         'Section': 'Section_A_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}
Expected output:
{'Section_A_Machinary': {'Bit Matched': ['BIT_MATCH: It matched at bit: 1',
                                         'Input Obtained: 2',
                                         'Output Obtained: 2',
                                         'BIT_MATCH: It matched at bit: 2',
                                         'Input Obtained: 3',
                                         'Output Obtained: 3'],
                         'Input': ['Input-> 101', 'Input-> 320'],
                         'Output': ['Output-> 010', 'Output-> 321'],
                         'Section': 'Section_A_Machinary'},
 'Section_B_Machinary': {'Bit Matched': 'N/A',
                         'Input': 'N/A',
                         'Output': 'N/A',
                         'Section': 'Section_B_Machinary'},
 'Section_C_Distillery': {'Bit Matched': ['BIT_MATCH: It matched at bit: 2',
                                          'Input Obtained: 0',
                                          'Output Obtained: 0'],
                          'Input': ['Input-> 011'],
                          'Output': ['Output-> 011'],
                          'Section': 'Section_C_Distillery'}}
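Sorry for the long-winded wording. Somehow the code misses Section_B, and the 'Input->' and 'Output->' lines of Section_A are not appended the way I want. Is there a simple way to fix this, preferably without changing the code above too much? Thanks!
A uj5u.com user replied:
To parse the text into the desired structure, you can use the re module: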
txt = """
unwanted data
unwanted data
Input-> 000
Output-> 000
unwanted data
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
unwanted data
unwanted data
Input-> 320
Output-> 321
unwanted data
unwanted data
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
unwanted data
unwanted data
Input-> 011
Output-> 011
"""
import re
from itertools import chain

out = {}
# Each match starts at "Section" and runs until just before the next "Section".
for section in re.findall(r"Section(?:(?!Section).)+", txt, flags=re.S):
    bit_matches = re.findall(r"BIT_MATCH.*", section)
    # "Input Obtained ..." through the end of the following "Output Obtained ..." line.
    inp_out = re.findall(
        r"Input Obtained.*?Output Obtained.*?$", section, flags=re.S | re.M
    )
    inputs = re.findall(r"Input->.*", section)
    outputs = re.findall(r"Output->.*", section)

    # The first line of the match is the section name itself.
    name = section.splitlines()[0]
    out[name] = {
        # Interleave each BIT_MATCH line with its two "Obtained" lines.
        "Bit Matched": list(
            chain.from_iterable(
                (a, *b.splitlines()) for a, b in zip(bit_matches, inp_out)
            )
        )
        or "N/A",
        "Input": inputs or "N/A",
        "Output": outputs or "N/A",
        "Section": name,
    }

print(out)
Prints:
{
    "Section_A_Machinary": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 1",
            "Input Obtained: 2",
            "Output Obtained: 2",
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 3",
            "Output Obtained: 3",
        ],
        "Input": ["Input-> 101", "Input-> 320"],
        "Output": ["Output-> 010", "Output-> 321"],
        "Section": "Section_A_Machinary",
    },
    "Section_B_Machinary": {
        "Bit Matched": "N/A",
        "Input": "N/A",
        "Output": "N/A",
        "Section": "Section_B_Machinary",
    },
    "Section_C_Distillery": {
        "Bit Matched": [
            "BIT_MATCH: It matched at bit: 2",
            "Input Obtained: 0",
            "Output Obtained: 0",
        ],
        "Input": ["Input-> 011"],
        "Output": ["Output-> 011"],
        "Section": "Section_C_Distillery",
    },
}
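For readers who would rather stay close to the line-by-line loop from the question, here is a minimal sketch of that approach. The original loop appears to lose Section_B because a record is only stored when an 'Input->' line is seen, and it splits Section_A because the record is reset after every 'Input->'. The sketch instead creates and stores the record as soon as a "Section" line appears. The helper name parse_sections and the inline sample string are assumptions for illustration, and the code assumes every 'Input->' line is directly followed by its 'Output->' line, as in the sample data.

```python
import pprint

HEADERS = ("Section", "Input", "Output", "Bit Matched")

def parse_sections(lines):
    """Group Input->/Output->/BIT_MATCH lines under the latest Section line."""
    lines = iter(lines)   # lets next() pull the follow-up lines
    main_dict = {}
    current = None        # stays None until the first Section line
    for line in lines:
        line = line.strip()
        if "Section" in line:
            # Store the record immediately, so empty sections are kept too.
            name = line.split()[-1]
            current = dict.fromkeys(HEADERS, "N/A")
            current["Section"] = name
            main_dict[name] = current
        elif current is None:
            continue      # ignore everything before the first section
        elif "BIT_MATCH" in line:
            # The next two lines are "Input Obtained" and "Output Obtained".
            group = [line, next(lines, "").strip(), next(lines, "").strip()]
            if current["Bit Matched"] == "N/A":
                current["Bit Matched"] = []
            current["Bit Matched"].extend(group)
        elif "Input->" in line:
            # Assumption: the Output-> line directly follows the Input-> line.
            pairs = (("Input", line), ("Output", next(lines, "").strip()))
            for key, value in pairs:
                if current[key] == "N/A":
                    current[key] = []
                current[key].append(value)
    return main_dict

sample = """\
unwanted data
Input-> 000
Output-> 000
Codex153 @ Section_A_Machinary
Input-> 101
Output-> 010
unwanted data
BIT_MATCH: It matched at bit: 1
Input Obtained: 2
Output Obtained: 2
unwanted data
BIT_MATCH: It matched at bit: 2
Input Obtained: 3
Output Obtained: 3
Input-> 320
Output-> 321
Codex173 @ Section_B_Machinary
Codex183 @ Section_C_Distillery
BIT_MATCH: It matched at bit: 2
Input Obtained: 0
Output Obtained: 0
Input-> 011
Output-> 011
"""

result = parse_sections(sample.splitlines())
pprint.pprint(result)
```

To read from the real file, the same function can be given the open file object, for example parse_sections(open("machinary.txt")), since it only needs an iterable of lines.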
