I have a txt file from a piece of lab equipment that stores data in the following format:
Run1
Selected data
Time (s) Charge Q (nC) Charge density q (nC/g) Mass (g)
Initial - 21.53 -2.81E-01 -1.41E-03 200.0
Flow - 0.00 0.00E+00 0.00E+00 0.0
Charge (in Coulomb) temporal evolution
3.61 2.44e-11
4.11 2.44e-11
4.61 2.44e-11
5.11 3.66e-11
5.63 3.66e-11
6.14 2.44e-11
6.66 3.66e-11
7.14 3.66e-11
7.67 2.44e-11
8.19 3.66e-11
8.70 2.44e-11
9.20 2.44e-11
9.72 2.44e-11
10.23 2.44e-11
10.73 2.44e-11
Run2
Selected data
Time (s) Charge Q (nC) Charge density q (nC/g) Mass (g)
Initial - 21.53 -2.81E-01 -1.41E-03 200.0
Flow - 0.00 0.00E+00 0.00E+00 0.0
Charge (in Coulomb) temporal evolution
3.61 2.44e-11
4.11 2.44e-11
4.61 2.44e-11
5.11 3.66e-11
5.63 3.66e-11
6.14 2.44e-11
6.66 3.66e-11
7.14 3.66e-11
7.67 2.44e-11
8.19 3.66e-11
Run3
Selected data
Time (s) Charge Q (nC) Charge density q (nC/g) Mass (g)
Initial - 21.53 -2.81E-01 -1.41E-03 200.0
Flow - 0.00 0.00E+00 0.00E+00 0.0
Charge (in Coulomb) temporal evolution
3.61 2.44e-11
4.11 2.44e-11
4.61 2.44e-11
5.11 3.66e-11
5.63 3.66e-11
6.14 2.44e-11
6.66 3.66e-11
7.14 3.66e-11
7.67 2.44e-11
8.19 3.66e-11
8.70 2.44e-11
9.20 2.44e-11
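I have many of these files in my test folder. I would like to simplify and automate the analysis I do on these data sets, since I had similar success with simpler code for another instrument.
What I want to do is extract, from each file, the 2 columns of test data for each of the 3 runs, and export each run to a comma-separated text file named FileName-Run#.txt, where FileName is the name of the source file.
So far I have tried converting the text file's contents into a list of lists and then processing the numeric data separately into a new csv, but this has not worked well because I cannot detect the length of the data columns I am interested in.
Several other Q&As here have helped along the way, including how to run code over the files in a folder (if it works, that is).
I am using a Jupyter notebook. If it is useful I can share the code I have written so far, although I am embarrassed to show it.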
Reply from a uj5u.com user:
Try this:
import re
from pathlib import Path

input_path = Path("path/to/input_folder")
output_path = Path("path/to/output_folder")
run_name_pattern = re.compile(r"Run\d+")
data_line_pattern = re.compile(r"(.+?) +(.+?)")


def write_output(input_file: Path, run_name: str, data: str):
    # Make sure the output folder exists before writing to it
    output_path.mkdir(parents=True, exist_ok=True)
    output_file = output_path / f"{input_file.stem}-{run_name}.csv"
    with output_file.open("w") as fp_out:
        fp_out.write(data)


for input_file in input_path.glob("*.txt"):
    with input_file.open() as fp:
        run_name, data, start_reading = "", "", False
        for line in fp:
            # If a line matches "Run...", start a new run name
            if run_name_pattern.match(line):
                run_name = line.strip()
            # If the line matches "Charge (in Coulomb)...",
            # read in the data, starting with the next line
            elif line.startswith("Charge (in Coulomb) temporal evolution"):
                start_reading = True
            # For the data lines, replace spaces in the middle with a comma
            # and append the result to the data collected so far
            elif start_reading and line != "\n":
                data += data_line_pattern.sub(r"\1,\2", line)
            # If we encounter a blank line, that means the end of data.
            # Flush the data to disk.
            elif line == "\n":
                write_output(input_file, run_name, data)
                run_name, data, start_reading = "", "", False
        else:
            # If we have reached the end of the file but there is still
            # data we haven't written to disk, flush it
            if data:
                write_output(input_file, run_name, data)
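As a small illustration of the substitution step in the code above, the sketch below applies the same pattern to a single data line. The helper names `to_csv_line` and `to_csv_line_split` are hypothetical and only used here; the second one is an alternative that avoids the regular expression, under the assumption that every data line holds exactly two whitespace-separated values.

```python
import re

# Same pattern as in the answer: a lazy group, one or more spaces, a lazy group
data_line_pattern = re.compile(r"(.+?) +(.+?)")


def to_csv_line(line: str) -> str:
    """Replace the run of spaces between the two values with a comma.

    Any trailing newline in the input is kept as-is.
    """
    return data_line_pattern.sub(r"\1,\2", line)


def to_csv_line_split(line: str) -> str:
    """Alternative without a regex: split on whitespace, join with commas.

    Note that this drops the trailing newline, so the caller has to add it back.
    """
    return ",".join(line.split())


print(to_csv_line("3.61    2.44e-11"))
print(to_csv_line_split("3.61    2.44e-11\n"))
```

Both calls should produce the same comma-separated text for a well-formed data line; which one to prefer is mostly a matter of taste.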
Reply from a uj5u.com user:
This works:
import csv
import os
import re

# Path to the input file
PATH_INPUT = "./test.txt"
# Define the output directory
DIR_OUTPUT = "./output"


def is_section_start(line):
    """Function to check to see if a line is the start of a section

    The start of a section is defined as starting with "Run"
    """
    return re.match(r"^Run", line)


def is_data_line(line):
    """Function to check to see if a line is a data line

    A data line is defined as a line that starts with a digit
    """
    return re.match(r"^\d", line)


def get_data(line):
    """Split data line into the two numbers"""
    split = line.split(" ")
    split = [s for s in split if s]
    return [float(split[0]), float(split[1])]


if __name__ == "__main__":
    # Open up the input file and read data into a dictionary where the key is the run name and the
    # value is a list of lists of the numbers.
    output = {}
    with open(PATH_INPUT) as f_in:
        current_section = None
        for line in f_in:
            line = line.strip()
            if is_section_start(line) and current_section != line:
                current_section = line
                output[current_section] = []
            if is_data_line(line):
                output[current_section].append(get_data(line))

    # Write data: one file per run, named after the input file without its extension
    os.makedirs(DIR_OUTPUT, exist_ok=True)
    base_name = os.path.splitext(os.path.basename(PATH_INPUT))[0]
    for run, data in output.items():
        # newline="" lets the csv module control the line endings
        with open(os.path.join(DIR_OUTPUT, f"{base_name}-{run}.txt"), "w", newline="") as f_out:
            writer = csv.writer(f_out)
            writer.writerows(data)
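The script above handles a single input file. One possible way to apply the same idea to every .txt file in a folder, as the question asks, is sketched below. The names `split_runs` and `export_folder` are hypothetical helpers introduced only for this sketch, and it assumes that each run starts with a line such as "Run1" and that data lines are the only lines starting with a digit.

```python
import csv
import re
from pathlib import Path

RUN_HEADER = re.compile(r"^Run\d+")
DATA_LINE = re.compile(r"^\d")


def split_runs(text):
    """Return a dict mapping each run name to its list of [time, charge] rows."""
    runs = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if RUN_HEADER.match(line):
            # A new run begins; later data lines belong to it
            current = line
            runs[current] = []
        elif current is not None and DATA_LINE.match(line):
            time_s, charge = line.split()[:2]
            runs[current].append([float(time_s), float(charge)])
    return runs


def export_folder(input_dir, output_dir):
    """Write one FileName-Run#.txt file per run for every .txt file in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt_file in sorted(input_dir.glob("*.txt")):
        for run, rows in split_runs(txt_file.read_text()).items():
            target = output_dir / f"{txt_file.stem}-{run}.txt"
            # newline="" lets the csv module control the line endings
            with target.open("w", newline="") as f_out:
                csv.writer(f_out).writerows(rows)
            written.append(target)
    return written
```

A call such as `export_folder("path/to/input_folder", "path/to/output_folder")` would then process the whole folder. It is probably safest to keep the output folder separate from the input folder, since the exported files also end in .txt and would otherwise be picked up on the next pass.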
Reply from a uj5u.com user:
There are many rather complicated ways to read this data, but I would like to show a simpler one:
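import pandas as pd

with open('file.txt') as f:
    # This relies on the runs being separated by blank lines in the file
    data = f.read().strip().split('\n\n')

for run in data:
    run = run.split('\n')
    run_num = run[0]
    # The first 6 lines of each run are header lines; the rest is numeric data
    df = pd.DataFrame(run[6:])[0].str.split(expand=True).astype(float)
    df.columns = ['Charge (in Coulomb)', 'temporal evolution']
    print(run_num)
    print(df)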
Output:
Run1
Charge (in Coulomb) temporal evolution
0 3.61 2.440000e-11
1 4.11 2.440000e-11
2 4.61 2.440000e-11
3 5.11 3.660000e-11
4 5.63 3.660000e-11
5 6.14 2.440000e-11
6 6.66 3.660000e-11
7 7.14 3.660000e-11
8 7.67 2.440000e-11
9 8.19 3.660000e-11
10 8.70 2.440000e-11
11 9.20 2.440000e-11
12 9.72 2.440000e-11
13 10.23 2.440000e-11
14 10.73 2.440000e-11
Run2
Charge (in Coulomb) temporal evolution
0 3.61 2.440000e-11
1 4.11 2.440000e-11
2 4.61 2.440000e-11
3 5.11 3.660000e-11
4 5.63 3.660000e-11
5 6.14 2.440000e-11
6 6.66 3.660000e-11
7 7.14 3.660000e-11
8 7.67 2.440000e-11
9 8.19 3.660000e-11
Run3
Charge (in Coulomb) temporal evolution
0 3.61 2.440000e-11
1 4.11 2.440000e-11
2 4.61 2.440000e-11
3 5.11 3.660000e-11
4 5.63 3.660000e-11
5 6.14 2.440000e-11
6 6.66 3.660000e-11
7 7.14 3.660000e-11
8 7.67 2.440000e-11
9 8.19 3.660000e-11
10 8.70 2.440000e-11
11 9.20 2.440000e-11
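The approach above relies on a fixed offset of 6 header lines per run. If that offset is not guaranteed, one option is to look for the "Charge (in Coulomb) temporal evolution" line and take everything after it. The sketch below uses only the standard library; `parse_block` is a hypothetical helper, and the sample text is an abbreviated version of the file from the question, assuming that blank lines separate the runs.

```python
MARKER = "Charge (in Coulomb) temporal evolution"


def parse_block(block):
    """Return (run name, rows) for one blank-line-separated run block."""
    lines = [ln.strip() for ln in block.strip().splitlines()]
    run_name = lines[0]
    # index() raises ValueError if the marker line is missing from the block
    start = lines.index(MARKER) + 1
    rows = [tuple(float(v) for v in ln.split()) for ln in lines[start:] if ln]
    return run_name, rows


sample = """Run1
Selected data
Time (s) Charge Q (nC) Charge density q (nC/g) Mass (g)
Initial - 21.53 -2.81E-01 -1.41E-03 200.0
Flow - 0.00 0.00E+00 0.00E+00 0.0
Charge (in Coulomb) temporal evolution
3.61 2.44e-11
4.11 2.44e-11

Run2
Selected data
Charge (in Coulomb) temporal evolution
5.11 3.66e-11
"""

for block in sample.strip().split("\n\n"):
    name, rows = parse_block(block)
    print(name, len(rows), "data rows")
```

Because the data start is found by searching for the marker, the second block in the sample parses correctly even though it has fewer header lines than the first.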
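Reply from a uj5u.com user:
This approach reads the text file into a single DataFrame column, with one line of text per row. From there, each run number and the results of each run are extracted.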
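import os
import pandas as pd

txt_file_path = r'D:\jchtempnew\SO\ResultFile.txt'

# Read every line of the file into a single wide column
df = pd.read_fwf(txt_file_path, widths=[999999], header=None)

# Rows starting with a digit are split into two float columns; the run number
# is taken from the "Run#" rows and forward-filled onto the data rows below it
df_out = \
    df[0].str.extract(r'^(\d+.*?)\s+(\d+.*)').astype(float) \
    .assign(Run=df[0].str.extract(r'^Run(\d+)', expand=False).ffill()).dropna() \
    .rename(columns={0: 'Charge (in Coulomb)', 1: 'temporal evolution'})

# Write one CSV per run, next to the input file
for run in df_out['Run'].unique():
    df_out[df_out['Run'] == run] \
        .to_csv(f'{os.path.splitext(txt_file_path)[0]}-Run{run}.csv', index=False)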
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/486677.html
