I need to read some data from several text files that begin with an arbitrary number of lines of text. A typical file looks like this:
file1.dat:
The file contains data
# this is a comment skip me
DataStart
index = integer
Some text
-5.0e-2 3.3 4.0
0 0.0e0 0.0e0
1.0 0.1 3.0
1.5 4.0 1.87
1.7 -4.67 0.124
...
...
15.3 -3.5e02 1.775
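file1.dat may contain several lines of text at the beginning, and those lines may start with spaces, tabs, etc. The data block I am interested in always comes below these lines and has a fixed number of columns; in this case, it has 3 columns: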
-5.0e-2 3.3 4.0
0 0.0e0 0.0e0
1.0 0.1 3.0
1.5 4.0 1.87
1.7 -4.67 0.124
...
...
15.3 -3.5e02 1.775
The lines containing data may have spaces/tabs at the beginning of each line.
I tried the following code:
import numpy as np
pattern = r'^[-0-9 ]*'
mydata = np.fromregex('file1.dat', pattern, dtype=float)
But when I run it, I get:
~/.local/lib/python3.8/site-packages/numpy/lib/npyio.py in fromregex(file, regexp, dtype, encoding)
1530 # Create the new array as a single data-type and then
1531 # re-interpret as a single-field structured array.
-> 1532 newdtype = np.dtype(dtype[dtype.names[0]])
1533 output = np.array(seq, dtype=newdtype)
1534 output.dtype = dtype
TypeError: 'NoneType' object is not subscriptable
Any help would be much appreciated.
A uj5u.com user replied:
I think your regular expression needs to look more like this:
pattern = r'\s*([-+0-9e.]+)\s+([-+0-9e.]+)\s+([-+0-9e.]+).*'
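As the traceback in the question suggests, the error appears when the regex match results are plain strings rather than tuples, which is what happens when the pattern has no capture groups. The following is a minimal sketch using only the standard `re` module to show the difference; the `sample` text is an assumed stand-in for file1.dat, not the real file.

```python
import re

# Assumed sample mimicking the layout of file1.dat from the question.
sample = """The file contains data
# this is a comment skip me
DataStart
index = integer
Some text
  -5.0e-2 3.3 4.0
0 0.0e0 0.0e0
\t1.7 -4.67 0.124
15.3 -3.5e02 1.775
"""

# The pattern from the question has no capture groups,
# so findall returns plain strings instead of tuples.
no_groups = re.findall(r'^[-0-9 ]*', sample)

# Three capture groups: findall returns one 3-tuple per matching line.
pattern = r'\s*([-+0-9e.]+)\s+([-+0-9e.]+)\s+([-+0-9e.]+).*'
rows = re.findall(pattern, sample)

# Convert the captured tokens to floats, row by row.
data = [[float(tok) for tok in row] for row in rows]
print(data)
```

With one tuple per row, passing the same pattern to np.fromregex with dtype=float should give a regular two-dimensional array.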
A uj5u.com user replied:
In [603]: txt="""-5.0e-2 3.3 4.0
...: 0 0.0e0 0.0e0
...: 1.0 0.1 3.0
...: 1.5 4.0 1.87
...: 1.7 -4.67 0.124
...: 15.3 -3.5e02 1.775"""
The number layout looks regular enough for a standard CSV reader:
In [604]: np.genfromtxt(txt.splitlines())
Out[604]:
array([[-5.000e-02,  3.300e+00,  4.000e+00],
       [ 0.000e+00,  0.000e+00,  0.000e+00],
       [ 1.000e+00,  1.000e-01,  3.000e+00],
       [ 1.500e+00,  4.000e+00,  1.870e+00],
       [ 1.700e+00, -4.670e+00,  1.240e-01],
       [ 1.530e+01, -3.500e+02,  1.775e+00]])
Or even a plain line split:
In [605]: alist=[]
...: for line in txt.splitlines():
...:     alist.append(line.split())
...:
In [606]: alist
Out[606]:
[['-5.0e-2', '3.3', '4.0'],
['0', '0.0e0', '0.0e0'],
['1.0', '0.1', '3.0'],
['1.5', '4.0', '1.87'],
['1.7', '-4.67', '0.124'],
['15.3', '-3.5e02', '1.775']]
In [607]: np.array(alist, float)
Out[607]:
array([[-5.000e-02,  3.300e+00,  4.000e+00],
       [ 0.000e+00,  0.000e+00,  0.000e+00],
       [ 1.000e+00,  1.000e-01,  3.000e+00],
       [ 1.500e+00,  4.000e+00,  1.870e+00],
       [ 1.700e+00, -4.670e+00,  1.240e-01],
       [ 1.530e+01, -3.500e+02,  1.775e+00]])
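The session above starts from text that contains only the numeric block. For a file that still has its header, one possible sketch is to keep only the lines that split into exactly three tokens which all convert to float. The helper name `parse_rows` and the sample lines are hypothetical, introduced here for illustration only.

```python
def parse_rows(lines, ncols=3):
    """Return rows of floats from lines holding exactly ncols numeric tokens.

    Hypothetical helper, not part of NumPy or the original answer.
    """
    rows = []
    for line in lines:
        tokens = line.split()      # split() also drops leading spaces/tabs
        if len(tokens) != ncols:
            continue               # wrong column count: header or comment line
        try:
            rows.append([float(tok) for tok in tokens])
        except ValueError:
            continue               # at least one token is not numeric
    return rows


# Assumed sample lines mimicking file1.dat.
lines = [
    "The file contains data",
    "# this is a comment skip me",
    "DataStart",
    "index = integer",
    "Some text",
    "  -5.0e-2 3.3 4.0",
    "\t0 0.0e0 0.0e0",
    "15.3 -3.5e02 1.775",
]
rows = parse_rows(lines)
print(rows)
```

The result can be handed to np.array(rows, float) as in the session above. Note that float() is more permissive than a hand-written regex (it also accepts tokens such as nan or inf), so a header line made of three such words would slip through.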
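A uj5u.com user replied:
To match floating-point numbers, we can use the following regular expression (see this answer for details):
[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?
You need to wrap it in a group () to extract the tokens from each line: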
import numpy as np

# zero or more whitespace characters
opt_whitespace = r'\s*'
# The number token
number = r'([+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?)'
# one or more whitespace characters
whitespace = r'\s+'
# Number of data columns
N = 3
# The regex: N whitespace-separated numbers followed by a newline
pattern = opt_whitespace + number + (whitespace + number)*(N-1) + opt_whitespace + r'\n'
data = np.fromregex('file1.dat', pattern, dtype=float)
print(data)
Output:
[[-5.000e-02  3.300e+00  4.000e+00]
 [ 0.000e+00  0.000e+00  0.000e+00]
 [ 1.000e+00  1.000e-01  3.000e+00]
 [ 1.500e+00  4.000e+00  1.870e+00]
 [ 1.700e+00 -4.670e+00  1.240e-01]
 [ 1.530e+01 -3.500e+02  1.775e+00]]
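One detail worth checking with this pattern: because it ends in a literal newline, the last data row is only matched if the file ends with a newline. The sketch below illustrates this with the standard `re` module; the `relaxed` variant, which also accepts the end of the input, is a hypothetical adjustment and not part of the answer above.

```python
import re

number = r'([+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?)'
N = 3
# N whitespace-separated numbers with optional surrounding whitespace.
body = r'\s*' + number + (r'\s+' + number) * (N - 1) + r'\s*'

strict = body + r'\n'          # same ending as the pattern above
relaxed = body + r'(?:\n|$)'   # hypothetical variant: newline or end of input

# Assumed sample text with no trailing newline after the last row.
text = "1.0 0.1 3.0\n15.3 -3.5e02 1.775"

rows_strict = re.findall(strict, text)    # last row is not matched
rows_relaxed = re.findall(relaxed, text)  # both rows are matched

print(len(rows_strict), len(rows_relaxed))
```

If the data files always end with a newline, the original pattern is sufficient as written.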
