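I have data files consisting of a few header lines followed by an N×4 matrix. I want to read each file starting from the matrix and store it in a variable as a numpy array. The files are about 300 MB each, but a sample file looks like this: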
# Some header line
Not all header lines start with a special character
# -- a keyword --
 7.3533498487067E-03 0.0000000000000E+00 1.5509636485369E-25-2.0531419826552E-27
 1.7232929428188E-25 1.3463226115772E-28 1.7232929428188E-25 1.3463226115772E-28
 4.4805616513289E-25 7.5394066248323E-26 6.7208424769933E-25 1.1093698319396E-25
-6.4623485355705E-25-1.1924016124944E-25-5.6007020641611E-25-5.6915788404426E-26
A positive value is preceded by a space, but a negative value is not (the minus sign takes its place). So far I have tried:
import numpy as np

matrix = []
with open('test.txt') as data:
    for line in data.readlines()[3:]:  # I always know how many header lines should be skipped.
        matrix.append(line)  # Saves all matrix elements into a list.
matrix = ' '.join([i for item in matrix for i in item.split()])  # Combines all matrix elements into a single string with correct single space separation.
matrix = np.fromstring(matrix, sep=' ')  # This was supposed to convert the string into a 2D numpy array.
This code produces the following error:
'DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.'
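I thought it could not read scientific notation (I may be wrong about that), but I don't know how to fix it. I also suspect that converting from list to str to numpy makes this slower than it should be. How can I do this with numpy? pandas solutions are also welcome.
Extra: I would appreciate any solution that gets rid of the header lines without creating or copying to a new file, but this is not required.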
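Answer from a uj5u.com user:
Apparently the point of the format is that every number occupies the same number of characters (20 here), so you can take advantage of that: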
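import numpy as np

matrix = []
with open('test.txt') as data:
    for line in data.readlines()[3:]:  # skip the 3 header lines
        # Each value occupies exactly 20 characters, so slice at fixed offsets.
        matrix.append([float(line[i : i + 20]) for i in (0, 20, 40, 60)])
matrix = np.array(matrix)
print(matrix)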
[[ 7.35334985e-03  0.00000000e+00  1.55096365e-25 -2.05314198e-27]
 [ 1.72329294e-25  1.34632261e-28  1.72329294e-25  1.34632261e-28]
 [ 4.48056165e-25  7.53940662e-26  6.72084248e-25  1.10936983e-25]
 [-6.46234854e-25 -1.19240161e-25 -5.60070206e-25 -5.69157884e-26]]
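A side note on the warning in the question: it most likely comes not from the scientific notation but from fused tokens such as `1.5509636485369E-25-2.0531419826552E-27`, which splitting on whitespace cannot separate. Since the fields are fixed-width, a possible alternative is to let `np.genfromtxt` do the slicing and skip the header lines itself. The sketch below is only an illustration: it assumes every value is exactly 20 characters wide and that there are always 3 header lines, and it reads from an in-memory sample (the `sample` string is made up from the question's data) so that it runs on its own.

```python
import io
import numpy as np

# Sample in the question's layout: 3 header lines, then 4 fields of
# 20 characters each (assumed widths, taken from the example data).
sample = (
    "# Some header line\n"
    "Not all header lines start with a special character\n"
    "# -- a keyword --\n"
    " 7.3533498487067E-03 0.0000000000000E+00 1.5509636485369E-25-2.0531419826552E-27\n"
    " 1.7232929428188E-25 1.3463226115772E-28 1.7232929428188E-25 1.3463226115772E-28\n"
    "-6.4623485355705E-25-1.1924016124944E-25-5.6007020641611E-25-5.6915788404426E-26\n"
)

# A tuple of integers passed as `delimiter` is interpreted as fixed field
# widths; `skip_header` drops the first 3 lines before parsing starts.
# To read from disk, pass a filename such as 'test.txt' instead of the
# StringIO object.
matrix = np.genfromtxt(
    io.StringIO(sample),
    delimiter=(20, 20, 20, 20),
    skip_header=3,
)
print(matrix.shape)
```

This never creates a second file, which covers the "extra" request. If pandas is preferred, `pandas.read_fwf` with `widths=[20, 20, 20, 20]`, `skiprows=3` and `header=None` should behave similarly, though I have not timed either approach on 300 MB files.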
