How do I use a 1D array from an HDF5 file and perform subtraction, addition, and similar operations on it?

I have a 1D array that looks like this:

array([(b'2P1', b'aP1', 2, 37.33,  4.4 , 3.82),
       (b'3P2', b'aP2', 3, 18.74, -9.67, 4.85),
       (b'4P2', b'aP2', 4, 55.16, 74.22, 4.88)])

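As you can see, each row mixes byte strings with numbers. I can't access the values element-wise. For example, if I want to subtract the first row from the second row using only the float columns, I can't do it. Is there a way to do this? The HDF5 data file was linked in the original post. Here is the code that reads the HDF5 file: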

import numpy as np
import h5py

with h5py.File('xaa.h5', 'r') as hdff:
    base_items = list(hdff.items())
    print('Items in the base directory: ', base_items)
    dat1 = np.array(hdff['particles/lipids/positions/dataset_0001'])
    dat2 = np.array(hdff['particles/lipids/positions/dataset_0002'])
    print(dat1)

Answer from a uj5u.com user:

In [188]: f = h5py.File('./Downloads/xaa.h5')
In [189]: f
Out[189]: <HDF5 file "xaa.h5" (mode r)>
...
In [194]: f['particles/lipids/positions'].keys()
Out[194]: <KeysViewHDF5 ['dataset_0000', 'dataset_0001', 'dataset_0002', 'dataset_0003', 'dataset_0004', 'dataset_0005', 'dataset_0006', 'dataset_0007', 'dataset_0008', 'dataset_0009']>
...
In [196]: f['particles/lipids/positions/dataset_0000'].dtype
Out[196]: dtype([('col1', 'S7'), ('col2', 'S8'), ('col3', '<i8'), ('col4', '<f8'), ('col5', '<f8'), ('col6', '<f8')])

As I suspected, this is a structured array: each element is a record with named fields of different types. See https://numpy.org/doc/stable/user/basics.rec.html

In [202]: arr[0]
Out[202]: (b'1P1', b'aP1', 1, 80.48, 35.36, 4.25)
In [203]: arr['col1'][:10]
Out[203]:
array([b'1P1', b'2P1', b'3P2', b'4P2', b'5P3', b'6P3', b'7P4', b'8P4',
       b'9P5', b'10P5'], dtype='|S7')
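To make the idea of a structured array concrete without needing the data file, here is a minimal sketch that rebuilds the three sample rows from the question using the dtype printed above. The variable names `dt` and `diff_col4` are only illustrative, and the three rows stand in for the full dataset.

```python
import numpy as np

# Same field layout as the dtype printed above. The rows are the three
# sample records from the question, not the full dataset.
dt = np.dtype([('col1', 'S7'), ('col2', 'S8'), ('col3', '<i8'),
               ('col4', '<f8'), ('col5', '<f8'), ('col6', '<f8')])
arr = np.array([(b'2P1', b'aP1', 2, 37.33, 4.4, 3.82),
                (b'3P2', b'aP2', 3, 18.74, -9.67, 4.85),
                (b'4P2', b'aP2', 4, 55.16, 74.22, 4.88)], dtype=dt)

print(arr.shape)        # one record per element, so the array is 1D
print(arr.dtype.names)  # the field names
print(arr['col4'])      # a single field is an ordinary float array

# Arithmetic works on one field at a time:
# second row minus first row, 'col4' only.
diff_col4 = arr['col4'][1] - arr['col4'][0]
print(diff_col4)
```

Selecting a field by name gives a plain numeric array, which is why subtraction works there even though it fails on whole records.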

We can get a view of just the float columns by indexing with a list of field names:

In [204]: arr[['col4','col5','col6']][:10]
Out[204]:
array([(80.48,  35.36, 4.25), (37.45,   3.92, 3.96),
       (18.53,  -9.69, 4.68), (55.39,  74.34, 4.6 ),
       (22.11,  68.71, 3.85), (-4.13,  24.04, 3.73),
       (40.16,   6.39, 4.73), (-5.4 ,  35.73, 4.85),
       (36.67,  22.45, 4.08), (-3.68, -10.66, 4.18)],
      dtype={'names': ['col4', 'col5', 'col6'], 'formats': ['<f8', '<f8', '<f8'], 'offsets': [23, 31, 39], 'itemsize': 47})

That result is still a structured array, though. To treat these fields as a regular 2D float array, we need a utility from recfunctions:

In [198]: import numpy.lib.recfunctions as rf

In [205]: rf.structured_to_unstructured(arr[['col4','col5','col6']][:10])
Out[205]:
array([[ 80.48,  35.36,   4.25],
       [ 37.45,   3.92,   3.96],
       [ 18.53,  -9.69,   4.68],
       [ 55.39,  74.34,   4.6 ],
       [ 22.11,  68.71,   3.85],
       [ -4.13,  24.04,   3.73],
       [ 40.16,   6.39,   4.73],
       [ -5.4 ,  35.73,   4.85],
       [ 36.67,  22.45,   4.08],
       [ -3.68, -10.66,   4.18]])
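Once the float fields are a regular 2D array, the row arithmetic asked about in the question is straightforward. The following is a minimal sketch using the three sample rows from the question in place of the real file. The names `floats`, `row_diff`, and `all_diffs` are illustrative, and `np.diff` is my own addition for the "every consecutive pair" case, not something the answer above uses.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Stand-in data: the three sample rows from the question,
# with the dtype reported for the dataset.
dt = np.dtype([('col1', 'S7'), ('col2', 'S8'), ('col3', '<i8'),
               ('col4', '<f8'), ('col5', '<f8'), ('col6', '<f8')])
arr = np.array([(b'2P1', b'aP1', 2, 37.33, 4.4, 3.82),
                (b'3P2', b'aP2', 3, 18.74, -9.67, 4.85),
                (b'4P2', b'aP2', 4, 55.16, 74.22, 4.88)], dtype=dt)

# Select the float fields, then convert to a plain (nrows, 3) float array.
floats = rf.structured_to_unstructured(arr[['col4', 'col5', 'col6']])
print(floats.shape, floats.dtype)

# Second row minus first row, float columns only.
row_diff = floats[1] - floats[0]
print(row_diff)

# Differences between every pair of consecutive rows.
all_diffs = np.diff(floats, axis=0)
print(all_diffs)
```

The string and integer fields are left behind in `arr`, so they can still be used to label the rows after the arithmetic is done.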

Answer from another uj5u.com user:

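The answer above is a good approach, but you don't have to use recfunctions. Once you know the dataset's dtype and shape, you can create an empty array and fill it by reading just the fields of interest with field-name slicing, as shown in the answer above.

Here is code that does this. (Since we know you are reading 3 floats, and float is the default dtype for np.empty(), I didn't bother getting the field dtypes from the dataset. That would be easy to add if you need to slice integer or string fields.)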

import numpy as np
import h5py

with h5py.File('xaa.h5', 'r') as hdf:
    grp = hdf['particles/lipids/positions']
    ds1 = grp['dataset_0000']
    nrows = ds1.shape[0]
    # Plain float array: one row per record, one column per float field
    arr = np.empty((nrows, 3))
    arr[:, 0] = ds1['col4'][:]
    arr[:, 1] = ds1['col5'][:]
    arr[:, 2] = ds1['col6'][:]

    print(arr[0:10, :])

Output:

[[ 80.48  35.36   4.25]
 [ 37.45   3.92   3.96]
 [ 18.53  -9.69   4.68]
 [ 55.39  74.34   4.6 ]
 [ 22.11  68.71   3.85]
 [ -4.13  24.04   3.73]
 [ 40.16   6.39   4.73]
 [ -5.4   35.73   4.85]
 [ 36.67  22.45   4.08]
 [ -3.68 -10.66   4.18]]
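The field-by-field approach can also pick the float fields from the dataset's dtype instead of hard-coding their names. The sketch below assumes h5py is installed and uses an in-memory file with the hypothetical name 'demo.h5', filled with the three sample rows from the question, as a stand-in for xaa.h5. The names `float_fields` and `out` are illustrative.

```python
import numpy as np
import h5py

# Stand-in data: the three sample rows from the question.
dt = np.dtype([('col1', 'S7'), ('col2', 'S8'), ('col3', '<i8'),
               ('col4', '<f8'), ('col5', '<f8'), ('col6', '<f8')])
sample = np.array([(b'2P1', b'aP1', 2, 37.33, 4.4, 3.82),
                   (b'3P2', b'aP2', 3, 18.74, -9.67, 4.85),
                   (b'4P2', b'aP2', 4, 55.16, 74.22, 4.88)], dtype=dt)

# driver='core' with backing_store=False keeps the file in memory only,
# so nothing is written to disk. 'demo.h5' is a hypothetical name.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as hdf:
    ds = hdf.create_dataset('particles/lipids/positions/dataset_0000',
                            data=sample)

    # Find the floating-point fields from the dataset's own dtype.
    float_fields = [name for name in ds.dtype.names
                    if np.issubdtype(ds.dtype[name], np.floating)]

    # Fill a plain 2D float array one field (column) at a time.
    out = np.empty((ds.shape[0], len(float_fields)))
    for j, name in enumerate(float_fields):
        out[:, j] = ds[name]  # reads only this field from the dataset

print(float_fields)
print(out[1] - out[0])  # second row minus first row
```

With the real file, the same loop should work after replacing the in-memory file with `h5py.File('xaa.h5', 'r')` and dropping the `create_dataset` call.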

Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/309467.html
