Resampling time series data using python/numpy

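I am trying to resample some time series data using the following code in python. What I want to do is take 2 arrays, one with the observation times and the other with the values (time and values in the given example). I want to record observations at specific time intervals, say every 0.1 in the interval [0, 2]. This would give an array of [[time, values]] pairs that looks like [[0.0, 1], [1.0, 3], [1.2, 5], [1.3, 6], [2.0, 8]]. Then, for every intermediate time before the next update, I want to fill in the last value, e.g. between [0.0, 1] and [1.0, 3] I want the pairs [0.1, 1] .. [0.9, 1].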

import numpy as np

time =   np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2,  1.3, 1.32, 2 ])
values = np.array([1,   3,    2,      3,  4,     5,    6,   7,  8])

#time_interval = 1e-1

newvals = []

def resample(time, values, time_interval):
    currentTime = 0.0
    newvals = []
    vals = []
    for idx in range(len(time)):
        t = time[idx]
        data = values[idx]
        if t >= currentTime:
            newvals.append([max(t, currentTime),data])
            currentTime = max(t, currentTime) + time_interval
        else:
             continue 

    #vals = []
    newvals_copy = np.array(newvals)
    seen = set(newvals_copy[:,0])
    for idx in range(len(newvals)):
        rec_time = newvals[idx][0]
        #seen.add(rec_time)
        #print(idx, idx + 1, len(newvals))
        next_idx = idx + 1
        if next_idx == len(newvals):
            break
        else:
            #print(idx)
            next_time = newvals[idx + 1][0]
            intermediate_val = np.arange(rec_time , next_time, time_interval)
            #print(intermediate_val[])
            for value in intermediate_val:
                if value not in seen:
                    #print(idx, value)
                    newvals.append([value, newvals[idx][1]])
                    
    newvals = np.array(newvals)
    newvals=np.unique(newvals, axis=0)
    return newvals#, len(newvals)
print(resample(time, values, 1e-1))
#print(np.arange(0,2.1,0.1), len(np.arange(0,2.1,0.1)))

This test case produces the desired output,

[[0.  1. ]
 [0.1 1. ]
 [0.2 1. ]
 [0.3 1. ]
 [0.4 1. ]
 [0.5 1. ]
 [0.6 1. ]
 [0.7 1. ]
 [0.8 1. ]
 [0.9 1. ]
 [1.  3. ]
 [1.1 3. ]
 [1.2 5. ]
 [1.3 6. ]
 [1.4 6. ]
 [1.5 6. ]
 [1.6 6. ]
 [1.7 6. ]
 [1.8 6. ]
 [1.9 6. ]
 [2.  8. ]]

However, running real_data = resample(real_time, real_values, 1e-1), where

real_time = https://filedropper.com/d/s/7Q5IqtRzEh42p2oU0qTrbnuTb8iRK4

real_values = https://filedropper.com/d/s/hixCEY7QmqGcDBjEanhX2UgocKmV4K

gives

real_data[:,0] = [0.         0.1        0.10000947 0.20000947 0.20000981 0.30000981
 0.30001438 0.40001438 0.40002427 0.50002427 0.50006157 0.60006157
 0.6000784  0.7000784  0.70009857 0.80009857 0.80010533 0.90010533
 0.90012181 1.00012181 1.00020397 1.10020397 1.1002512  1.2002512
 1.20025706 1.30025706 1.30026507 1.40026507 1.4002744  1.5002744
 1.50028026 1.60028026 1.60029566 1.70029566 1.70029855 1.80029855
 1.80030084 1.90030084 1.90032208]

which is incorrect. The expected output should be the same as the test case,

  real_data[:,0]=[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7
     1.8 1.9 2. ]
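
A small sketch of why the real data drifts off the 0.1 grid. It reproduces only the first loop of resample on a few made-up sample times (three of them borrowed from the real_data output above; the helper name record_times is hypothetical). Each recorded time is the sample's own timestamp, and the fill step then counts forward from that timestamp rather than from a multiple of the interval.

```python
import numpy as np

def record_times(time, dt):
    """First pass of resample(): keep a sample once dt has elapsed since the last kept one."""
    current = 0.0
    kept = []
    for t in time:
        if t >= current:
            kept.append(t)       # the sample's own time is stored, not a grid time
            current = t + dt     # the next threshold inherits the sample's offset
    return np.array(kept)

# Illustrative times only: slightly late samples, as in the real data.
time = np.array([0.0, 0.10000947, 0.15, 0.20000981, 0.30001438])
kept = record_times(time, 0.1)

# The fill step starts from the off-grid recorded time.
fill = np.arange(kept[1], kept[2], 0.1)

print(kept)
print(fill)
```

With these inputs the fill between the second and third recorded samples starts at 0.10000947 and steps to 0.20000947, which is the same pattern of near-duplicate times seen in real_data[:,0].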

I am guessing the error is due to how floating point numbers/decimals are handled, but I am not sure how to remedy this.

EDIT: I tried rounding based on the comments, which is better but not ideal. real_data = resample(np.round(real_time, decimals = 1), real_values, 1e-1) still gives an array with duplicates:

real_data[:,0] = [0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.8 0.9 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. ]

Ideally I want to make this solution workable for arbitrary data sets and time interval values.
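
One possible way to make this independent of the data set and interval is to work in integer grid indices instead of floating-point times, so that no two float times ever need to be compared for equality. The sketch below is an assumption about the intended behaviour, not a confirmed solution: a sample counts as "on the grid" if it lies within tol of a multiple of dt, the first such sample wins when several land on the same grid point, and every grid point takes the value of the last on-grid sample at or before it. The function name resample_on_grid and the tol parameter are hypothetical, and the sketch assumes at least one sample lands on the grid.

```python
import numpy as np

def resample_on_grid(time, values, dt, tol=1e-9):
    """Forward-fill on-grid samples onto a regular grid with spacing dt."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values)

    # Integer index of the nearest grid point k*dt for every sample.
    k = np.rint(time / dt).astype(int)

    # Keep only samples within tol of their nearest grid point.
    on_grid = np.abs(time - k * dt) <= tol
    k, v = k[on_grid], values[on_grid]

    # If several samples snap to the same grid point, keep the first one.
    k, first = np.unique(k, return_index=True)
    v = v[first]

    # Every grid index from the first to the last kept sample.
    grid = np.arange(k[0], k[-1] + 1)

    # Position of the last kept sample at or before each grid index.
    pos = np.searchsorted(k, grid, side="right") - 1
    return np.column_stack((grid * dt, v[pos]))

time = np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2, 1.3, 1.32, 2])
values = np.array([1, 3, 2, 3, 4, 5, 6, 7, 8])
result = resample_on_grid(time, values, 1e-1)
print(result)
```

For data whose timestamps are only approximately on the grid, tol would have to be loosened (for example to a small fraction of dt); which tolerance is appropriate depends on the data. The returned times are grid * dt, so they may carry the usual floating-point noise and can be rounded for display.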

Answer from a uj5u.com user:

Maybe something like this does what you need:

import numpy as np
from scipy.interpolate import interp1d

# The test data.
time = np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2, 1.3, 1.32, 2])
values = np.array([1, 3, 2, 3, 4, 5, 6, 7, 8])

# The new time basis we're aiming for.
t_new = np.linspace(0, 2, 21)

# Throw away times that we don't like.
new_data = [[t, v] for t, v in zip(time, values) if t in t_new]
t_clean, v_clean = np.array(new_data).T

# Make the interpolator function.
func = interp1d(t_clean, v_clean, kind="previous")

# Interpolate the data into the new time basis.
v_new = func(t_new)

Now v_new looks like this:

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3., 3., 3., 6., 6., 6., 6.,
       6., 6., 6., 8.])
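
One thing to watch in the code above: t in t_new compares floats for exact equality. The printed v_new holds 3 at t = 1.2, where the question's expected output has 5, which is consistent with the sample at 1.2 not matching any element of np.linspace(0, 2, 21) exactly. A possible variation, sketched here with NumPy only, is to build the filter with np.isclose; the tolerance atol=1e-9 is an arbitrary value chosen for illustration.

```python
import numpy as np

time = np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2, 1.3, 1.32, 2])
t_new = np.linspace(0, 2, 21)

# Exact membership, as in `t in t_new`.
exact = np.array([t in t_new for t in time])

# Tolerant membership: True where a sample lies within atol of some grid time.
# atol=1e-9 is an assumed tolerance, not a value from the original post.
close = np.isclose(time[:, None], t_new[None, :], rtol=0.0, atol=1e-9).any(axis=1)

print(time[exact])  # samples kept by the exact test
print(time[close])  # samples kept by the tolerant test
```

The boolean mask close could stand in for the if t in t_new filter, e.g. t_clean, v_clean = time[close], values[close], before building the interpolator.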

Tags: python numpy time-series