This is my pandas DataFrame with tick-by-tick data. I want to build OHLC candles from 512 ticks.

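open = first tick value in the rolling 512-tick window
close = last tick value in the rolling 512-tick window
high = maximum tick value in the rolling 512-tick window
low = minimum tick value in the rolling 512-tick window
Basically, I can't get the first and last elements of a rolling window. Rolling has no head and tail :)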
high = df['Close'].rolling(100).max()
low = df['Close'].rolling(100).min()
open = df['Close'].rolling(100).head(1)   # does not work: Rolling has no head()
close = df['Close'].rolling(100).tail(1)  # does not work: Rolling has no tail()
volume = df['Volume'].rolling(100).sum()
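I think there should be a simple way to do this in pandas.
I have looked at the resample and aggregate options, but I couldn't get them to work :)
Since this is tick data, I would also like to see a NumPy option, because pandas is slow for this volume and speed. Converting the data to a DataFrame just to compress it seems like overkill?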
https://pandas.pydata.org/pandas-docs/version/1.2.4/reference/api/pandas.DataFrame.resample.html
https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.aggregate.html
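A note on the two links above: `resample` groups rows by time, not by tick count, so it does not directly produce 512-tick candles. If the goal is ordinary candles, meaning consecutive non-overlapping blocks of ticks rather than a rolling window, one possible sketch is to group on an integer block label. The helper name `tick_candles` and the sample data are illustrative assumptions; the column names `Close` and `Volume` are taken from the question.

```python
import numpy as np
import pandas as pd

def tick_candles(df, ticks_per_candle):
    """Aggregate consecutive, non-overlapping blocks of ticks into OHLCV rows.

    Hypothetical helper: assumes 'Close' and 'Volume' columns as in the question.
    """
    # Integer label per row: 0 for the first block, 1 for the next, and so on.
    candle_id = np.arange(len(df)) // ticks_per_candle
    return df.groupby(candle_id).agg(
        open=("Close", "first"),
        high=("Close", "max"),
        low=("Close", "min"),
        close=("Close", "last"),
        volume=("Volume", "sum"),
    )

# Small made-up example: 10 ticks, 4 ticks per candle.
ticks = pd.DataFrame({"Close": np.arange(10.0), "Volume": np.ones(10)})
candles = tick_candles(ticks, ticks_per_candle=4)
```

The last candle is built from whatever ticks remain (2 in this example), so it may need to be dropped if only complete candles are wanted.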
A uj5u.com user replied:
Instead of head and tail, you can use apply like this:
open = df['Close'].rolling(100).apply(lambda x: x.iloc[0])
close = df['Close'].rolling(100).apply(lambda x: x.iloc[-1])
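For a fixed-size window, the first and last elements can also be read without `apply`: the first tick of the window ending at row i is the tick at row i - (window - 1), and the last tick is row i itself. The sketch below relies on that observation. The function name `rolling_ohlcv` and the sample data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def rolling_ohlcv(df, window):
    """Rolling OHLCV over `window` ticks without calling apply().

    Hypothetical helper: assumes 'Close' and 'Volume' columns as in the question.
    """
    close_col = df["Close"]
    roll = close_col.rolling(window)
    # Rows whose window is complete (pandas returns NaN before that point).
    complete = np.arange(len(df)) >= window - 1

    out = pd.DataFrame(index=df.index)
    # First tick of the window ending at row i is the tick at row i - (window - 1).
    out["open"] = close_col.shift(window - 1)
    out["high"] = roll.max()
    out["low"] = roll.min()
    # Last tick of the window is the current row; mask incomplete windows.
    out["close"] = close_col.where(complete)
    out["volume"] = df["Volume"].rolling(window).sum()
    return out

# Small made-up example: 10 ticks, window of 4.
ticks = pd.DataFrame({"Close": np.arange(10.0), "Volume": np.ones(10)})
candles = rolling_ohlcv(ticks, window=4)
```

This only holds for a fixed number of rows per window. For time-based windows such as `rolling('5s')`, the window length varies and the shift trick does not apply.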
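As for speed, this will always be somewhat slow because of the rolling window, but you can extract the values and do the computation in NumPy for better performance, as in the code below (credit to https://stackoverflow.com/a/57491913/10475762 ):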
import numpy as np
import pandas as pd

N = 10000
df = pd.DataFrame({'Close': range(N), 'Volume': np.random.randn(N)})

def pd_stats(df):
    close_rolling = df['Close'].rolling(100)
    high = close_rolling.max()
    low = close_rolling.min()
    open = close_rolling.apply(lambda x: x.iloc[0])    # first tick in each window
    close = close_rolling.apply(lambda x: x.iloc[-1])  # last tick in each window
    volume = df['Volume'].rolling(100).sum()
def buffer(X = np.array([]), n = 1, p = 0):
    # buffers data vector X into length n column vectors with overlap p
    # excess data at the end of X is discarded
    n = int(n)  # length of each data vector
    p = int(p)  # overlap of data vectors, 0 <= p <= n-1
    L = len(X)  # length of data to be buffered
    m = int(np.floor((L - n) / (n - p)) + 1)  # number of sample vectors (no padding)
    data = np.zeros([n, m])  # initialize data matrix
    # stop at L - n + 1 so the last full window is filled as well
    for startIndex, column in zip(range(0, L - n + 1, n - p), range(0, m)):
        data[:, column] = X[startIndex:startIndex + n]  # fill in by column
    return data
def np_stats(df):
    close_rolling = buffer(df['Close'].values, n=100, p=99)  # 100-sized buffer with step size of 1
    high = close_rolling.max(0)
    low = close_rolling.min(0)
    open = close_rolling[0]    # first row: first tick of every window
    close = close_rolling[-1]  # last row: last tick of every window
    volume_rolling = buffer(df['Volume'].values, n=100, p=99).sum(0)
Timing the two gives:
In [66]: %timeit -n 10 pd_stats(df)
586 ms ± 5.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [67]: %timeit -n 10 np_stats(df)
28.8 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
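However, the np_stats function does not return values for positions where the rolling window extends beyond the data range. This means it returns fewer samples than pd_stats (the answer estimates roughly 200 fewer). Most of those positions are NaN in the pandas output anyway, because the window is not fully inside the data there, but it is worth keeping in mind.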
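If the shorter output is a problem, one way to keep the NumPy result aligned with the pandas result is to pad the front with NaN. The sketch below also replaces the Python loop in `buffer` with `numpy.lib.stride_tricks.sliding_window_view`, which is available in NumPy 1.20 and later. The function name `np_rolling_ohlcv` and the sample data are illustrative assumptions, and this variant has not been timed against the figures above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def np_rolling_ohlcv(close, volume, window):
    """Rolling OHLCV on plain arrays, NaN-padded to the input length.

    Hypothetical helper; requires len(close) >= window.
    """
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    # Shape (len(close) - window + 1, window); a read-only view, no copy of the data.
    close_windows = sliding_window_view(close, window)
    volume_windows = sliding_window_view(volume, window)
    stats = {
        "open": close_windows[:, 0],     # first tick of each window
        "high": close_windows.max(axis=1),
        "low": close_windows.min(axis=1),
        "close": close_windows[:, -1],   # last tick of each window
        "volume": volume_windows.sum(axis=1),
    }
    # Pad the first window - 1 positions, where pandas would return NaN.
    pad = np.full(window - 1, np.nan)
    return {name: np.concatenate((pad, values)) for name, values in stats.items()}

# Small made-up example: 10 ticks, window of 4.
close = np.arange(10.0)
volume = np.ones(10)
out = np_rolling_ohlcv(close, volume, window=4)
```

Each returned array has the same length as the input, so it can be assigned back to DataFrame columns directly if needed.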
