I have a dataframe with 27k records as the training set and another test dataset with 4k records. Both datasets have 25 features each.
x_train shape: (27000, 25),
x_test shape: (4000, 25)
Sample of the data in the training set:
|Subject ID|Feat_1|Feat_2|Feat_X|Hr_count|Label|
|s0001 | 89| 31 | 43 | 1 | 0 |
|s0001 | 94| 32 | 68 | 2 | 0 |
|s0001 | 38| 90 | 86 | 3 | 0 |
|s0001 | 79| 34 | 78 | 4 | 1 |
|s0001 | 85| 24 | 70 | 5 | 1 |
|s0002 | 7 | 9 | 32 | 1 | 0 |
|s0002 | 60| 56 | 72 | 2 | 0 |
|s0002 | 68| 72 | 23 | 3 | 0 |
|s0003 | 26| 88 | 1 | 1 | 0 |
|s0004 | 45| 27 | 22 | 1 | 0 |
|s0004 | 10| 80 | 67 | 2 | 0 |
|s0004 | 71| 48 | 21 | 3 | 0 |
|s0004 | 58| 9 | 60 | 4 | 1 |
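Hr_count: the number of hours each subject has stayed in the experiment.
Label: the target variable for the classifier I am building. It is the flag the subject received after staying in the experiment.
I trained on this data with an LSTM RNN model defined as follows: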
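from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# input_shape=(timesteps, features): each sample is a single time step of 25 features
model.add(LSTM(100, activation='tanh', return_sequences=True, input_shape=(1, 25)))
model.add(LSTM(49, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))

model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    batch_size=32,
    epochs=200)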
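Question:
Because the data is sequential, I want to use a dynamic batch_size when fitting the model, equal to each subject's maximum Hr_count in the training set, so that the LSTM can learn the relationships within each subject's data separately. In other words, each batch would contain the samples of exactly one subject, sorted by Hr_count.
Keras and TensorFlow v2.x do not seem to offer this kind of dynamic batch_size (unlike TensorFlow v1.x)...
How can I make the batch size passed to the batch_size parameter dynamic?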
Answer from a uj5u.com user:
You can write a loop that calls model.fit() once per subject and sets the batch size to that subject's Hr_count:
# Train on one subject at a time; the batch size is that subject's Hr_count.
for subject in list_of_subjects:
    hr_count, data = subject
    x_train, y_train = data
    model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        batch_size=hr_count,
        epochs=200)
For this code to run, list_of_subjects must have the following structure:
[[Hr_count, [x_train, y_train]], ...]
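The answer does not show how to build list_of_subjects. Below is a minimal sketch in plain Python, assuming the rows are available as tuples in the column order of the sample table. The function name build_list_of_subjects and the tuple layout are hypothetical, chosen only for illustration. For the model in the question, each x_train would still need to be converted to an array of shape (n, 1, n_features) before calling model.fit().

```python
from itertools import groupby
from operator import itemgetter

# Assumed row layout: (subject_id, feat_1, feat_2, feat_x, hr_count, label),
# using two subjects from the sample table in the question.
rows = [
    ("s0001", 89, 31, 43, 1, 0),
    ("s0001", 94, 32, 68, 2, 0),
    ("s0001", 38, 90, 86, 3, 0),
    ("s0001", 79, 34, 78, 4, 1),
    ("s0001", 85, 24, 70, 5, 1),
    ("s0003", 26, 88, 1, 1, 0),
]


def build_list_of_subjects(rows):
    """Return [[hr_count, [x_train, y_train]], ...] with one entry per subject."""
    # Sort by subject, then by Hr_count, so each subject's rows are in time order.
    ordered = sorted(rows, key=lambda r: (r[0], r[4]))
    result = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        x_train = [list(r[1:4]) for r in group]  # feature columns only
        y_train = [r[5] for r in group]          # label column
        hr_count = max(r[4] for r in group)      # the subject's maximum Hr_count
        result.append([hr_count, [x_train, y_train]])
    return result


list_of_subjects = build_list_of_subjects(rows)
for hr_count, (x_train, y_train) in list_of_subjects:
    print(hr_count, len(x_train), y_train)
```

Each entry unpacks the same way as in the loop above: first hr_count, then the [x_train, y_train] pair.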
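Answer from a uj5u.com user:
I filtered by subject ID and then fed each subject's data segment to model.fit(). The model appears to learn quickly, so try it on a larger dataset. The code is generalized to allow more features.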
import io

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import LabelEncoder

data="""SubjectID,Feat_1,Feat_2,Feat_X,Hr_count,Label
s0001,89,31,43,1,0
s0001,94,32,68,2,0
s0001,38,90,86,3,0
s0001,79,34,78,4,1
s0001,85,24,70,5,1
s0002,7 ,9 ,32,1,0
s0002,60,56,72,2,0
s0002,68,72,23,3,0
s0003,26,88,1 ,1,0
s0004,45,27,22,1,0
s0004,10,80,67,2,0
s0004,71,48,21,3,0
s0004,58,9 ,60,4,1
"""
df = pd.read_csv(io.StringIO(data), sep=",")
df.drop(columns='Hr_count', inplace=True)

# Encode subject IDs as integers (SubjectID stays in X_columns as a feature).
encoder = LabelEncoder()
df['SubjectID'] = encoder.fit_transform(df['SubjectID'])
print(df)

X_columns = [x for x in df.columns if x != 'Label']
features = len(X_columns)

model = Sequential()
model.add(LSTM(100, activation='tanh', return_sequences=True, input_shape=(1, features)))
model.add(LSTM(49, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer="rmsprop", loss='binary_crossentropy', metrics=['accuracy'])

# Fit on one subject at a time, so each batch holds exactly one subject's rows.
for subject_id, df_batch in df.groupby('SubjectID'):
    df_batch = df_batch.dropna()
    # Reshape to (samples, timesteps=1, features), as the first LSTM layer expects.
    X = df_batch[X_columns].to_numpy(dtype='float32')
    X = X.reshape(X.shape[0], 1, X.shape[1])
    y = df_batch['Label'].to_numpy(dtype='float32')
    print("\n", X)
    model.fit(X, y, batch_size=len(X), epochs=10)
Output:
Epoch 10/10
1/1 [==============================] - 0s 13ms/step - loss: 0.0588 - accuracy: 1.0000
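Both answers come down to the same step: group the rows by subject, order them by Hr_count, and let the number of rows per subject decide the batch size. The sketch below shows that step on its own as a plain-Python generator, without pandas or Keras. The name subject_batches and the tuple layout of ROWS are assumptions made for illustration. Whether such a generator can be passed directly to model.fit() depends on the Keras version, so check its documentation before relying on it.

```python
from collections import defaultdict

# Assumed row layout: (subject_id, feat_1, feat_2, feat_x, hr_count, label).
# The s0002 rows are deliberately out of order to show the sort by Hr_count.
ROWS = [
    ("s0001", 89, 31, 43, 1, 0),
    ("s0001", 94, 32, 68, 2, 0),
    ("s0002", 60, 56, 72, 2, 0),
    ("s0002", 7, 9, 32, 1, 0),
    ("s0003", 26, 88, 1, 1, 0),
]


def subject_batches(rows):
    """Yield (subject_id, x_batch, y_batch): one batch per subject, ordered by Hr_count."""
    by_subject = defaultdict(list)
    for subject_id, *features, hr_count, label in rows:
        by_subject[subject_id].append((hr_count, features, label))
    for subject_id in sorted(by_subject):
        records = sorted(by_subject[subject_id], key=lambda r: r[0])
        # Nested-list shape (n_rows, 1, n_features), matching input_shape=(1, features).
        x_batch = [[feats] for _, feats, _ in records]
        y_batch = [label for _, _, label in records]
        yield subject_id, x_batch, y_batch


# The batch size varies per subject: it is simply the subject's row count.
batch_sizes = {sid: len(x) for sid, x, _ in subject_batches(ROWS)}
print(batch_sizes)
```

Because each batch is built from one subject only, no batch ever mixes rows from two subjects, which is the property the question asks for.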
