I am comparing two models: one uses binary_crossentropy as the loss function (Model A), and the other uses mean_squared_error (Model B).
Model A)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
#model.compile(loss="mean_squared_error", optimizer=optimizer)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 32s 42ms/step - loss: -0.0633 - val_loss: -0.0649
Epoch 2/10
718/718 [==============================] - 33s 46ms/step - loss: -0.0632 - val_loss: -0.0572
Epoch 3/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0592 - val_loss: -0.0570
Epoch 4/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0522 - val_loss: -0.0431
Epoch 5/10
718/718 [==============================] - 50s 69ms/step - loss: -0.0566 - val_loss: -0.0535
Epoch 6/10
718/718 [==============================] - 49s 68ms/step - loss: -0.0567 - val_loss: -0.0537
Epoch 7/10
718/718 [==============================] - 48s 67ms/step - loss: -0.0627 - val_loss: -0.0499
Epoch 8/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0621 - val_loss: -0.0614
Epoch 9/10
718/718 [==============================] - 47s 65ms/step - loss: -0.0645 - val_loss: -0.0653
Epoch 10/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0661 - val_loss: -0.0622
Model B)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
model.compile(loss="mean_squared_error", optimizer=optimizer)
#model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 36s 48ms/step - loss: 0.0189 - val_loss: 0.0190
Epoch 2/10
718/718 [==============================] - 46s 64ms/step - loss: 0.0188 - val_loss: 0.0189
Epoch 3/10
718/718 [==============================] - 48s 67ms/step - loss: 0.0187 - val_loss: 0.0189
Epoch 4/10
718/718 [==============================] - 58s 81ms/step - loss: 0.0187 - val_loss: 0.0188
Epoch 5/10
718/718 [==============================] - 62s 87ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 6/10
718/718 [==============================] - 72s 100ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 7/10
718/718 [==============================] - 73s 102ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 8/10
718/718 [==============================] - 60s 84ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 9/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 10/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Model B's loss is greater than 0, which makes sense.
But Model A's loss is less than 0. What does that mean?
A uj5u.com user replied:
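Cross-entropy is computed as the negative expected value of the logarithm of the predicted output. It is normally used after a sigmoid or softmax activation, where every output is <= 1, so its logarithm is <= 0 and the loss is >= 0. Your output layer uses relu instead, which can produce values greater than 1; their logarithms are positive, so the loss can come out < 0. The moral is that the output-layer activation and the loss should correspond to each other and must make sense for the task you are trying to solve. Otherwise you may get meaningless results.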
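To make the sign argument concrete, here is a minimal pure-Python sketch of the simplified "negative expected log" term described above. It is not Keras's exact binary_crossentropy, which also includes a (1 - y) term and, as far as I know, clips predictions before taking the logarithm. The helper names (neg_expected_log, sigmoid, relu) and the sample numbers are made up for illustration.

```python
import math


def neg_expected_log(y_true, y_pred):
    """Simplified cross-entropy term: mean of -y * log(p) (illustrative only)."""
    return sum(-y * math.log(p) for y, p in zip(y_true, y_pred)) / len(y_true)


def sigmoid(x):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Passes positive values through unchanged, so outputs can exceed 1."""
    return max(0.0, x)


# Hypothetical pre-activation outputs and targets, chosen only for the demo.
raw_outputs = [2.0, 3.0, 1.5]
targets = [1.0, 1.0, 1.0]

# Sigmoid outputs are below 1, so every log is negative and the loss is positive.
loss_sigmoid = neg_expected_log(targets, [sigmoid(x) for x in raw_outputs])

# relu outputs here are above 1, so every log is positive and the loss is negative.
loss_relu = neg_expected_log(targets, [relu(x) for x in raw_outputs])

print("sigmoid output layer:", loss_sigmoid)  # greater than 0
print("relu output layer:   ", loss_relu)     # less than 0
```

For the model in the question, this suggests replacing activation="relu" with activation="sigmoid" in the final Dense layer when training with binary_crossentropy, assuming the targets lie in [0, 1]. If the targets are unbounded real values, mean_squared_error with a linear or relu output is the more natural pairing.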
