I found the following code:
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:
        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x_batch_train, training=True)  # Logits for this minibatch
        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y_batch_train, logits)
    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)
    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
The last part says:
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
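But after looking at the function apply_gradients, I am not sure whether the comment "Run one step of gradient descent by updating" is correct for optimizer.apply_gradients(zip(grads, model.trainable_weights)), because it only updates the gradients. And grads = tape.gradient(loss_value, model.trainable_weights) only computes the derivatives of the loss with respect to the trainable weights. As I understand it, gradient descent multiplies the gradient by the learning rate and subtracts the result from the value of the loss function. Still, it seems to work, because the loss keeps decreasing. So my question is: does apply_gradients do more than just an update?
The full code is here: https://keras.io/guides/writing_a_training_loop_from_scratch/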
Answer:
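.apply_gradients performs the update of the weights, using the gradients. Depending on the optimizer used, this may be plain gradient descent, that is:
w_{t+1} := w_t - lr * g(w_t)
where g = grad(L)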
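Note that there is no need to access the loss function or anything else. You only need the gradient (which is a vector with the same length as the parameters), and the update is subtracted from the weights, not from the loss value.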
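To make the update rule above concrete, here is a minimal sketch in plain Python rather than TensorFlow. The helper `sgd_apply_gradients`, the toy `loss`, and the toy `grad` are hypothetical names introduced only for illustration, not the Keras implementation. The helper receives (gradient, weight) pairs, like zip(grads, model.trainable_weights) in the question, and never sees the loss.

```python
def sgd_apply_gradients(grads_and_vars, lr=0.1):
    """Hypothetical stand-in for one plain gradient descent step.

    Computes w_{t+1} = w_t - lr * g(w_t) for every (gradient, weight) pair.
    """
    return [w - lr * g for g, w in grads_and_vars]


def loss(w):
    # Toy loss L(w) = (w0 - 3)^2 + (w1 + 1)^2, minimized at w = [3, -1].
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def grad(w):
    # Analytic gradient of the toy loss (the role tape.gradient plays in the question).
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


weights = [0.0, 0.0]
losses = [loss(weights)]
for _ in range(20):
    grads = grad(weights)
    # The update only needs gradients and weights, not the loss value.
    weights = sgd_apply_gradients(zip(grads, weights), lr=0.1)
    losses.append(loss(weights))
```

The loss is only evaluated here to record its value. The weights move toward the minimum because the scaled gradient is subtracted from the weights themselves.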
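In general, .apply_gradients can do more than that. For example, if you use Adam, it also accumulates some statistics and uses them to rescale the gradients, and so on.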
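As a rough illustration of what "accumulating statistics" can mean, the following sketch implements the textbook Adam update for scalar weights in plain Python. The class `ToyAdam` is hypothetical and is not the Keras class. Its default hyperparameters are assumptions chosen to resemble common Adam settings, and the real optimizer may differ in details.

```python
import math


class ToyAdam:
    """Hypothetical scalar-weight Adam, for illustration only."""

    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
        self.lr, self.beta_1, self.beta_2, self.eps = lr, beta_1, beta_2, eps
        self.t = 0   # number of update steps taken so far
        self.m = {}  # running mean of gradients, per weight index
        self.v = {}  # running mean of squared gradients, per weight index

    def apply_gradients(self, grads_and_vars):
        self.t += 1
        new_weights = []
        for i, (g, w) in enumerate(grads_and_vars):
            # Accumulate the statistics.
            m = self.beta_1 * self.m.get(i, 0.0) + (1 - self.beta_1) * g
            v = self.beta_2 * self.v.get(i, 0.0) + (1 - self.beta_2) * g * g
            self.m[i], self.v[i] = m, v
            # Bias-correct them, then use them to rescale the step.
            m_hat = m / (1 - self.beta_1 ** self.t)
            v_hat = v / (1 - self.beta_2 ** self.t)
            new_weights.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_weights


opt = ToyAdam(lr=0.1)
# Two weights starting at 0.0 with very different gradient magnitudes.
first_step = opt.apply_gradients([(100.0, 0.0), (0.01, 0.0)])
```

On the first step both weights move by roughly the learning rate, even though the gradients differ by four orders of magnitude. That is the rescaling the answer refers to, and as with plain gradient descent, only gradients and weights are needed.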
