Does optimizer.apply_gradients do gradient descent?

I found the following code:

# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:

        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x_batch_train, training=True)  # Logits for this minibatch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

The last part says:

# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)

# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))

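But after looking at the function apply_gradients, I am not sure whether the comment "Run one step of gradient descent by updating" is correct for optimizer.apply_gradients(zip(grads, model.trainable_weights)), because it only updates the gradients. And grads = tape.gradient(loss_value, model.trainable_weights) only computes the derivatives with respect to the loss function. For gradient descent, though, the gradients are scaled by the learning rate and subtracted from the value of the loss function. Still, it seems to work, because the loss keeps decreasing. So my question is: doesn't apply_gradients do more than just update?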

The full code is here: https://keras.io/guides/writing_a_training_loop_from_scratch/

Answer:

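.apply_gradients uses the gradients to perform an update on the weights. Depending on the optimizer used, that update may be plain gradient descent, i.e.: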

w_{t+1} := w_t - lr * g(w_t)

where g = grad(L)

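Note that this does not require access to the loss function or anything else; all you need is the gradient (which is a vector with the same length as your parameters).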
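To make the update rule concrete, here is a minimal sketch in plain Python (no TensorFlow) of the arithmetic a plain gradient descent step performs. The function name sgd_apply_gradients and the quadratic toy loss are assumptions made up for illustration. This is not TensorFlow's implementation, only the same formula written out.

```python
def sgd_apply_gradients(grads_and_vars, lr):
    """Hypothetical stand-in for apply_gradients with plain SGD.

    Takes (gradient, weight) pairs and returns the updated weights.
    The loss value itself is never used, only its gradient.
    """
    return [w - lr * g for g, w in grads_and_vars]


def toy_loss(weights):
    # Toy loss L(w) = sum(w_i ** 2), chosen only for illustration.
    return sum(w * w for w in weights)


def toy_grad(weights):
    # Analytic gradient of the toy loss: dL/dw_i = 2 * w_i.
    return [2.0 * w for w in weights]


weights = [1.0, -2.0]
lr = 0.1
losses = [toy_loss(weights)]
for _ in range(3):
    # Plays the role of tape.gradient(...).
    grads = toy_grad(weights)
    # Plays the role of optimizer.apply_gradients(zip(...)).
    weights = sgd_apply_gradients(zip(grads, weights), lr)
    losses.append(toy_loss(weights))
```

In this sketch the loss is only evaluated to record it in losses; the update itself consumes nothing but the gradients and the current weights, which mirrors the split between tape.gradient and apply_gradients in the training loop.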

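In general, .apply_gradients can do more than that. For example, if you were to use Adam, it would also accumulate some statistics and use them to rescale the gradients, and so on.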
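As a rough illustration of what "accumulating statistics" can mean, the following is a simplified Adam-style update in plain Python. The class ToyAdam is hypothetical, works on scalar weights only, and uses the default hyperparameters from the Adam paper as an assumption. The real Keras optimizer differs in details such as its epsilon default and how it stores its state.

```python
import math


class ToyAdam:
    """Hypothetical, simplified Adam for scalar weights (illustration only)."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0   # number of steps taken so far
        self.m = {}  # running mean of gradients, per weight index
        self.v = {}  # running mean of squared gradients, per weight index

    def apply_gradients(self, grads_and_vars):
        self.t += 1
        new_weights = []
        for i, (g, w) in enumerate(grads_and_vars):
            # Accumulate statistics about past gradients.
            self.m[i] = self.beta1 * self.m.get(i, 0.0) + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v.get(i, 0.0) + (1 - self.beta2) * g * g
            # Correct the bias caused by starting both averages at zero.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Rescale the step using the accumulated statistics.
            step = self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            new_weights.append(w - step)
        return new_weights


opt = ToyAdam(lr=0.1)
weights = [1.0, -2.0]
for _ in range(3):
    # Gradient of the toy loss sum(w_i ** 2).
    grads = [2.0 * w for w in weights]
    weights = opt.apply_gradients(zip(grads, weights))
```

The interface is the same as in the plain gradient descent case: only (gradient, weight) pairs go in. The difference is that the optimizer keeps internal state (m, v and the step counter) between calls, so the size of each step depends on the gradient history and not just on lr * g.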

