Keras shows NaN loss when using a custom softplus activation function

This is my custom softplus activation:

import tensorflow as tf

def my_softplus(z):
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)

If I run a small test:

my_softplus([-3.0, -1.0, 0.0, 2.0])

it returns:

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.04858733, 0.31326166, 0.6931472 , 2.126928])>

When I run TensorFlow's own softplus activation function:

tf.keras.activations.softplus([-3.0, -1.0, 0.0, 2.0])

I get:

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.04858736, 0.31326172, 0.6931472 , 2.126928  ], dtype=float32)>

The results are very similar; only the last digits of the first two values differ.
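The last-digit difference is most likely float32 rounding: my_softplus evaluates log(exp(z) + 1), which first rounds 1 + exp(z) to float32, whereas TensorFlow's softplus kernel computes log1p(exp(z)) for moderate inputs. The sketch below uses NumPy in float32 as a stand-in for TensorFlow (an assumption for illustration; the helper names are hypothetical) to compare the two formulas against a float64 reference.

```python
import numpy as np

def naive_softplus(z):
    # Same formula as my_softplus: log(exp(z) + 1), evaluated in float32
    z = np.asarray(z, dtype=np.float32)
    return np.log(np.exp(z) + np.float32(1))

def log1p_softplus(z):
    # log1p(x) computes log(1 + x) without first rounding 1 + x to float32
    z = np.asarray(z, dtype=np.float32)
    return np.log1p(np.exp(z))

zs = [-3.0, -1.0, 0.0, 2.0]
reference = np.log1p(np.exp(np.array(zs, dtype=np.float64)))  # float64 reference

# Both errors are tiny (on the order of one float32 ulp) for these inputs
print(naive_softplus(zs) - reference)
print(log1p_softplus(zs) - reference)

# For very negative z, exp(z) is below float32 epsilon, so 1 + exp(z) rounds to 1
print(naive_softplus(-20.0))   # 0.0
print(log1p_softplus(-20.0))   # about 2.06e-09
```

For the inputs in the question the two formulas agree to within float32 precision, which matches the observed last-digit differences; the gap only becomes visible for very negative z.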

When I fit the following model on a subset of the MNIST dataset,

import tensorflow as tf
from tensorflow.keras import layers, models

model2 = models.Sequential()
model2.add(layers.Flatten(input_shape=(28, 28)))
model2.add(layers.Dense(16, activation="softplus",  # swap in my_softplus here to get the NaN loss
                        kernel_initializer=my_glorot_initializer,  # custom, not shown in the question
                        kernel_regularizer=my_l1_regularizer,      # custom, not shown in the question
                        # kernel_constraint=my_positive_weights
                        ))
model2.add(layers.Dense(16, activation="relu"))
model2.add(layers.Dense(10, activation="softmax"))

model2.compile(optimizer="rmsprop",
               loss=tf.keras.losses.SparseCategoricalCrossentropy(),
               metrics=["accuracy"])

fitting returns something like:

Epoch 1/20
20/20 - 2s - loss: -2.9399e-01 - accuracy: 0.1064 - val_loss: -2.1013e-01 - val_accuracy: 0.1136
Epoch 2/20
20/20 - 1s - loss: -9.9094e-02 - accuracy: 0.1064 - val_loss: 0.0140 - val_accuracy: 0.1136

However, when I use my my_softplus activation function instead, I get a NaN loss.

Why is that?

Note: you can comment out kernel_initializer and kernel_regularizer in the model definition; the results are similar.

Note 2: here is a link to a Google Colab notebook with an MWE.

Answer:

In the Colab notebook, you did not normalize the data:

#creating a validation set
x_val=x_train[:50000]
partial_x_train=x_train[50000:]
y_val=y_train[:50000]
partial_y_train=y_train[50000:]

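Because the inputs are not normalized, the pre-activations in the first layer can become very large. In float32, exp() of such values overflows to inf, and that inf ends up producing a NaN loss.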
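To make the overflow step concrete: float32 exp(x) becomes inf once x exceeds ln(FLT_MAX), roughly 88.72, and an inf activation passed through a layer with weights of both signs yields inf + (-inf) = NaN. The sketch below uses NumPy as a stand-in for TensorFlow, and the two weights are hypothetical values chosen only for illustration.

```python
import numpy as np

# Largest x for which exp(x) is still finite in float32 (about 88.72)
limit = np.log(np.finfo(np.float32).max)
print(limit)

with np.errstate(over="ignore"):
    big = np.exp(np.float32(89.0))  # overflows to inf
print(np.exp(np.float32(88.0)), big)

# log(inf + 1) is inf, just like my_softplus(100)
activation = np.log(big + np.float32(1))

# Hypothetical next-layer weights with mixed signs
weights = np.array([0.5, -0.5], dtype=np.float32)
with np.errstate(invalid="ignore"):
    next_pre_activation = np.sum(activation * weights)  # inf + (-inf) -> nan
print(next_pre_activation)
```

Once a NaN appears in a forward pass, it propagates through the softmax and the cross-entropy, which is consistent with the NaN loss reported in the question.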

Example (your implementation):

import tensorflow as tf

def my_softplus(z):
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)

my_softplus(100)
>> <tf.Tensor: shape=(), dtype=float32, numpy=inf>

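When you use TF's softplus (for example, as the activation of a Dense layer), the underlying kernel guards against underflow and overflow: for large inputs it returns x directly, and for very negative inputs it returns exp(x), so it never evaluates log(1 + exp(x)) where that would overflow or lose precision.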

So, in your case, if you want results similar to the built-in softplus, you need to normalize the data.
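A minimal normalization sketch, assuming x_train holds MNIST images as uint8 pixel values in [0, 255] (as tf.keras.datasets.mnist.load_data() returns them). A small random array stands in for the real data here, and the split sizes are shrunk from 50000 to 50 purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MNIST images: uint8 pixels in [0, 255], shape (n, 28, 28)
x_train = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)

# Scale pixels to [0, 1] in float32 before splitting off the validation set
x_train = x_train.astype(np.float32) / 255.0

x_val = x_train[:50]
partial_x_train = x_train[50:]
print(x_train.dtype, x_train.min(), x_train.max())
```

Scaling before the split keeps the training and validation data on the same scale, and it bounds the input magnitudes so the first layer's pre-activations stay far from the float32 exp() overflow region.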

Softplus source code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/softplus_op.h#L31-L58

In case the link changes, here is a copy:

template <typename Device, typename T>
struct Softplus {
  // Computes Softplus activation.
  //
  // features: any shape.
  // activations: same shape as "features".
  void operator()(const Device& d, typename TTypes<T>::ConstTensor features,
                  typename TTypes<T>::Tensor activations) {
    // Choose a threshold on x below which exp(x) may underflow
    // when added to 1, but for which exp(x) is always within epsilon of the
    // true softplus(x).  Offset of 2 from machine epsilon checked
    // experimentally for float16, float32, float64.  Checked against
    // softplus implemented with numpy's log1p and numpy's logaddexp.
    static const T threshold =
        Eigen::numext::log(Eigen::NumTraits<T>::epsilon()) + T(2);
    // Value above which exp(x) may overflow, but softplus(x) == x
    // is within machine epsilon.
    auto too_large = features > features.constant(-threshold);
    // Value below which exp(x) may underflow, but softplus(x) == exp(x)
    // is within machine epsilon.
    auto too_small = features < features.constant(threshold);
    auto features_exp = features.exp();
    activations.device(d) = too_large.select(
        features,                       // softplus(x) ~= x for x large
        too_small.select(features_exp,  // softplus(x) ~= exp(x) for x small
                         features_exp.log1p()));
  }
};
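For comparison in Python, here is a NumPy sketch that mirrors the branch structure of the kernel above. It is an illustrative port, not the code path TensorFlow actually runs; the function name is hypothetical, and the clipping before exp is an addition of this sketch (not in the C++ source) so that NumPy does not emit overflow warnings for the branch that gets discarded.

```python
import numpy as np

def thresholded_softplus(x):
    """Hypothetical NumPy port of the thresholding in TF's Softplus kernel."""
    x = np.asarray(x, dtype=np.float32)
    # threshold = log(machine epsilon) + 2, about -13.94 for float32
    threshold = np.log(np.finfo(np.float32).eps) + np.float32(2)
    too_large = x > -threshold  # softplus(x) ~= x
    too_small = x < threshold   # softplus(x) ~= exp(x)
    # Clip before exp so unused entries cannot overflow (sketch-only addition)
    features_exp = np.exp(np.minimum(x, -threshold))
    return np.where(too_large, x,
                    np.where(too_small, features_exp, np.log1p(features_exp)))

print(thresholded_softplus([-20.0, -3.0, 0.0, 100.0]))
```

Unlike my_softplus, this returns 100.0 rather than inf for an input of 100. Outside TensorFlow, np.logaddexp(0, x) computes the same quantity stably with less code.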
