Using tf.GradientTape() exhausts all GPU memory, but without it there is no problem

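I am working on Conv-TasNet (Convolutional TasNet), and the model I built has about 5.05 million variables.

I want to train it with a custom training loop, but this is where the problem shows up: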

for i, (input_batch, target_batch) in enumerate(train_ds): # each shape is (64, 32000, 1)
    with tf.GradientTape() as tape:
        predicted_batch = cv_tasnet(input_batch, training=True) # model name
        loss = calculate_sisnr(predicted_batch, target_batch) # some custom loss
    trainable_vars = cv_tasnet.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    cv_tasnet.optimizer.apply_gradients(zip(gradients, trainable_vars))

This part exhausts all GPU memory (24 GB available).
When I try it without `with tf.GradientTape() as tape`,

for i, (input_batch, target_batch) in enumerate(train_ds):
    predicted_batch = cv_tasnet(input_batch, training=True)
    loss = calculate_sisnr(predicted_batch, target_batch)

this uses a reasonable amount of GPU memory (about 5~6 GB).

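I tried the same `with tf.GradientTape() as tape` pattern on basic MNIST data, and it worked fine.
So does size matter? But the same error also occurs when I lower BATCH_SIZE to 32 or smaller.

Why does the first code block exhaust all GPU memory?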

Of course, I put

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

in the first cell.

Answer:

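The gradient tape triggers automatic differentiation, which has to keep the intermediate activations of every layer from the forward pass until the gradients have been computed. A plain forward pass can release each activation once later layers no longer need it, so autodiff needs considerably more memory. This is normal. You will have to tune the batch size by hand until you find one that fits, then adjust your learning rate to match. Usually, tuning just means guess-and-check or a grid search. (I am working on a product that does all of this for you, but I am not here to plug it.)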
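To get a feel for why the activations, and not the 5.05 million weights, are what fills the GPU, here is a rough back-of-the-envelope sketch. Only the parameter count and the batch shape (64, 32000, 1) come from the question. The encoder stride, channel count and layer count are hypothetical placeholders, and the helper names `activation_bytes` and `parameter_bytes` are made up for illustration. It also ignores gradients, optimizer state and framework overhead, so treat it as an order-of-magnitude estimate only.

```python
GIB = 1024 ** 3


def parameter_bytes(num_params, bytes_per_value=4):
    """Memory taken by the weights themselves (float32 by default)."""
    return num_params * bytes_per_value


def activation_bytes(batch_size, time_steps, channels, num_layers, bytes_per_value=4):
    """Rough size of the activations retained for the backward pass.

    Simplifying assumption: every layer outputs one float32 tensor of shape
    (batch_size, time_steps, channels) and the tape keeps all of them.
    """
    return batch_size * time_steps * channels * num_layers * bytes_per_value


# From the question: about 5.05 million variables.
params = parameter_bytes(5_050_000)

# Hypothetical architecture numbers, NOT from the question:
# encoder stride 8 (32000 samples -> 4000 frames), 512 channels, 32 conv layers.
acts_64 = activation_bytes(batch_size=64, time_steps=4000, channels=512, num_layers=32)
acts_32 = activation_bytes(batch_size=32, time_steps=4000, channels=512, num_layers=32)

print(f"parameters:         {params / GIB:.3f} GiB")
print(f"activations (B=64): {acts_64 / GIB:.1f} GiB")
print(f"activations (B=32): {acts_32 / GIB:.1f} GiB")
```

Under these assumed numbers the weights are a tiny fraction of the total, while the retained activations grow linearly with batch size and sequence length. That is why reducing the batch size or training on shorter audio segments are the usual levers, and why a small MNIST model does not run into the same limit.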

