Why does training with post-padding run faster than with pre-padding?

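I've been working on some NLP classification tasks and noticed that my models train faster when I use post-padding instead of pre-padding. I'd like to know why.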

I'm training these models on Google Colab with a GPU runtime. Here is my preprocessing code:

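import numpy as np
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# X: iterable of input strings; y1, y2: pandas Series of labels (loaded elsewhere)
PADDING = 'post'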

# Tokenising the input strings and padding

tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(X)
X_tokenized = tokenizer.texts_to_sequences(X)
X_padded = pad_sequences(X_tokenized, maxlen=80, truncating='post', padding=PADDING)
X_train = np.array(X_padded)

# Encoding output one

y1 = y1.to_numpy().reshape(-1, 1)   # Reshape to an array of features
encoder_1 = OneHotEncoder()         # Instantiate encoder
y1 = encoder_1.fit_transform(y1)    # Fit encoder to output 
y1 = y1.toarray()                   # Make output a numpy array

# Encoding output two
    
y2 = y2.to_numpy().reshape(-1, 1)
encoder_2 = OneHotEncoder()
y2 = encoder_2.fit_transform(y2)
y2 = y2.toarray()
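To make the difference between the two settings concrete, here is a minimal pure-Python sketch of where the padding ends up. The helper `pad` is hypothetical and only imitates the placement logic of `pad_sequences`; it is not the library code. Note that the real `pad_sequences` defaults to `padding='pre'` and `truncating='pre'`, so post-padding has to be requested explicitly, as the code above does.

```python
# Hypothetical helper that imitates where Keras' pad_sequences places padding.
# It is a simplified illustration, not the library implementation.
def pad(seqs, maxlen, padding="post", truncating="post", value=0):
    out = []
    for s in seqs:
        # 'post' truncation keeps the first maxlen items, 'pre' keeps the last ones.
        s = s[:maxlen] if truncating == "post" else s[max(len(s) - maxlen, 0):]
        fill = [value] * (maxlen - len(s))
        # 'post' padding appends the fill values, 'pre' padding prepends them.
        out.append(s + fill if padding == "post" else fill + s)
    return out


seqs = [[5, 3, 8], [7, 2]]  # made-up token ids
print(pad(seqs, maxlen=5, padding="post"))  # [[5, 3, 8, 0, 0], [7, 2, 0, 0, 0]]
print(pad(seqs, maxlen=5, padding="pre"))   # [[0, 0, 5, 3, 8], [0, 0, 0, 7, 2]]
```

With post-padding every row starts with real tokens and ends with zeros; with pre-padding the zeros come first.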

Now, creating my model:

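from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Embedding, Input
from tensorflow.keras.models import Model

# --- MODEL PARAMETERS ---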

vocab_size = len(tokenizer.index_word) + 1   # +1 because index 0 is reserved for padding
y1_size = len(encoder_1.categories_[0])
y2_size = len(encoder_2.categories_[0])

embedding_size = 175
units = 96

# --- MODEL ARCHITECTURE ---

inputs = Input(shape=(None,))
input_embeddings = Embedding(vocab_size, embedding_size, mask_zero=True)(inputs)

shared_lstm = Bidirectional(LSTM(units, return_sequences=True, 
                                 dropout=0.3))(input_embeddings)

y1_lstm = Bidirectional(LSTM(units, dropout=0.3))(shared_lstm)
y1_dense = Dense(y1_size, activation='softmax', name='y1')(y1_lstm)

y2_lstm = Bidirectional(LSTM(units, dropout=0.3))(shared_lstm)
y2_dense = Dense(y2_size, activation='softmax', name='y2')(y2_lstm)

split_shared_model = Model(inputs=inputs, outputs=[y1_dense, y2_dense])

Then I compile it as follows:

from tensorflow.keras.losses import CategoricalCrossentropy

split_shared_model.compile(
    optimizer='adam',
    loss=CategoricalCrossentropy(),
    metrics=['accuracy'],
)

The model summary is:

__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_4 (InputLayer)           [(None, None)]       0           []                               
                                                                                                  
 embedding_3 (Embedding)        (None, None, 175)    19075       ['input_4[0][0]']                
                                                                                                  
 bidirectional_8 (Bidirectional  (None, None, 192)   208896      ['embedding_3[0][0]']            
 )                                                                                                
                                                                                                  
 bidirectional_9 (Bidirectional  (None, 192)         221952      ['bidirectional_8[0][0]']        
 )                                                                                                
                                                                                                  
 bidirectional_10 (Bidirectiona  (None, 192)         221952      ['bidirectional_8[0][0]']        
 l)                                                                                               
                                                                                                  
 y1 (Dense)                     (None, 912)          176016      ['bidirectional_9[0][0]']        
                                                                                                  
 y2 (Dense)                     (None, 617)          119081      ['bidirectional_10[0][0]']       
                                                                                                  
==================================================================================================
Total params: 966,972
Trainable params: 966,972
Non-trainable params: 0
__________________________________________________________________________________________________
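The Param # column can be checked by hand, which also confirms the layer sizes. This sketch assumes the standard Keras formulas: an `Embedding` has `vocab_size * embedding_size` weights; an `LSTM` has `4 * (units * (input_dim + units) + units)` (four gates, each with an input kernel, a recurrent kernel and a bias); `Bidirectional` doubles that; and a `Dense` layer has `input_dim * output_dim + output_dim`. The vocabulary size of 109 is inferred from 19075 / 175, not stated in the question.

```python
# Recompute the Param # column of the summary with the standard Keras formulas.
def embedding_params(vocab, dim):
    return vocab * dim


def bilstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias), times 2 directions
    return 2 * 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, output_dim):
    return input_dim * output_dim + output_dim


units, emb = 96, 175
counts = {
    "embedding_3": embedding_params(109, emb),        # 109 inferred from 19075 / 175
    "bidirectional_8": bilstm_params(emb, units),      # fed by the embeddings
    "bidirectional_9": bilstm_params(2 * units, units),   # fed by 192-dim shared output
    "bidirectional_10": bilstm_params(2 * units, units),
    "y1": dense_params(2 * units, 912),
    "y2": dense_params(2 * units, 617),
}
print(counts)
print(sum(counts.values()))  # 966972, matching "Total params: 966,972"
```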

After calling the fit() method, the model starts training. Here are intermediate results with the settings above:

Epoch 1/50
 398/2647 [===>..........................] - ETA: 1:28 - loss: 8.7918 - y1_loss: 4.9236 - y2_loss: 3.8682 - y1_accuracy: 0.1495 - y2_accuracy: 0.3204
---------------------------------------------------------------------------

However, if I change PADDING to 'pre', training is much slower!

Epoch 1/50
  90/2647 [>.............................] - ETA: 45:52 - loss: 9.8153 - y1_loss: 5.3961 - y2_loss: 4.4192 - y1_accuracy: 0.1243 - y2_accuracy: 0.2788
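The two progress bars can be turned into a rough per-step comparison by dividing each ETA by the number of steps still remaining in the epoch. Keras' ETA is a running estimate, so early in the epoch the result is only indicative.

```python
# Rough per-step times derived from the two Keras progress-bar ETAs in the question.
def seconds_per_step(total_steps, done_steps, eta_seconds):
    return eta_seconds / (total_steps - done_steps)


post_s = seconds_per_step(2647, 398, 1 * 60 + 28)   # 88 s left for 2249 steps
pre_s = seconds_per_step(2647, 90, 45 * 60 + 52)    # 2752 s left for 2557 steps
print(f"post: {post_s:.3f} s/step, pre: {pre_s:.3f} s/step, ratio: {pre_s / post_s:.1f}x")
```

This prints `post: 0.039 s/step, pre: 1.076 s/step, ratio: 27.5x`, i.e. the pre-padded run is roughly 27 times slower per step at that point in the first epoch.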

Can anyone explain why this happens? I think it may have something to do with the embedding layer and its masking, but I'm not sure.
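With `mask_zero=True`, the `Embedding` layer treats id 0 as padding and produces a boolean mask that is `True` wherever the input is non-zero; because the first LSTM uses `return_sequences=True`, that mask is passed on to the LSTMs after it. A small NumPy sketch with made-up token ids shows how the mask differs between the two padding modes:

```python
import numpy as np

# Made-up token ids; 0 is the padding value (the Keras Tokenizer never assigns index 0).
post = np.array([[5, 3, 8, 0, 0],
                 [7, 2, 0, 0, 0]])
pre = np.array([[0, 0, 5, 3, 8],
                [0, 0, 0, 7, 2]])

# Equivalent of the mask an Embedding(..., mask_zero=True) layer computes.
post_mask = post != 0
pre_mask = pre != 0

print(post_mask.astype(int))  # real steps first, padding (0) at the end of each row
print(pre_mask.astype(int))   # padding (0) first, real steps at the end of each row
```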

Answer:

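This has to do with the underlying LSTM implementation. There are actually two: a "native TensorFlow" one and a highly optimized pure-CUDA (cuDNN) one, which is much faster. However, the latter can only be used under specific conditions (certain parameter settings, etc.). You can find the details in the documentation. The key point here is: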

Inputs, if masking is used, are strictly right-padded.

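This means the pre-padded version cannot use the efficient implementation, which explains the much slower runtime. Apart from sticking with post-padding, I don't think there is a reasonable workaround.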
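One way to express the "strictly right-padded" condition: each row of the mask must be a run of `True` values followed only by `False` values. The NumPy sketch below checks that by rebuilding the mask that a right-padded batch with the same lengths would have; TensorFlow performs its own check internally, which may be implemented differently. It also includes a hypothetical `pre_to_post` helper for the case where only an already pre-padded array is available (otherwise, simply call `pad_sequences(..., padding='post')` again).

```python
import numpy as np


def is_strictly_right_padded(mask):
    """True if every row of a boolean (batch, time) mask is all True, then all False."""
    mask = np.asarray(mask, dtype=bool)
    lengths = mask.sum(axis=1)  # number of real timesteps in each row
    # The mask a right-padded batch with exactly these lengths would have.
    expected = np.arange(mask.shape[1])[None, :] < lengths[:, None]
    return bool(np.array_equal(mask, expected))


def pre_to_post(x, pad_value=0):
    """Hypothetical helper: move each row's padding from the front to the back.

    Assumes pad_value never appears as a real token (true for Keras Tokenizer ids).
    """
    x = np.asarray(x)
    out = np.full_like(x, pad_value)
    for i, row in enumerate(x):
        tokens = row[row != pad_value]  # real tokens, original order preserved
        out[i, : len(tokens)] = tokens
    return out


pre = np.array([[0, 0, 5, 3, 8],
                [0, 0, 0, 7, 2]])
post = pre_to_post(pre)

print(is_strictly_right_padded(pre != 0))   # False
print(is_strictly_right_padded(post != 0))  # True
print(post.tolist())                        # [[5, 3, 8, 0, 0], [7, 2, 0, 0, 0]]
```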

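Note that TensorFlow sometimes prints a warning saying that it has to fall back to the inefficient implementation. For me, however, this has not happened consistently. Watch out for any extra warning output in the pre-padding case.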
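For reference, the Keras `LSTM` documentation (TF 2.x) lists these requirements for the cuDNN path: `activation='tanh'`, `recurrent_activation='sigmoid'`, `recurrent_dropout=0`, `unroll=False`, `use_bias=True`, strictly right-padded inputs when masking is used, and eager execution enabled in the outermost context. The hypothetical checker below just encodes that list in plain Python; it does not query TensorFlow. Applied to the question's layers, it suggests the padding is the only blocker: `dropout=0.3` is input dropout and is not on the list, unlike `recurrent_dropout`.

```python
# Hypothetical checker encoding the cuDNN requirements from the Keras LSTM docs.
# It inspects plain arguments only; it does not call TensorFlow.
def cudnn_blockers(activation="tanh", recurrent_activation="sigmoid",
                   recurrent_dropout=0.0, unroll=False, use_bias=True,
                   right_padded=True, eager_outermost=True):
    checks = [
        (activation == "tanh", "activation must be 'tanh'"),
        (recurrent_activation == "sigmoid", "recurrent_activation must be 'sigmoid'"),
        (recurrent_dropout == 0, "recurrent_dropout must be 0"),
        (not unroll, "unroll must be False"),
        (use_bias, "use_bias must be True"),
        (right_padded, "masked inputs must be strictly right-padded"),
        (eager_outermost, "eager execution must be enabled in the outermost context"),
    ]
    return [message for ok, message in checks if not ok]


# The question's LSTMs keep the defaults above; dropout=0.3 is not a blocker.
print(cudnn_blockers(right_padded=True))   # [] -> cuDNN kernel is eligible (post-padding)
print(cudnn_blockers(right_padded=False))  # ['masked inputs must be strictly right-padded']
```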
