TensorFlow seq2seq: tensorflow.python.framework.errors_impl.InvalidArgumentError

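While testing on other data, I am closely following the Seq2seq translation tutorial here: https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt#define_the_optimizer_and_the_loss_function . I get an error when instantiating the encoder, which is defined as: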

class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    ##-------- LSTM layer in Encoder ------- ##
    self.lstm_layer = tf.keras.layers.LSTM(self.enc_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')

  def call(self, x, hidden):
    x = self.embedding(x)
    output, h, c = self.lstm_layer(x, initial_state = hidden)
    return output, h, c

  def initialize_hidden_state(self):
    return [tf.zeros((self.batch_sz, self.enc_units)), tf.zeros((self.batch_sz, self.enc_units))]

It fails here, when I test it:

# Test Encoder Stack
encoder = Encoder(vocab_size, embedding_dim, units, BATCH_SIZE)

# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_h, sample_c = encoder(example_input_batch, sample_hidden)

The error is as follows:

Traceback (most recent call last):
  File "C:/Users/Seq2seq/Seq2seq-V3.py", line 132, in <module>
    sample_output, sample_h, sample_c = encoder(example_input_batch, sample_hidden)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:/Users/Seq2seq/Seq2seq-V3.py", line 119, in call
    x = self.embedding(x)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "embedding" (type Embedding).

indices[12,148] = 106 is not in [0, 106) [Op:ResourceGather]

Call arguments received:
  • inputs=tf.Tensor(shape=(64, 200), dtype=int64)

TF 2.0

Could this be an issue in TF Addons? Does anyone have experience with this?

Edit

The tutorial tokenizes at the word level. I encode the text at the character level, and 106 is my vocab_size (the number of characters).

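A uj5u.com user replied:

This error occurs when a sequence contains integer values outside the range of the defined vocabulary size. You can reproduce your error with the example below: the Embedding layer has a vocabulary size of 106, which means a sequence may only contain values from 0 to 105, and I pass a random sequence with values between 0 and 200 to force the error: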

import tensorflow as tf

class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    ##-------- LSTM layer in Encoder ------- ##
    self.lstm_layer = tf.keras.layers.LSTM(self.enc_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')

  def call(self, x, hidden):
    x = self.embedding(x)
    output, h, c = self.lstm_layer(x, initial_state = hidden)
    return output, h, c

  def initialize_hidden_state(self):
    return [tf.zeros((self.batch_sz, self.enc_units)), tf.zeros((self.batch_sz, self.enc_units))]

units = 32
BATCH_SIZE = 10
embedding_dim = 20
vocab_size = 106
encoder = Encoder(vocab_size, embedding_dim, units, BATCH_SIZE)
sample_hidden = encoder.initialize_hidden_state()

example_input_batch = tf.random.uniform((10, 15), maxval=201, dtype=tf.int32)
sample_output, sample_h, sample_c = encoder(example_input_batch, sample_hidden)
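
To catch this before the Embedding layer raises, the index range of a batch can be checked against vocab_size. The following is only a minimal sketch in plain Python: `find_out_of_range` is a hypothetical helper, not part of TensorFlow, and it assumes the batch is available as nested lists of ints (for example via `example_input_batch.numpy().tolist()`).

```python
def find_out_of_range(batch, vocab_size):
    """Return (row, col, value) for every index outside [0, vocab_size)."""
    bad = []
    for r, row in enumerate(batch):
        for c, value in enumerate(row):
            if not 0 <= value < vocab_size:
                bad.append((r, c, value))
    return bad


# Toy batch: an Embedding with vocab_size=106 accepts only 0..105.
toy_batch = [[0, 5, 105],
             [3, 106, 7]]
offenders = find_out_of_range(toy_batch, vocab_size=106)
print(offenders)  # [(1, 1, 106)]
```

An empty result means every index fits the embedding table. On a tensor, comparing `int(tf.reduce_max(example_input_batch))` with vocab_size is a quicker way to spot the same problem.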

A uj5u.com user replied:

In fact, this was already enough of a hint:

indices[12,148] = 106 is not in [0, 106) [Op:ResourceGather]

I had to make sure my vocabulary size is vocab_size = len(vocab) + 1. The dataset is now built like this:

text = open(FILE_PATH, 'rb').read().decode(encoding='utf-8') 
vocab = sorted(set(text))

# [...]

vocab_size = len(vocab) + 1
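
Why one slot beyond len(vocab) is needed can be sketched as follows. This assumes, as is common in character-level pipelines but not confirmed by the post, that index 0 is reserved for padding so real characters are numbered from 1. The sample text and the `char2idx` mapping are hypothetical.

```python
text = "hello world"
vocab = sorted(set(text))  # unique characters: ' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w'

# Assumption: index 0 is reserved for padding, so characters start at 1.
char2idx = {ch: i + 1 for i, ch in enumerate(vocab)}
encoded = [char2idx[ch] for ch in text]

# The largest index equals len(vocab), so the embedding table needs
# len(vocab) + 1 rows to cover indices 0..len(vocab).
vocab_size = len(vocab) + 1
print(len(vocab), max(encoded), vocab_size)  # 8 8 9
```

With vocab_size = len(vocab) the largest index would fall just outside the table, which matches the "106 is not in [0, 106)" message above.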
