Movie similarity using Word2Vec and a deep convolutional autoencoder

I am new to Python, and I am trying to build a model that measures how similar movies are based on their descriptions. The steps I have followed so far are:

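1. Use Word2Vec to convert each movie description into a vector of 100 × (the maximum possible number of words in a description) values. This produces a vector of 21300 values for each movie description.
2. Build a deep convolutional autoencoder that tries to compress each vector (and, hopefully, extract meaning from it).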
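Step 1 can be sketched as follows. This is a minimal illustration, not the poster's code: it assumes each description is already a list of tokens, that the word vectors are stacked in word order, that words missing from the vocabulary are skipped, and that shorter descriptions are zero-padded to 213 words (21300 / 100); the question does not say how padding was done. `description_to_vector` and the toy vocabulary are hypothetical stand-ins for a trained Word2Vec model.

```python
import numpy as np

EMBED_DIM = 100   # Word2Vec vector size stated in the question
MAX_WORDS = 213   # 21300 / 100: the longest description, in words

def description_to_vector(words, word_vectors, max_words=MAX_WORDS, dim=EMBED_DIM):
    """Stack one vector per known word, zero-pad to max_words rows, then flatten.

    word_vectors: any mapping word -> array of shape (dim,). A gensim
    KeyedVectors object (e.g. model.wv) also supports `word in kv` and kv[word].
    Out-of-vocabulary words are skipped (an assumption of this sketch).
    """
    matrix = np.zeros((max_words, dim), dtype=np.float32)
    row = 0
    for word in words:
        if row == max_words:
            break  # truncate descriptions longer than max_words
        if word in word_vectors:
            matrix[row] = word_vectors[word]
            row += 1
    return matrix.reshape(-1)  # shape (max_words * dim,), i.e. (21300,)

# Hypothetical toy vocabulary standing in for a trained Word2Vec model
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=EMBED_DIM).astype(np.float32)
               for w in ["a", "heist", "goes", "wrong"]}
vec = description_to_vector("a heist goes wrong".split(), toy_vectors)
print(vec.shape)  # (21300,)
```

Because the rows are stored word by word, `vec.reshape(213, 100)` recovers one embedding per word position, which matters for how the autoencoder later reshapes its input.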

The first step worked, but I am still struggling with the autoencoder. Here is my code so far:

import tensorflow as tf
from tensorflow import keras

# Encoder
encoder_input = keras.Input(shape=(21300,), name='sum')
# View each 21300-value vector as a 150 x 142 single-channel grid
encoded = tf.keras.layers.Reshape((150, 142, 1))(encoder_input)
x = tf.keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same")(encoded)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)  # 75*71*128
x = tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)  # 38*36*64
x = tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)  # 19*18*32
x = tf.keras.layers.Conv2D(16, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)  # 10*9*16
x = tf.keras.layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)  # 5*5*8
x = tf.keras.layers.Flatten()(x)  # 200 values
encoder_output = keras.layers.Dense(units=90, activation='relu', name='encoder')(x)
x = tf.keras.layers.Reshape((10, 9, 1))(encoder_output)  # 90 = 10*9

# Decoder

decoder_input = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # 10*9*8
x = tf.keras.layers.UpSampling2D((2, 2))(decoder_input)  # 20*18*8
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu')(x)  # no padding: 18*16*16
x = tf.keras.layers.UpSampling2D((2, 2))(x)  # 36*32*16
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(x)  # 34*30*32
x = tf.keras.layers.UpSampling2D((2, 2))(x)  # 68*60*32
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(x)  # 66*58*64
x = tf.keras.layers.UpSampling2D((2, 2))(x)  # 132*116*64
x = tf.keras.layers.Conv2D(128, (3, 3), activation='relu')(x)  # 130*114*128
x = tf.keras.layers.UpSampling2D((2, 2))(x)  # 260*228*128
decoder_output = keras.layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)  # 260*228*1

autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
autoencoder.compile(opt, loss='mse')

print("STARTING FITTING")
# movies_vector: array of shape (num_movies, 21300) from step 1, used as both input and target
history = autoencoder.fit(
    movies_vector,
    movies_vector,
    epochs=25,
)

print("ENCODER READY")
# Use the 90-unit bottleneck (middle) layer as the encoder
encoder = keras.Model(inputs=autoencoder.input,
                      outputs=autoencoder.get_layer('encoder').output)

Running this code gives me the following error:

required broadcastable shapes [[node mean_squared_error/SquaredDifference (defined at tmp/ipykernel_52/3425712667.py:119) ]] [Op:__inference_train_function_1568]
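TensorFlow's element-wise ops follow NumPy-style broadcasting, so the same failure can be reproduced with NumPy alone. In this sketch, `mse_broadcasts` is a hypothetical helper and the batch size of 2 is arbitrary. Shapes are compared from the right: the trailing 1 of the prediction can stretch to 21300, but the prediction's 228 axis then lines up against the target's batch axis, and the two cannot be reconciled.

```python
import numpy as np

def mse_broadcasts(pred_shape, true_shape):
    """Return True if (y_true - y_pred) ** 2 can be computed for these shapes."""
    try:
        np.square(np.zeros(true_shape) - np.zeros(pred_shape))
        return True
    except ValueError:  # NumPy raises ValueError for non-broadcastable shapes
        return False

print(mse_broadcasts((2, 260, 228, 1), (2, 21300)))         # False
print(mse_broadcasts((2, 150, 142, 1), (2, 150, 142, 1)))   # True
```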

I have two questions:

1. How do I fix this error?

2. How can I improve my autoencoder so that I can use the compressed vectors to measure movie similarity?
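For the second question, one common approach (not taken from the source) is to compare movies by the cosine similarity of their bottleneck codes. The sketch assumes the codes come from `encoder.predict(movies_vector)`, giving an array of shape (num_movies, 90); `cosine_similarity_matrix`, `most_similar` and the toy codes are hypothetical. Because the bottleneck uses ReLU, some codes may be all zeros, so the norms are clamped before dividing.

```python
import numpy as np

def cosine_similarity_matrix(codes):
    """Pairwise cosine similarity between the rows of codes (n_movies, code_dim)."""
    norms = np.linalg.norm(codes, axis=1, keepdims=True)
    unit = codes / np.maximum(norms, 1e-12)  # guard against all-zero codes
    return unit @ unit.T

def most_similar(codes, index, k=3):
    """Indices of the k movies closest to movie `index`, excluding itself."""
    sims = cosine_similarity_matrix(codes)[index]
    order = np.argsort(-sims)  # highest similarity first
    return [int(i) for i in order if i != index][:k]

# In the question's setup: codes = encoder.predict(movies_vector)
# Hypothetical toy codes for illustration:
codes = np.array([[1.0, 0.0, 1.0],
                  [0.9, 0.1, 1.1],
                  [0.0, 1.0, 0.0]])
print(most_similar(codes, 0, k=2))  # [1, 2]
```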

Reply from a uj5u.com user:

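Your model's output has shape (batch_size, 260, 228, 1), while your target appears to be (batch_size, 21300). You can make the ranks match by adding a tf.keras.layers.Flatten() layer at the end of the model, or by not flattening the input. Either way, the number of values must also match: the decoder as written reconstructs 260 × 228 = 59280 values rather than 150 × 142 = 21300, so it also has to be changed until its output has exactly the input's shape.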
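The 260 × 228 output shape can be checked by tracing each layer by hand. The sketch applies the standard Keras size rules for these layers: a 3 × 3 convolution with `padding="same"` keeps the size and without padding loses 2 per axis, `MaxPooling2D((2, 2), padding="same")` gives ceil(n / 2), and `UpSampling2D((2, 2))` doubles each axis. The helper functions are hypothetical names used only for this illustration.

```python
import math

def conv_same(h, w):    # 3x3 Conv2D with padding="same": size unchanged
    return h, w

def conv_valid(h, w):   # 3x3 Conv2D with default padding="valid": loses 2 per axis
    return h - 2, w - 2

def pool_same(h, w):    # MaxPooling2D((2, 2), padding="same"): ceil(n / 2)
    return math.ceil(h / 2), math.ceil(w / 2)

def upsample(h, w):     # UpSampling2D((2, 2)): doubles each axis
    return 2 * h, 2 * w

def encoder_grid(h=150, w=142):
    for _ in range(5):  # five Conv2D + MaxPooling2D blocks
        h, w = pool_same(*conv_same(h, w))
    return h, w

def decoder_grid(h=10, w=9):
    h, w = upsample(*conv_same(h, w))   # Conv2D(8, padding="same") + UpSampling2D
    for _ in range(4):                  # Conv2D(16/32/64/128) without padding + UpSampling2D
        h, w = upsample(*conv_valid(h, w))
    return conv_same(h, w)              # final Conv2D(1, padding="same")

h, w = decoder_grid()
print(encoder_grid())   # (5, 5): Flatten gives 5 * 5 * 8 = 200 values
print((h, w), h * w)    # (260, 228) 59280, versus 21300 in the target
```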
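You probably should not use 2D convolutions, because in most text embeddings adjacent feature channels have no spatial or temporal correlation. Instead of (150, 142, 1), reshape to a two-dimensional (steps, channels) tensor and use 1D convolution, pooling, and upsampling layers. Since 21300 = 213 × 100, a shape of (213, 100) keeps each word's 100-dimensional Word2Vec vector in one row (assuming the vectors were concatenated word by word), whereas (150, 142) would split word vectors across rows.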
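With 1D layers, the sequence length must also survive the pooling and upsampling round trip, or the same shape-mismatch error returns. A minimal sketch, assuming a (213, 100) input (one row per word, since 21300 = 213 × 100) and three levels of `MaxPooling1D(2, padding="same")` followed by three `UpSampling1D(2)`; the number of levels and the helper names are illustrative choices, not from the source.

```python
import math

def roundtrip_length(steps, levels):
    """Sequence length after `levels` same-padded 2x poolings and `levels` 2x upsamplings.

    Each MaxPooling1D(2, padding="same") gives ceil(n / 2); each UpSampling1D(2)
    doubles the length. Conv1D layers with padding="same" leave it unchanged.
    """
    for _ in range(levels):
        steps = math.ceil(steps / 2)
    return steps * 2 ** levels

def padded_steps(steps, levels):
    """Smallest length >= steps that the round trip leaves unchanged."""
    block = 2 ** levels
    return math.ceil(steps / block) * block

print(roundtrip_length(213, 3))  # 216: no longer matches the 213-word input
print(padded_steps(213, 3))      # 216: pad every description to 216 words instead
print(roundtrip_length(216, 3))  # 216
```

Padding each description to 216 words is one option; cropping the decoder output back to 213 steps (for example with Keras's `Cropping1D`) is another.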

Original link: https://www.uj5u.com/gongcheng/476646.html
