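I'm using TensorFlow Keras to train a model that classifies whether an image is an a or a b. I have 20,000 randomly generated images for training (half a, half b). [Image: example of a] [Image: example of b]
First, I import the necessary packages: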
import tensorflow
from matplotlib import pyplot as plt
import cv2
import random
from tensorflow.keras import models
from tensorflow.keras import layers
import numpy as np
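Next, I load the images from my folder and process them into arrays that contain only 0s and 1s, storing each one together with its label: 1 if the image is an a, 0 if it is a b. Once that is done, I put them all in one list and shuffle it to randomize the order.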
a_letters = []
b_letters = []
folder_path_a = 'C:/path/to/folder/'
folder_path_b = 'C:/path/to/folder/'

# Load the 10,000 "a" images, binarize them, and label them 1
count = 0
while count < 10000:
    path = folder_path_a + f'a{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Threshold every pixel in place: above 50 becomes 1, otherwise 0
    for row_number, row in enumerate(gray_image):
        for column_number, _ in enumerate(row):
            if gray_image[row_number][column_number] > 50:
                gray_image[row_number][column_number] = 1
            else:
                gray_image[row_number][column_number] = 0
    #gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 1]
    a_letters.append(image_and_label)
    count = count + 1

# Load the 10,000 "b" images, binarize them, and label them 0
count = 0
while count < 10000:
    path = folder_path_b + f'b{count}.png'
    img = cv2.imread(path)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for row_number, row in enumerate(gray_image):
        for column_number, _ in enumerate(row):
            if gray_image[row_number][column_number] > 50:
                gray_image[row_number][column_number] = 1
            else:
                gray_image[row_number][column_number] = 0
    # gray_image = np.expand_dims(gray_image, axis=2)
    image_and_label = [gray_image, 0]
    b_letters.append(image_and_label)
    count = count + 1

# Combine both classes and shuffle so the labels are mixed
unified_list = a_letters + b_letters
random.shuffle(unified_list)
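As an aside, the per-pixel loops above can be slow on 20,000 images. A minimal sketch, assuming the same threshold of 50 and a grayscale uint8 array like the one cv2.cvtColor returns: the same binarization can be written as a single NumPy comparison. The helper name binarize and the sample values are hypothetical and only serve as an illustration.

```python
import numpy as np

def binarize(gray_image, threshold=50):
    """Return an array of 0s and 1s: 1 where the pixel is above the threshold."""
    # The comparison yields a boolean array; astype converts True/False to 1/0.
    return (gray_image > threshold).astype(np.uint8)

# Small stand-in for a grayscale image (made-up values, not from the dataset).
sample = np.array([[0, 50, 51], [255, 10, 200]], dtype=np.uint8)
binary = binarize(sample)
print(binary)
```

Inside the loading loop this would replace the two nested for loops with one call, for example gray_image = binarize(gray_image).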
Next, I separate the labels and images into their own lists and split them into training and validation data.
images = []
labels = []
for image, label in unified_list:
    images.append(image)
    labels.append(float(label))
x_train = images[:15000]
y_train = labels[:15000]
x_val = images[15000:]
y_val = labels[15000:]
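For illustration, the same separate-then-split step can be sketched on a tiny stand-in list. Everything here is an assumption made for the example: the strings stand in for image arrays, the list has 20 items instead of 20,000, and the 0.75 fraction mirrors the 15,000 / 5,000 split used above.

```python
import random

# Stand-in for unified_list: (image, label) pairs, with strings in place of arrays.
pairs = [(f"img{i}", i % 2) for i in range(20)]
random.Random(0).shuffle(pairs)  # fixed seed so the example is repeatable

# zip(*pairs) turns a list of pairs into one tuple of images and one of labels.
images, labels = zip(*pairs)

split = int(len(pairs) * 0.75)   # 15000 of 20000 is a 75/25 split
x_train, x_val = images[:split], images[split:]
y_train, y_val = labels[:split], labels[split:]
print(len(x_train), len(x_val))
```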
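Then I convert the lists to NumPy arrays and expand the dimensions of the labels. (Earlier, when I tried to train the model, I got an error saying that logits and labels must have the same dimensions, so I expanded the labels' dimensions so that they would match those of the images.)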
x_train_array = np.asarray(x_train)
y_train_array = np.asarray(y_train)
x_val_array = np.asarray(x_val)
y_val_array = np.asarray(y_val)
y_train_array = np.expand_dims(y_train_array, axis=1)
y_val_array = np.expand_dims(y_val_array, axis=1)
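For reference, here is a small sketch of what np.expand_dims does to the label array. The four label values are made up for the example. The labels end up with one column per sample, which is the shape a single sigmoid output unit produces.

```python
import numpy as np

labels = np.asarray([1.0, 0.0, 1.0, 1.0])   # shape (4,): one label per image
labels_2d = np.expand_dims(labels, axis=1)  # shape (4, 1): a single column of labels
print(labels.shape, labels_2d.shape)
```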
Next, I build a model and train it:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(169,191,)))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train_array, y_train_array, epochs=10, batch_size=500, validation_data=(x_val_array, y_val_array))
Here is the model summary: [Image: model summary]
When I try to make predictions with my model using the following code:
predictions = model.predict(x_val_array)
I get a predictions.shape of (5000, 169, 1). It seems that instead of one prediction per image I am getting 169? I have been working on this for a while, but I can't figure it out.
Answer from a uj5u.com user:
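The 169 comes from the first axis of your input shape (169, 191), that is, the number of rows in each input image.
It is carried through to the output because a Dense layer only operates on the last axis of the previous tensor; every other axis is passed along unchanged.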
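The shape behaviour can be reproduced without TensorFlow. The following is only a sketch: the hypothetical helper dense mimics the shape arithmetic of a Dense layer (a matrix product over the last axis plus a bias), uses all-zero weights because only the shapes matter here, and ignores activations. The batch of 5 images is an arbitrary choice.

```python
import numpy as np

def dense(x, units):
    """Mimic a Dense layer's shape behaviour: matmul over the last axis only."""
    weights = np.zeros((x.shape[-1], units), dtype=np.float32)
    bias = np.zeros(units, dtype=np.float32)
    return x @ weights + bias

layer_widths = (512, 150, 250, 1)  # layer sizes from the model in the question

batch = np.zeros((5, 169, 191), dtype=np.float32)  # 5 images of shape (169, 191)

# Without flattening, the 169 axis survives every layer.
out = batch
for units in layer_widths:
    out = dense(out, units)
print(out.shape)       # (5, 169, 1)

# Flattening first merges 169 * 191 = 32279 values into the last axis.
flat = batch.reshape(len(batch), -1)
out_flat = flat
for units in layer_widths:
    out_flat = dense(out_flat, units)
print(out_flat.shape)  # (5, 1)
```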
The first thing you can try is to flatten your images:
model = models.Sequential()
model.add(layers.Flatten(input_shape = (169,191,)))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
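# example is not defined in the original answer; a batch of 50 random inputs is assumed here
example = np.random.rand(50, 169, 191)
predictions = model.predict(example)
predictions.shape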
Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 flatten_7 (Flatten)         (None, 32279)             0
 dense_40 (Dense)            (None, 512)               16527360
 dense_41 (Dense)            (None, 150)               76950
 dense_42 (Dense)            (None, 250)               37750
 dense_43 (Dense)            (None, 1)                 251
=================================================================
Total params: 16,642,311
Trainable params: 16,642,311
Non-trainable params: 0
_________________________________________________________________
(50, 1)
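However, this is not recommended, because the model is too large for the amount of information it has to capture. It has about 16.6M parameters, which is computationally inefficient. With a parameter budget that large, I would rather use something like ResNet-18, which has roughly 11.7M parameters in its standard ImageNet configuration.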
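The parameter count in the summary can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output. A short sketch (the helper name dense_params is hypothetical):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Flattened input size followed by the four Dense layer widths.
sizes = [169 * 191, 512, 150, 250, 1]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [16527360, 76950, 37750, 251]
print(total)      # 16642311
```

Almost all of the parameters sit in the first layer, which connects each of the 32,279 pixels to each of the 512 units.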
Alternatively, you can make use of convolutional layers. Here is an example:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(169,191,1)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPool2D(pool_size=(4, 4)))
model.add(layers.Flatten())
model.add(layers.Dense(150, activation='relu'))
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
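# example is not defined in the original answer; a batch of 50 random inputs with a channel axis is assumed here
example = np.random.rand(50, 169, 191, 1)
predictions = model.predict(example)
predictions.shape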
Model: "sequential_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d_24 (Conv2D)          (None, 167, 189, 32)      320
 conv2d_25 (Conv2D)          (None, 165, 187, 32)      9248
 max_pooling2d_9 (MaxPooling (None, 41, 46, 32)        0
 2D)
 conv2d_26 (Conv2D)          (None, 39, 44, 32)        9248
 conv2d_27 (Conv2D)          (None, 37, 42, 32)        9248
 max_pooling2d_10 (MaxPoolin (None, 9, 10, 32)         0
 g2D)
 flatten_12 (Flatten)        (None, 2880)              0
 dense_56 (Dense)            (None, 150)               432150
 dense_57 (Dense)            (None, 250)               37750
 dense_58 (Dense)            (None, 1)                 251
=================================================================
Total params: 498,215
Trainable params: 498,215
Non-trainable params: 0
_________________________________________________________________
(50, 1)
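It is about 33 times smaller (498,215 parameters versus 16,642,311), yet it will likely perform much better, because convolutional layers are good at extracting features.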
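The shapes and parameter counts of the convolutional model can be verified with a little arithmetic. This sketch assumes the Keras defaults used in the example: 'valid' padding and stride 1 for Conv2D, and a pooling stride equal to the pool size for MaxPool2D. The helper names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights (kernel * kernel * c_in per filter) plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out

def conv_out(size, kernel=3):
    """Output length along one axis for 'valid' padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool=4):
    """Output length along one axis for pooling with stride equal to the pool size."""
    return size // pool

h, w, channels = 169, 191, 1
total = 0
for _ in range(2):              # two blocks of conv, conv, max-pool
    for _ in range(2):
        total += conv_params(3, channels, 32)
        h, w, channels = conv_out(h), conv_out(w), 32
    h, w = pool_out(h), pool_out(w)

flat = h * w * channels         # size after Flatten
for n_in, n_out in [(flat, 150), (150, 250), (250, 1)]:
    total += n_in * n_out + n_out

print(h, w, flat, total)            # 9 10 2880 498215
print(round(16642311 / total, 1))   # 33.4
```

Pooling shrinks the feature map to 9 x 10 before the first Dense layer, which is why that layer needs 432,150 parameters instead of more than 16 million.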
