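I'm trying to learn computer vision for machine learning using TensorFlow and Keras.
I have a directory containing 4185 images taken from https://www.kaggle.com/datasets/smaranjitghose/corn-or-maize-leaf-disease-dataset (I deliberately deleted 3 images).
I verified that count with the following code, which uses listdir():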
import os
folders = os.listdir('/tmp/datasets/data')
print(f'folders: {folders}')
total_images = 0
for f in folders:
    # Accumulate the per-folder counts instead of overwriting the total
    total_images += len(os.listdir(f'/tmp/datasets/data/{f}'))
print(f'Total Images found: {total_images}')
Here is the output:
folders: ['Blight', 'Common_Rust', 'Gray_Leaf_Spot', 'Healthy']
Total Images found: 4185
I want to split it into an 80% training set and a 20% validation set with Keras's ImageDataGenerator:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rescale=1./255,
    fill_mode='nearest',
    width_shift_range=0.05,
    height_shift_range=0.05,
    rotation_range=45,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.2,
)
val_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)
train_images = datagen.flow_from_directory(
    '/tmp/datasets/data',
    target_size=(150, 150),
    batch_size=32,
    seed=42,
    subset='training',
    class_mode='categorical'
)
val_images = val_datagen.flow_from_directory(
    '/tmp/datasets/data',
    target_size=(150, 150),
    batch_size=32,
    seed=42,
    subset='validation',
    class_mode='categorical'
)
Here is the output logged by flow_from_directory():
Found 3350 images belonging to 4 classes.
Found 835 images belonging to 4 classes.
The resulting split is not the 3348 | 837 I expected (0.2 * 4185 = 837). Am I missing something, or have I misunderstood the validation_split argument?
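Answer from a uj5u.com user:
The data is split per folder (class), not across the whole dataset. The validation count is truncated to an integer for each class separately, so the totals can differ slightly from 20% of the whole dataset. See the source code for more details. Here is an example of what flow_from_directory does internally: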
import os
folders = os.listdir('/content/data')
print(f'folders: {folders}')
total_images = 0
names = []
paths = []
white_list_formats = ('png', 'jpg', 'jpeg', 'bmp', 'ppm', 'tif', 'tiff')
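for f in folders:
    # Keep each folder's file list so it can be split per class later
    paths.append(os.listdir(f'/content/data/{f}'))
    for d in os.listdir(f'/content/data/{f}'):
        # Count only files with a recognised image extension
        if d.lower().endswith(white_list_formats):
            names.append(d)
print(f'Total number of valid images found: {len(names)}')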
folders: ['Blight', 'Healthy', 'Common_Rust', 'Gray_Leaf_Spot']
Total number of valid images found: 4188
Splitting the data per folder:
training_samples = 0
for p in paths:
    split = (0.2, 1)
    num_files = len(p)
    start, stop = int(split[0] * num_files), int(split[1] * num_files)
    valid_files = p[start: stop]
    # Accumulate across folders; plain assignment would keep only the last folder's count
    training_samples += len(valid_files)
print(training_samples)
validation_samples = 0
for p in paths:
    split = (0, 0.2)
    num_files = len(p)
    start, stop = int(split[0] * num_files), int(split[1] * num_files)
    valid_files = p[start: stop]
    validation_samples += len(valid_files)
print(validation_samples)
3352
836
This matches what you see from flow_from_directory:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rescale=1./255,
    fill_mode='nearest',
    width_shift_range=0.05,
    height_shift_range=0.05,
    rotation_range=45,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.2,
)
val_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)
train_images = datagen.flow_from_directory(
    '/content/data',
    target_size=(150, 150),
    batch_size=32,
    seed=42,
    subset='training',
    shuffle=False,
    class_mode='categorical'
)
val_images = val_datagen.flow_from_directory(
    '/content/data',
    target_size=(150, 150),
    batch_size=32,
    seed=42,
    subset='validation',
    shuffle=False,
    class_mode='categorical'
)
Found 3352 images belonging to 4 classes.
Found 836 images belonging to 4 classes.
Note that I did not remove the 3 images like you did, but the logic remains the same.
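To see why a per-class split can give different totals than splitting the whole dataset at once, here is a minimal sketch of the rounding involved. The helper names per_class_split and whole_dataset_split and the class sizes are hypothetical, chosen only to make the effect visible; they are not the real dataset counts and not part of Keras.

```python
def per_class_split(class_counts, validation_split):
    """Return (training, validation) totals when each class is split separately."""
    # The validation size is truncated to an integer for every class on its own.
    validation = sum(int(validation_split * n) for n in class_counts.values())
    training = sum(class_counts.values()) - validation
    return training, validation


def whole_dataset_split(class_counts, validation_split):
    """Return (training, validation) totals when the split is applied once to the total."""
    total = sum(class_counts.values())
    validation = int(validation_split * total)
    return total - validation, validation


# Hypothetical class sizes, used only for illustration.
counts = {"class_a": 7, "class_b": 9}
print(per_class_split(counts, 0.2))      # (14, 2)
print(whole_dataset_split(counts, 0.2))  # (13, 3)
```

Each class loses its own fractional remainder, so with several classes the validation total can end up a few images below 20% of the whole dataset.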
