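Some of the most commonly used datasets, such as MNIST, Fashion MNIST and CIFAR-10/100, are available directly in tf.keras.datasets. Other widely used datasets, such as SVHN and Caltech101, are not included there; for those, it is worth checking TensorFlow Datasets.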
List of datasets included in tensorflow_datasets: https://www.tensorflow.org/datasets/catalog/overview#all_datasets
Installing tensorflow_datasets: pip install tensorflow_datasets
tensorflow_datasets examples:
Getting a tf.data.Dataset object:
import tensorflow as tf
import tensorflow_datasets as tfds
data, info = tfds.load("mnist", with_info=True)
print(info)
train_data, test_data = data['train'], data['test']
assert isinstance(train_data, tf.data.Dataset)
print(train_data)
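Each element of the MNIST `tf.data.Dataset` loaded this way is a dictionary with `'image'` and `'label'` keys. The sketch below shows one common way such a dataset might be turned into a training pipeline (map a preprocessing function, then shuffle, batch and prefetch). It is an assumption about a reasonable pipeline, not something from the original post: to stay runnable offline it builds a small synthetic dataset with the same dictionary structure instead of calling `tfds.load`, and `preprocess` is a hypothetical helper name.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for tfds.load("mnist")["train"]: 100 examples with the
# same element structure, {'image': uint8 (28, 28, 1), 'label': int64 scalar}.
images = np.random.randint(0, 256, size=(100, 28, 28, 1), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(100,)).astype(np.int64)
train_data = tf.data.Dataset.from_tensor_slices({"image": images, "label": labels})


def preprocess(example):
    """Hypothetical helper: scale pixels to [0, 1] and return an (x, y) tuple."""
    image = tf.cast(example["image"], tf.float32) / 255.0
    return image, example["label"]


# Map, shuffle, batch; prefetch overlaps preprocessing with consumption.
train_pipeline = (
    train_data
    .map(preprocess)
    .shuffle(buffer_size=100)
    .batch(32)
    .prefetch(1)
)

for x, y in train_pipeline.take(1):
    print(x.shape, y.shape)
```

Passing `as_supervised=True` to `tfds.load` yields `(image, label)` tuples instead of dictionaries, in which case the `map` step only needs to handle the image.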
Getting numpy.ndarray objects:
import tensorflow_datasets as tfds
# `batch_size=-1`, will return the full dataset as `tf.Tensor`s.
dataset, info = tfds.load("mnist", batch_size=-1, with_info=True)
print(info)
train, test = dataset["train"], dataset["test"]
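print(type(train['image']))  # an eager tf.Tensor at this point
train = tfds.as_numpy(train)  # convert every tensor in the dict to a NumPy array
print(type(train['image']))  # <class 'numpy.ndarray'>
print(train['image'].shape)  # (60000, 28, 28, 1)
print(train['label'].shape)  # (60000,)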
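Once a split is a dictionary of plain `numpy.ndarray`s, any NumPy-based tool can consume it. A minimal sketch of typical follow-up steps, assuming you want float inputs in [0, 1] and one flat vector per image (for example for a scikit-learn model). The arrays are random stand-ins with MNIST's per-image shape, not real data.

```python
import numpy as np

# Random stand-in for tfds.as_numpy(train). With the real dataset,
# train["image"] has shape (60000, 28, 28, 1) and dtype uint8.
rng = np.random.default_rng(0)
train = {
    "image": rng.integers(0, 256, size=(1000, 28, 28, 1), dtype=np.uint8),
    "label": rng.integers(0, 10, size=(1000,)),
}

# Scale uint8 pixels in [0, 255] to float32 in [0, 1].
x = train["image"].astype(np.float32) / 255.0
# Flatten each image into a 784-dimensional vector (28 * 28 * 1).
x_flat = x.reshape(len(x), -1)
y = train["label"]

print(x_flat.shape, y.shape)  # (1000, 784) (1000,)
```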
For a simple way to split a validation set off a tf.data.Dataset, see https://github.com/tensorflow/datasets/issues/665#issuecomment-502409920
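One generic way to hold out a validation set from a single `tf.data.Dataset` is `take`/`skip`. This illustrates that technique and is not necessarily what the linked issue comment recommends. The dataset is synthetic (with a hypothetical `id` field so the split can be checked), and the 10% validation fraction is an arbitrary choice. Shuffling once with a fixed seed and `reshuffle_each_iteration=False` matters here: with the default reshuffling, each epoch would produce a new order and the `take` and `skip` subsets would overlap.

```python
import numpy as np
import tensorflow as tf

num_examples = 1000
val_fraction = 0.1  # assumed validation share
num_val = int(num_examples * val_fraction)

# Synthetic stand-in for the TFDS "train" split; "id" is a hypothetical
# field added only so the split can be inspected.
full_train = tf.data.Dataset.from_tensor_slices({
    "image": np.zeros((num_examples, 28, 28, 1), dtype=np.uint8),
    "label": np.arange(num_examples) % 10,
    "id": np.arange(num_examples),
})

# Fix the shuffled order once, so every iteration sees the same order
# and take/skip always select disjoint examples.
shuffled = full_train.shuffle(num_examples, seed=42, reshuffle_each_iteration=False)

val_data = shuffled.take(num_val)     # first 10% of the fixed order
train_data = shuffled.skip(num_val)   # remaining 90%
```

Recent versions of `tensorflow_datasets` also accept split-slicing strings in `tfds.load`, for example `split=["train[:90%]", "train[90%:]"]`, which splits at load time instead.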
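If you want to manually carve out a stratified random validation set from MNIST or a similar dataset, it is more convenient to convert it to numpy.ndarray first; scikit-learn's train_test_split then does the job in a single line of code.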
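A minimal sketch of that one-liner, assuming scikit-learn is installed. Passing `stratify=y` makes the validation set keep the class proportions of the full training set. The arrays are stand-ins for `train['image']` and `train['label']` after `tfds.as_numpy`, with deliberately imbalanced labels (50 samples for classes 0 to 4, 150 for classes 5 to 9) so the effect of stratification is visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for train['image'] / train['label'] after tfds.as_numpy.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(1000, 28, 28, 1), dtype=np.uint8)
y = np.repeat(np.arange(10), [50] * 5 + [150] * 5)  # imbalanced labels

# Hold out 10% as validation while preserving per-class proportions.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.1, stratify=y, random_state=42
)

print(x_train.shape, x_val.shape)  # (900, 28, 28, 1) (100, 28, 28, 1)
print(np.bincount(y_val))  # 10% of each class ends up in validation
```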
References
https://www.tensorflow.org/datasets
https://www.tensorflow.org/datasets/catalog/overview#all_datasets
https://github.com/tensorflow/datasets/blob/master/docs/splits.md
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/60177.html
