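Work has been busy, so this series on the principles behind face recognition has been on hold for a while. Starting today I will try to pick it back up. First, a recap of what we have done so far. The main task of the last few sections was to prepare enough data to train the neural network. The first step was to create images that contain no face, or in which the face takes up less than 30%: from each image in the face dataset we randomly pick a rectangular region and make sure that it either does not overlap the face region or overlaps it by less than 30%. We call this data neg; its purpose is to show the network what an image without a face looks like.
Next we pick another set of regions, this time making sure that their overlap with the face region is above 30% but below 65%. We call this data part; its purpose is to train the network to recognize partial faces, which strengthens its understanding of what a face is. The third set consists of rectangular regions whose overlap with the face region is greater than 65%. We call this data positive; its purpose is to teach the network the features of a face.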
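The three categories can be sketched as a small function. This is only a minimal sketch: it assumes the overlap is measured as intersection over union (IoU), which is a common choice in MTCNN-style pipelines but is not spelled out here. The names `iou` and `classify_crop`, the `(xmin, ymin, xmax, ymax)` box layout, and the handling of the exact boundary values 30% and 65% are all assumptions made for illustration.

```python
def iou(box, face):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1 = max(box[0], face[0])
    iy1 = max(box[1], face[1])
    ix2 = min(box[2], face[2])
    iy2 = min(box[3], face[3])
    # Width and height of the intersection are clamped at zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_face = (face[2] - face[0]) * (face[3] - face[1])
    union = area_box + area_face - inter
    return inter / union if union > 0 else 0.0


def classify_crop(box, face):
    """Map a candidate crop to 'neg', 'part' or 'positive' (boundary handling is an assumption)."""
    overlap = iou(box, face)
    if overlap < 0.30:
        return 'neg'
    if overlap <= 0.65:
        return 'part'
    return 'positive'


face = (50, 50, 150, 150)
print(classify_crop((0, 0, 40, 40), face))        # crop that misses the face entirely
print(classify_crop((100, 50, 200, 150), face))   # crop shifted by half the face width
print(classify_crop((55, 55, 150, 150), face))    # crop covering almost the whole face
```

With these sample boxes the three calls fall into the neg, part and positive categories respectively.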
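We also brought in the dataset from “Deep Convolutional Network Cascade for Facial Point Detection”. It contains a large number of face images, each annotated with the coordinates of five facial keypoints: the left eye, the right eye, the nose, and the two corners of the mouth. We want the network to locate these five keypoints when it processes an image, which makes it considerably better at finding faces.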
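Reading all of this data from disk is a bottleneck when feeding it to the network. To read efficiently, we need to gather the data into contiguous storage blocks, so that loading it into memory is fast. Keep in mind that we have to feed several hundred thousand small images to the network, so I/O throughput is key to training it effectively. This time we use TFRecord from the TensorFlow framework to store the data; it is built on the protocol buffers we covered in the previous section and works on the same principle.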
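To see why one large file helps compared with hundreds of thousands of small ones, here is a minimal sketch of the underlying idea: pack every record into a single file as a length prefix followed by the payload, then read the records back in one sequential pass. This is not TFRecord's actual on-disk format (which also stores checksums); `write_records` and `read_records` are hypothetical helper names, and the payloads are placeholder bytes standing in for image crops.

```python
import os
import struct
import tempfile


def write_records(path, payloads):
    """Write each payload to one file as <8-byte little-endian length><bytes>."""
    with open(path, 'wb') as f:
        for payload in payloads:
            f.write(struct.pack('<Q', len(payload)))
            f.write(payload)


def read_records(path):
    """Yield the payloads back in order using a single sequential pass."""
    with open(path, 'rb') as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of file
            (length,) = struct.unpack('<Q', header)
            yield f.read(length)


# Three stand-in "images" of 12 * 12 * 3 bytes each (placeholder data).
crops = [bytes([i]) * (12 * 12 * 3) for i in range(3)]
path = os.path.join(tempfile.mkdtemp(), 'crops.bin')
write_records(path, crops)
restored = list(read_records(path))
print(len(restored), len(restored[0]))
```

The reader never has to open more than one file, which is the property that matters when the data sits on slow or remote storage.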
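Next we take the data gathered in the previous sections (the image crops, the normalized coordinates of the face rectangles, the normalized coordinates of the five facial keypoints, and so on) and write it all into TFRecord. There are more than a million records to process in total, so reading and writing the data is quite tricky. The first thing to do is to read all of the coordinate information from the various files into memory. The code is as follows: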
import os
from tqdm import tqdm

def get_dataset(data_dir, item):  # item is the generated list file, e.g. train_pnet_landmark.txt
    dataset_dir = os.path.join(data_dir, item)
    print('dataset dir: ', dataset_dir)
    dataset = []
    with open(dataset_dir, 'r') as image_list:
        lines = image_list.readlines()
    for line in tqdm(lines):
        info = line.strip().split(' ')
        if len(info) < 2:
            print('info err: ', info)
            continue  # malformed line without a label field, skip it
        data_example = {}
        bbox = {}
        data_example['filename'] = info[0]
        data_example['label'] = int(info[1])
        bbox['xmin'] = 0.0  # initialize the face region
        bbox['ymin'] = 0.0
        bbox['xmax'] = 0.0
        bbox['ymax'] = 0.0
        bbox['xlefteye'] = 0.0  # initialize the 5 keypoints (10 coordinate values)
        bbox['ylefteye'] = 0.0
        bbox['xrighteye'] = 0.0
        bbox['yrighteye'] = 0.0
        bbox['xnose'] = 0.0
        bbox['ynose'] = 0.0
        bbox['xleftmouth'] = 0.0
        bbox['yleftmouth'] = 0.0
        bbox['xrightmouth'] = 0.0
        bbox['yrightmouth'] = 0.0
        if len(info) == 6:  # this record only contains the face region
            bbox['xmin'] = float(info[2])
            bbox['ymin'] = float(info[3])
            bbox['xmax'] = float(info[4])
            bbox['ymax'] = float(info[5])
        if len(info) == 12:  # this record contains the 5 keypoints (10 coordinate values)
            bbox['xlefteye'] = float(info[2])
            bbox['ylefteye'] = float(info[3])
            bbox['xrighteye'] = float(info[4])
            bbox['yrighteye'] = float(info[5])
            bbox['xnose'] = float(info[6])
            bbox['ynose'] = float(info[7])
            bbox['xleftmouth'] = float(info[8])
            bbox['yleftmouth'] = float(info[9])
            bbox['xrightmouth'] = float(info[10])
            bbox['yrightmouth'] = float(info[11])
        data_example['bbox'] = bbox
        dataset.append(data_example)
    return dataset
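The function above is written to handle three line shapes, distinguished only by the number of space-separated fields. A minimal sketch of that rule follows. The file names, label values and coordinates in the sample lines are made up for illustration, and `describe_line` is a hypothetical helper rather than part of the original code.

```python
def describe_line(line):
    """Report which kind of record a line is, based only on its field count."""
    info = line.strip().split(' ')
    if len(info) == 2:
        return 'label only'       # filename, label
    if len(info) == 6:
        return 'face box'         # filename, label, xmin, ymin, xmax, ymax
    if len(info) == 12:
        return 'five keypoints'   # filename, label, then five (x, y) pairs
    return 'unexpected'


# Made-up sample lines; real lines come from the generated annotation files.
samples = [
    'neg/0.jpg 0',
    'pos/0.jpg 1 0.05 -0.02 0.10 0.03',
    'landmark/0.jpg -2 0.3 0.3 0.7 0.3 0.5 0.5 0.35 0.75 0.65 0.75',
]
for s in samples:
    print(describe_line(s))
```

Because the box and the keypoints share one dictionary, a record of one kind simply leaves the other kind's fields at 0.0.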
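The coordinate information is stored in the files we generated in the previous sections, such as pos_12.txt and landmark_12_aug.txt. Next, we read the image crops cut out earlier into memory and convert them to byte strings: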
import cv2

def process_image(filename):
    try:
        image = cv2.imread(filename)  # returns None if the file cannot be read
        # tobytes() replaces the deprecated tostring(); the result is the raw pixel bytes
        image_data = image.tobytes()
        assert len(image.shape) == 3
        height = image.shape[0]
        width = image.shape[1]
        assert image.shape[2] == 3
        return image_data, height, width
    except Exception as e:
        # print('process image err: ', e, filename)
        return None, None, None
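Note that the raw bytes carry no shape information, so whoever reads them back must already know the crop size. Here is a minimal sketch with NumPy, using a synthetic 12x12 three-channel array in place of a real crop (an assumption; the real data comes from `cv2.imread`):

```python
import numpy as np

# Synthetic stand-in for a 12x12 crop as returned by cv2.imread (uint8, 3 channels).
image = (np.arange(12 * 12 * 3) % 256).astype(np.uint8).reshape(12, 12, 3)

image_data = image.tobytes()   # flat byte string; the shape is lost
print(len(image_data))         # one byte per channel per pixel

# Decoding needs the dtype and the shape to be supplied from outside.
decoded = np.frombuffer(image_data, dtype=np.uint8).reshape(12, 12, 3)
print(np.array_equal(decoded, image))
```

This is why the crop size (12 for the first network in this series) has to be agreed on between the writing and the reading side.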
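The third step is to write the information read in the previous two steps into the TFRecord data structure, which is then stored to a file in a specific format: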
import time
import tensorflow as tf

def _int64_feature(value):
    # Wrap a single int (or a list of ints) in a tf.train.Feature
    if not isinstance(value, list):
        value = [value]
    try:
        return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
    except Exception as e:
        print('int64 err: ', e)

def _float_feature(value):
    # Wrap a single float (or a list of floats) in a tf.train.Feature
    if not isinstance(value, list):
        value = [value]
    try:
        return tf.train.Feature(float_list=tf.train.FloatList(value=value))
    except Exception as e:
        print('float err: ', e)

def _bytes_feature(value):
    # Wrap a single bytes object (or a list of them) in a tf.train.Feature
    if not isinstance(value, list):
        value = [value]
    try:
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
    except Exception as e:
        print('bytes err: ', e)

def convert_to_example(image_example, image_buffer):
    class_label = image_example['label']
    bbox = image_example['bbox']
    roi = [bbox['xmin'], bbox['ymin'], bbox['xmax'], bbox['ymax']]
    landmark = [bbox['xlefteye'], bbox['ylefteye'], bbox['xrighteye'], bbox['yrighteye'],
                bbox['xnose'], bbox['ynose'], bbox['xleftmouth'],
                bbox['yleftmouth'], bbox['xrightmouth'],
                bbox['yrightmouth']]
    try:
        example = tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': _bytes_feature(image_buffer),
            'image/label': _int64_feature(class_label),
            'image/roi': _float_feature(roi),
            'image/landmark': _float_feature(landmark)
        }))
        return example
    except Exception as e:
        print('example err: ', e, image_example)

def add_to_tfrecord(filename, image_example, tfrecord_writer):
    begin = time.time()
    image_data, height, width = process_image(filename)
    end = time.time()
    # print('time for process image: ', end - begin, filename)
    if image_data is not None:
        example = convert_to_example(image_example, image_data)
        tfrecord_writer.write(example.SerializeToString())
        # print('tfrecord write')  # too noisy when there are over a million records
import os
import random
import tensorflow as tf
from tqdm import tqdm

dataset_dir = '/content/drive/MyDrive/my_mtcnn/data'

def create_tf_record(size):
    output_dir = os.path.join(dataset_dir, str(size), 'tf_record')
    os.makedirs(output_dir, exist_ok=True)  # also creates missing parent directories
    if size == 12:
        net = 'PNet'
        tf_filenames = [os.path.join(output_dir, 'train%s_landmark.tfrecord' % net)]
        items = ['train_pnet_landmark.txt']
    else:  # sizes 24 and 48 will be handled in later sections
        print('size not supported yet: ', size)
        return
    # The file actually written carries the '_shuffle' suffix, so check for that name
    if tf.io.gfile.exists(tf_filenames[0] + '_shuffle'):
        print('tf record file already created')
        return  # delete the existing file to regenerate it
    # The loop looks redundant for size=12, but it is needed later for size=24 and 48
    for tf_filename, item in zip(tf_filenames, items):
        print('reading data....')
        dataset = get_dataset(dataset_dir, item)
        tf_filename = tf_filename + '_shuffle'
        random.shuffle(dataset)
        print('transform to tfrecord')
        with tf.io.TFRecordWriter(tf_filename) as tfrecord_writer:
            for image_example in tqdm(dataset):
                filename = image_example['filename']
                try:
                    add_to_tfrecord(filename, image_example, tfrecord_writer)
                except Exception as e:
                    print('tf record exception: ', e)
    print('completing transform..!')

create_tf_record(12)
Take a close look at the code above that builds the TFRecord structure, namely this part:
example = tf.train.Example(features = tf.train.Features(feature = {
    'image/encoded': _bytes_feature(image_buffer),
    'image/label': _int64_feature(class_label),
    'image/roi': _float_feature(roi),
    'image/landmark': _float_feature(landmark)
}))
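As you can see, the TFRecord structure looks a lot like JSON or a Python dictionary: it also stores data as key-value pairs, and each value holds one of the basic types bytes, float, or int, which makes it particularly well suited to storing binary data. Running the code above produces a TFRecord binary file that gathers the training data we generated in the previous sections into a single file. In my experiments this step turned out to be quite slow. I am using Colab with Google Drive, and because the data consists of a huge number of small files, I estimate that this step will take more than 10 hours to finish. Once my run completes, I will share the result so that readers do not have to spend too much time on data preprocessing.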
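To check that a record can be read back, the same key-value layout has to be described a second time as a parsing spec. The sketch below writes one synthetic example to a temporary file and parses it again. The zero-filled 12x12x3 image, the label value, the coordinates and the temporary path are all placeholders, and the fixed lengths 4 and 10 are taken from the `roi` and `landmark` lists built in `convert_to_example`.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Placeholder record: a black 12x12 three-channel crop with made-up label and coordinates.
image = np.zeros((12, 12, 3), dtype=np.uint8)
example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
    'image/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    'image/roi': tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.1, 0.2, 0.2])),
    'image/landmark': tf.train.Feature(float_list=tf.train.FloatList(value=[0.0] * 10)),
}))

path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# The parsing spec mirrors the keys and types used when writing.
spec = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/label': tf.io.FixedLenFeature([], tf.int64),
    'image/roi': tf.io.FixedLenFeature([4], tf.float32),
    'image/landmark': tf.io.FixedLenFeature([10], tf.float32),
}

for serialized in tf.data.TFRecordDataset(path):
    parsed = tf.io.parse_single_example(serialized, spec)
    # The raw bytes have no shape, so the known crop size is supplied here.
    pixels = tf.reshape(tf.io.decode_raw(parsed['image/encoded'], tf.uint8), [12, 12, 3])
    print(int(parsed['image/label']), pixels.shape, parsed['image/roi'].numpy())
```

Running a quick round trip like this on a handful of records before starting the full conversion is a cheap way to catch a mismatched key or type early.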
