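Object detection with deep learning and OpenCV

When it comes to deep learning-based object detection, there are three primary methods you are likely to encounter:

Faster R-CNNs (Ren et al., 2015)
You Only Look Once (YOLO) (Redmon et al., 2015)
Single Shot Detectors (SSDs) (Liu et al., 2015)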

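Faster R-CNNs are probably the most "heard of" method for object detection using deep learning. However, the technique can be difficult to understand (especially for beginners in deep learning), hard to implement, and challenging to train.

Furthermore, even with the "faster" R-CNN implementation (the "R" refers to the region proposals the method relies on), the algorithm can be quite slow, on the order of 7 FPS.

If pure speed is the goal, we tend to use YOLO, since this algorithm is much faster, capable of processing 40-90 FPS on a Titan X GPU. The super-fast variant of YOLO can even reach 155 FPS.

The drawback of YOLO is its lower accuracy.

SSDs, originally developed by Google, are a balance between the two. The algorithm is also more straightforward than Faster R-CNNs.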

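MobileNets: Efficient (deep) neural networks

When building object detection networks, we normally use an existing network architecture, such as VGG or ResNet. These architectures can be very large, on the order of 200-500 MB. Because of their sheer size and the resulting number of computations, network architectures such as these are unsuitable for resource-constrained devices.

Instead, we can use MobileNets (Howard et al., 2017), another paper by Google researchers. We call these networks "MobileNets" because they are designed for resource-constrained devices such as your smartphone. MobileNets differ from traditional CNNs through the use of depthwise separable convolution. The general idea behind depthwise separable convolution is to split convolution into two stages:

A 3×3 depthwise convolution,
followed by a 1×1 pointwise convolution.

This allows us to actually reduce the number of parameters in the network. The trade-off is accuracy: MobileNets are normally not as accurate as their larger big brothers, but they are much more resource efficient.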
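To make the parameter saving concrete, the sketch below counts the weights of a single layer both ways. The layer sizes (128 input channels, 256 output channels) are illustrative assumptions rather than values taken from the MobileNets paper, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a regular kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)
print(standard, separable)             # 294912 33920
print(round(standard / separable, 1))  # 8.7
```

For this assumed layer the separable version needs roughly one ninth of the weights, which is where most of the size reduction comes from.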

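Deep learning-based object detection with OpenCV

The MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and was then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).

We can therefore detect 20 objects in images (+1 for the background class), including airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and TV monitors.

In this section we will use the MobileNet SSD together with the deep neural network (dnn) module in OpenCV to build our object detector.

Open a new file, name it object_detection.py, and insert the following code: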

import numpy as np
import cv2
if __name__=="__main__":
	image_name = '11.jpg'
	prototxt = 'MobileNetSSD_deploy.prototxt.txt'
	model_path = 'MobileNetSSD_deploy.caffemodel'
	confidence_ta = 0.2
	# initialize the list of class labels MobileNet SSD was trained to
	# detect, then generate a set of bounding box colors for each class
	CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
			   "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
			   "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
			   "sofa", "train", "tvmonitor"]
	COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

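Import the required packages.

Define the global parameters:

image_name: path to the input image.
prototxt: path to the Caffe prototxt file.
model_path: path to the pre-trained model.
confidence_ta: minimum probability threshold for filtering out weak detections; the default is 20%.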
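If you would rather not edit the script each time, the same four parameters can be read from the command line. The following is a minimal sketch using the standard argparse module; the flag names (--image, --prototxt, --model, --confidence) are hypothetical choices for illustration, and the defaults mirror the hard-coded values above.

```python
import argparse


def parse_args(argv=None):
    """Parse the detector settings; the flag names here are illustrative only."""
    parser = argparse.ArgumentParser(description="MobileNet SSD object detection")
    parser.add_argument("--image", default="11.jpg",
                        help="path to the input image")
    parser.add_argument("--prototxt", default="MobileNetSSD_deploy.prototxt.txt",
                        help="path to the Caffe prototxt file")
    parser.add_argument("--model", default="MobileNetSSD_deploy.caffemodel",
                        help="path to the pre-trained model")
    parser.add_argument("--confidence", type=float, default=0.2,
                        help="minimum probability to keep a detection")
    return parser.parse_args(argv)


# Passing an explicit list makes the example independent of the real command line.
args = parse_args(["--image", "street.jpg", "--confidence", "0.5"])
print(args.image, args.confidence)
```

In a real script you would call parse_args() without arguments so that the values come from sys.argv.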

Next, we initialize the class labels and the bounding box colors.

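	# load our serialized model from disk
	print("[INFO] loading model...")
	net = cv2.dnn.readNetFromCaffe(prototxt, model_path)
	# load the input image and construct an input blob for it by
	# resizing to a fixed 300x300 pixels and then normalizing it
	# (note: the SSD model expects a 300x300 pixel input)
	image = cv2.imread(image_name)
	if image is None:
		# cv2.imread returns None instead of raising when the file is unreadable
		raise SystemExit("[ERROR] could not read image: {}".format(image_name))
	(h, w) = image.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843,
								 (300, 300), 127.5)
	# pass the blob through the network and obtain the detections and
	# predictions
	print("[INFO] computing object detections...")
	net.setInput(blob)
	detections = net.forward()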

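Load the model from disk.

Read the image.

Extract its height and width, and compute a 300 x 300 pixel blob from the image.

Pass the blob into the network.

Compute the forward pass for the input and store the result as detections.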
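The two numeric arguments passed to cv2.dnn.blobFromImage are a scale factor (0.007843, roughly 1/127.5) and a mean (127.5). The sketch below reproduces that arithmetic with plain NumPy on an image that is assumed to be already resized, to show that pixel values end up in roughly [-1, 1] and that the layout changes from height x width x channels to batch x channels x height x width. The helper name to_blob is hypothetical, and it is an illustration of the preprocessing, not a replacement for the OpenCV call.

```python
import numpy as np


def to_blob(image_bgr, scale=0.007843, mean=127.5):
    """Mimic the mean subtraction, scaling and layout change of blobFromImage."""
    pixels = image_bgr.astype(np.float32)
    normalized = (pixels - mean) * scale     # subtract the mean first, then scale
    chw = normalized.transpose(2, 0, 1)      # HWC -> CHW
    return chw[np.newaxis, ...]              # add the batch dimension -> NCHW


# Synthetic stand-ins for an all-white and an all-black 300x300 image.
white = np.full((300, 300, 3), 255, dtype=np.uint8)
black = np.zeros((300, 300, 3), dtype=np.uint8)

blob = to_blob(white)
print(blob.shape)  # (1, 3, 300, 300)
print(float(blob.max()), float(to_blob(black).min()))  # close to 1.0 and -1.0
```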

	# loop over the detections
	for i in np.arange(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with
		# the prediction
		confidence = detections[0, 0, i, 2]
		# filter out weak detections by ensuring the `confidence` is
		# greater than the minimum confidence
		if confidence > confidence_ta:
			# extract the index of the class label from `detections`,
			# then compute the (x, y)-coordinates of the object's bounding box
			idx = int(detections[0, 0, i, 1])
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")
			# display the prediction
			label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
			print("[INFO] {}".format(label))
			cv2.rectangle(image, (startX, startY), (endX, endY),
						  COLORS[idx], 2)
			y = startY - 15 if startY - 15 > 15 else startY + 15
			cv2.putText(image, label, (startX, y),
						cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
	# show the output image and save a copy to disk
	cv2.imshow("Output", image)
	cv2.imwrite("output.jpg", image)
	cv2.waitKey(0)

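We loop over the detections, first extracting the confidence value.

If the confidence is above our minimum threshold, we extract the class label index and compute the bounding box around the detected object.

We then extract the (x, y)-coordinates of the box, which we will use shortly for drawing the rectangle and displaying the text.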
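Each detection row holds seven values: an image id, the class id, the confidence, and the four box corners expressed as fractions of the image width and height. The sketch below converts one such row to pixel coordinates in plain Python. The helper name to_pixel_box and the sample row are made up for illustration, and the clamping to the image bounds is an addition that the script above does not perform.

```python
def to_pixel_box(detection, width, height):
    """Scale a normalized [id, class, conf, x1, y1, x2, y2] row to pixel coordinates."""
    x1, y1, x2, y2 = detection[3:7]
    # int() truncates, matching box.astype("int") for non-negative values
    start_x = min(max(int(x1 * width), 0), width - 1)
    start_y = min(max(int(y1 * height), 0), height - 1)
    end_x = min(max(int(x2 * width), 0), width - 1)
    end_y = min(max(int(y2 * height), 0), height - 1)
    return start_x, start_y, end_x, end_y


# Hypothetical detection whose bottom edge extends slightly past the image.
row = [0, 15, 0.98, 0.25, 0.125, 0.75, 1.0625]
print(to_pixel_box(row, 640, 480))  # (160, 60, 480, 479)
```

Clamping is optional for drawing, but it helps if you later crop the detected region out of the image.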

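Next, we build a text label containing the CLASS name and the confidence.

Using the label, we print it to the terminal and then draw a colored rectangle around the object using the (x, y)-coordinates extracted earlier.

In general, we want the label to be displayed above the rectangle, but if there is no room, we display it just below the top of the rectangle.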
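The placement rule is the one-line conditional from the script. Pulled out into a small function (the name label_y is hypothetical), its behavior for boxes near and far from the top edge is easy to check:

```python
def label_y(start_y, offset=15):
    """Place the label above the box, or inside it when the box is near the top edge."""
    return start_y - offset if start_y - offset > offset else start_y + offset


print(label_y(100))  # 85: enough room, label sits above the box
print(label_y(20))   # 35: too close to the top, label moves inside the box
```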

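Finally, we overlay the colored text on the image using the y-value we just computed.

Running the script prints each detection to the terminal, displays the annotated image in a window, and saves it as output.jpg.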

Detecting objects in video with OpenCV

Open a new file, name it video_object_detection.py, and insert the following code:

import time

import cv2
import imutils
import numpy as np

video_name = '12.mkv'
prototxt = 'MobileNetSSD_deploy.prototxt.txt'
model_path = 'MobileNetSSD_deploy.caffemodel'
confidence_ta = 0.2

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(prototxt, model_path)

# open the input video and set up the writer for the annotated output
print('[INFO] starting video stream...')
vs = cv2.VideoCapture(video_name)
fps = 30    # FPS of the saved video; adjust as needed
size = (600, 325)    # (width, height) of the frames written to disk
fourcc = cv2.VideoWriter_fourcc(*'XVID')
videowrite = cv2.VideoWriter('output.avi', fourcc, fps, size)
# short pause carried over from camera warm-up; harmless for a video file
time.sleep(2.0)

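Define the global parameters:

video_name: path to the input video.
prototxt: path to the Caffe prototxt file.
model_path: path to the pre-trained model.
confidence_ta: minimum probability threshold for filtering out weak detections; the default is 20%.

Next, we initialize the class labels and the bounding box colors.

Load the model.

Initialize the VideoCapture object.

Set up the VideoWriter object and its parameters. The size passed to VideoWriter must match the size of the frames written in the loop below; otherwise the video will not be saved.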
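One way to pick a matching size is to derive it from the source resolution and the width you resize to. The sketch below assumes an aspect-ratio-preserving resize that truncates the height to an integer, which is how imutils.resize behaves as far as I know; the helper name resized_size and the source resolutions are hypothetical examples.

```python
def resized_size(src_width, src_height, target_width):
    """Return (width, height) after an aspect-ratio-preserving resize to target_width."""
    ratio = target_width / float(src_width)
    return target_width, int(src_height * ratio)


# Hypothetical source resolutions, both resized to a width of 600 pixels.
print(resized_size(1920, 1080, 600))  # (600, 337)
print(resized_size(1920, 1040, 600))  # (600, 325)
```

The resulting tuple is in (width, height) order, which is the order VideoWriter expects, whereas frame.shape reports (height, width, channels).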

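Next, we loop over the frames of the video and feed each one to the detector. This part of the logic is the same as for image detection. The code is as follows: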

# loop over the frames from the video stream
while True:
    ret_val, frame = vs.read()
    # stop when the video ends or a frame cannot be read
    if ret_val is False:
        break
    frame = imutils.resize(frame, width=1080)
    print(frame.shape)
    # grab the frame dimensions and convert it to a blob
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

    # pass the blob through the network and obtain the detections and predictions
    net.setInput(blob)
    detections = net.forward()

    # loop over the detections
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > confidence_ta:
            # extract the index of the class label from the
            # `detections`, then compute the (x, y)-coordinates of
            # the bounding box for the object
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(CLASSES[idx],
                                         confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                          COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
    # show the output frame
    cv2.imshow("Frame", frame)
    # the writer only accepts frames of exactly `size`, so resize before writing
    videowrite.write(cv2.resize(frame, size))
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
# do a bit of cleanup
videowrite.release()
cv2.destroyAllWindows()
vs.release()

Result:

https://www.bilibili.com/video/BV19i4y197kh?spm_id_from=333.999.0.0
Complete code:
https://download.csdn.net/download/hhhhhhhhhhwwwwwwwwww/71355349
