Batching code for videos with unequal frame counts

Author | Rahul Varma
Compiled by | VK
Source | Towards Data Science

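One of the most important steps in training and testing an effective machine learning model is collecting a large amount of data and training on it efficiently. Mini-batches help with this: each iteration trains on only a small portion of the data.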

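However, as more and more machine learning tasks run on video datasets, batching videos of unequal length efficiently becomes a problem. Most approaches crop the videos to equal length so that the same number of frames is extracted in every iteration. That is not much use when we need information from every frame to make a good prediction, as in self-driving cars and action recognition.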
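
To make the cost of cropping concrete, here is a minimal sketch with made-up frame counts (the file names and numbers are illustrative assumptions, not measurements). Cropping every video to the length of the shortest one discards most of the frames of the longer videos.

```python
# Hypothetical frame counts for three videos (illustrative values only).
frame_counts = {"a.mp4": 120, "b.mp4": 300, "c.mp4": 45}

# Cropping to equal length keeps only as many frames as the shortest video has.
crop_len = min(frame_counts.values())

# Frames discarded per video, and the overall fraction that is lost.
discarded = {name: n - crop_len for name, n in frame_counts.items()}
lost_fraction = sum(discarded.values()) / sum(frame_counts.values())

print(discarded)
print(f"{lost_fraction:.0%} of all frames are never seen")
```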

Instead, we can build a loader that handles videos of different lengths.

Using LoadStreams from Glenn Jocher's Yolov3 (https://github.com/ultralytics/yolov3) as the base, I created the LoadStreamsBatch class.

Class initialization

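# Imports needed by the listings in this article (magic is the python-magic package)
import os
import time
from threading import Thread

import cv2
import magic
import numpy as np

def __init__(self, sources='streams.txt', img_size=416, batch_size=2, subdir_search=False):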
        self.mode = 'images'
        self.img_size = img_size
        self.def_img_size = None

        videos = []
        if os.path.isdir(sources):
            if subdir_search:
                for subdir, dirs, files in os.walk(sources):
                    for file in files:
                        if 'video' in magic.from_file(subdir + os.sep + file, mime=True):
                            videos.append(subdir + os.sep + file)
            else:
                for elements in os.listdir(sources):
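                    # Test the full path: a bare name would be resolved against the working directory
                    if not os.path.isdir(sources + os.sep + elements) and 'video' in magic.from_file(sources + os.sep + elements, mime=True):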
                        videos.append(sources + os.sep + elements)
        else:
            with open(sources, 'r') as f:
                videos = [x.strip() for x in f.read().splitlines() if len(x.strip())]

        n = len(videos)
        curr_batch = 0
        self.data = [None] * batch_size
        self.cap = [None] * batch_size
        self.sources = videos
        self.n = n
        self.cur_pos = 0

        # Start a thread per slot to read frames from the video streams
        for i, s in enumerate(videos):
            if curr_batch == batch_size:
                break
            print('%g/%g: %s... ' % (self.cur_pos+1, n, s), end='')
            self.cap[curr_batch] = cv2.VideoCapture(s)
            try:
                assert self.cap[curr_batch].isOpened()
            except AssertionError:
                print('Failed to open %s' % s)
                self.cur_pos+=1
                continue
            w = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_WIDTH))
            h = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_HEIGHT))
            fps = self.cap[curr_batch].get(cv2.CAP_PROP_FPS) % 100
            frames = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_COUNT))
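            _, self.data[curr_batch] = self.cap[curr_batch].read()  # guarantee first frame
            # Use the slot index (curr_batch), not i: the two diverge once a video fails to open
            thread = Thread(target=self.update, args=([curr_batch, self.cap[curr_batch], self.cur_pos+1]), daemon=True)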
            print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
            curr_batch+=1
            self.cur_pos+=1
            thread.start()
            print('')  # newline

        if all( v is None for v in self.data ):
            return
        # Check for common shapes
        s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.data], 0)  # inference shapes
        self.rect = np.unique(s, axis=0).shape[0] == 1
        if not self.rect:
            print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')

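The __init__ function takes four arguments. img_size is the same as in the original version; the other three are defined as follows:

sources: a directory path or a text file to read from.
batch_size: the desired batch size.
subdir_search: when a directory is passed as sources, toggle this option to search all of its subdirectories for video files as well.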

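I first check whether the sources argument is a directory or a text file. If it is a directory, I read everything in it (including subdirectories when subdir_search is True); otherwise I read the video paths from the text file. The paths are stored in a list, and cur_pos tracks the current position in that list.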
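
The source-discovery step can be sketched on its own. The helper below is a hypothetical stand-in, not the code from LoadStreamsBatch: the name collect_videos is mine, and it guesses the type from the file extension with the standard-library mimetypes module instead of inspecting file contents with magic, so that it runs without extra dependencies.

```python
import mimetypes
import os


def collect_videos(sources, subdir_search=False):
    """Return video paths from a directory or from a text file of paths.

    Hypothetical helper: it relies on extension-based MIME guessing,
    whereas the class in this article inspects file contents with `magic`.
    """
    def is_video(path):
        mime, _ = mimetypes.guess_type(path)
        return mime is not None and mime.startswith("video")

    if os.path.isdir(sources):
        if subdir_search:
            # Walk the whole tree below `sources`.
            candidates = [os.path.join(root, name)
                          for root, _dirs, files in os.walk(sources)
                          for name in files]
        else:
            # Only direct children that are regular files.
            candidates = [os.path.join(sources, name)
                          for name in os.listdir(sources)
                          if os.path.isfile(os.path.join(sources, name))]
        return sorted(p for p in candidates if is_video(p))

    # Otherwise treat `sources` as a text file with one path per line.
    with open(sources, "r") as f:
        return [line.strip() for line in f if line.strip()]
```

Calling collect_videos('videos', subdir_search=True) on a directory would return every file below it whose extension maps to a video MIME type, while passing a text file returns its non-empty lines unchanged.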

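The list is iterated until batch_size videos have been opened, and videos that are faulty or do not exist are skipped. The first frames are then sent to the letterbox function to be resized. This part is unchanged from the original version, apart from the early return when every video is faulty or unavailable.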

def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = new_shape
        ratio = new_shape[0] / shape[1], new_shape[1] / shape[0]  # width, height ratios

    dw /= 2  # divide the padding between the two sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
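
The scale-and-pad arithmetic in letterbox is easier to follow with numbers plugged in. The sketch below repeats only the geometry, without OpenCV or NumPy; the function name letterbox_geometry and the stride parameter are my own labels for illustration, and the scaleFill and scaleup branches are left out.

```python
def letterbox_geometry(shape, new_shape=(416, 416), auto=True, stride=64):
    """Return (resized (w, h), per-side padding (dw, dh), final (h, w)).

    Arithmetic-only sketch; `stride` mirrors the modulus applied when
    auto=True in the listing above.
    """
    h, w = shape
    r = min(new_shape[0] / h, new_shape[1] / w)          # one scale for both axes
    new_unpad = (int(round(w * r)), int(round(h * r)))   # (width, height) after resize
    dw = new_shape[1] - new_unpad[0]                     # total horizontal padding
    dh = new_shape[0] - new_unpad[1]                     # total vertical padding
    if auto:
        dw, dh = dw % stride, dh % stride                # keep only the remainder
    dw, dh = dw / 2, dh / 2                              # split between the two sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    final = (new_unpad[1] + top + bottom, new_unpad[0] + left + right)
    return new_unpad, (dw, dh), final


# A 1280x720 frame (height 720, width 1280) letterboxed towards 416x416.
print(letterbox_geometry((720, 1280)))              # minimum-rectangle padding
print(letterbox_geometry((720, 1280), auto=False))  # pad all the way to 416x416
```

For this frame the scale is 0.325, so the resized image is 416 wide and 234 high. With auto=True the 182 rows of missing height reduce to 54 (27 per side), giving a 288x416 result; with auto=False the full 91 rows per side are added and the result is 416x416.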

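Function to retrieve frames at fixed intervals

The update function has one small change: we additionally store the default image size. It is needed when all videos have been pulled for processing but, because their lengths differ, one video finishes before another. This will become clearer when I explain the next part of the code, the __next__ function.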

def update(self, index, cap, cur_pos):
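        # Read the next stream frame in a daemon thread
        n = 0
        while cap.isOpened():
            n += 1
            # _, self.imgs[index] = cap.read()
            if not cap.grab():  # end of video or read failure
                self.data[index] = None  # tell __next__ that this slot is free
                cap.release()
                break  # stop so this thread cannot overwrite the next video's frames
            if n == 4:  # retrieve every 4th frame
                _, self.data[index] = cap.retrieve()
                if self.def_img_size is None and self.data[index] is not None:
                    self.def_img_size = self.data[index].shape
                n = 0
            time.sleep(0.01)  # wait time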
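
The grab-every-frame, retrieve-every-fourth pattern can be tried without a real video. In the sketch below, FakeCapture and sample_every are hypothetical names: FakeCapture imitates only the grab() and retrieve() calls of cv2.VideoCapture and returns frame indices instead of images, and the threading and sleep are left out.

```python
class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture that yields frame indices."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.pos = -1  # index of the most recently grabbed frame

    def grab(self):
        # Advance to the next frame; report failure once the video is exhausted.
        if self.pos + 1 >= self.n_frames:
            return False
        self.pos += 1
        return True

    def retrieve(self):
        # "Decode" the most recently grabbed frame.
        return True, self.pos


def sample_every(cap, step=4):
    """Grab every frame but retrieve only every `step`-th one."""
    kept = []
    n = 0
    while cap.grab():
        n += 1
        if n == step:
            _, frame = cap.retrieve()
            kept.append(frame)
            n = 0
    return kept


print(sample_every(FakeCapture(10)))  # frames with index 3 and 7 are retrieved
```

Grabbing is cheap because it only advances the stream, so skipping the decode on three frames out of four keeps the reader thread light at the price of dropping those frames.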

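Iterator

If a frame exists, it is passed to the letterbox function as usual. If the frame is None, that video has been fully processed, and we check whether every video in the list has been processed. If there are more videos to process, the cur_pos pointer gives the position of the next available video.

If no more videos can be pulled from the list but some videos are still being processed, a blank frame is sent on in place of the finished one. In other words, the finished slot is padded so that the batch keeps its size while the remaining videos play out.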

def __next__(self):
        self.count += 1
        img0 = self.data.copy()
        img = []

        for i, x in enumerate(img0):
            if x is not None:
                img.append(letterbox(x, new_shape=self.img_size, auto=self.rect)[0])
            else:
                if self.cur_pos == self.n:
                    if all( v is None for v in img0 ):
                        cv2.destroyAllWindows()
                        raise StopIteration
                    else:
                        img0[i] = np.zeros(self.def_img_size)
                        img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
                else:
                    print('%g/%g: %s... ' % (self.cur_pos+1, self.n, self.sources[self.cur_pos]), end='')
                    self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                    fldr_end_flg = 0
                    while not self.cap[i].isOpened():
                        print('Failed to open %s' % self.sources[self.cur_pos])
                        self.cur_pos+=1
                        if self.cur_pos == self.n:
                            img0[i] = np.zeros(self.def_img_size)
                            img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
                            fldr_end_flg = 1
                            break
                        self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                    if fldr_end_flg:
                        continue
                    w = int(self.cap[i].get(cv2.CAP_PROP_FRAME_WIDTH))
                    h = int(self.cap[i].get(cv2.CAP_PROP_FRAME_HEIGHT))
                    fps = self.cap[i].get(cv2.CAP_PROP_FPS) % 100
                    frames = int(self.cap[i].get(cv2.CAP_PROP_FRAME_COUNT))
                    _, self.data[i] = self.cap[i].read()  # guarantee first frame
                    img0[i] = self.data[i]
                    img.append(letterbox(self.data[i], new_shape=self.img_size, auto=self.rect)[0])
                    thread = Thread(target=self.update, args=([i, self.cap[i], self.cur_pos+1]), daemon=True)
                    print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
                    self.cur_pos+=1
                    thread.start()
                    print('')  # newline

        # Stack
        img = np.stack(img, 0)

        # Convert
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to bsx3x416x416
        img = np.ascontiguousarray(img)

        return self.sources, img, img0, None
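
The slot-refilling idea in __next__ can be exercised without OpenCV or threads. The generator below is a simplified, synchronous sketch under my own assumptions: each video is a plain list of frames, an empty list plays the role of a video that fails to open, and the name batch_variable_length and the blank parameter are hypothetical.

```python
from collections import deque


def batch_variable_length(videos, batch_size=2, blank=None):
    """Yield fixed-size batches from videos of unequal length.

    A finished slot is refilled with the next unopened video; once none
    are left it is padded with `blank` until every slot has finished.
    Frames themselves must not be None, which is used as the end marker.
    """
    pending = deque(videos)          # videos that have not been opened yet
    slots = [None] * batch_size      # one frame iterator per batch slot

    while True:
        batch = []
        for i in range(batch_size):
            frame = next(slots[i], None) if slots[i] is not None else None
            # Slot is empty or its video ended: open the next usable video.
            while frame is None and pending:
                slots[i] = iter(pending.popleft())
                frame = next(slots[i], None)
            if frame is None:
                slots[i] = None      # nothing left to put in this slot
            batch.append(frame)
        if all(f is None for f in batch):
            return                   # every video is exhausted
        yield [blank if f is None else f for f in batch]


videos = [["a0", "a1", "a2", "a3"], ["b0"], ["c0"]]
for batch in batch_variable_length(videos, blank="<blank>"):
    print(batch)
```

With these three videos and a batch size of 2, the second slot moves from b to c when b ends, and is then padded with the blank marker for the last two batches while a finishes.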

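Conclusion

With so much time going into data collection and preprocessing, I believe this helps cut the time spent fitting videos to the model, so that we can concentrate on fitting the model to the data.

I am attaching the full source code here. Hope this helps!

Original article: https://towardsdatascience.com/variable-sized-video-mini-batching-c4b1a47c043b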
