Hands-On Series: Semantic Segmentation in Computer Vision

Datawhale Practical Guide

Author: 游璐穎 (You Luying), Fuzhou University, Datawhale member

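Image segmentation is another fundamental task in computer vision, alongside classification and detection. It partitions an image into regions according to its content. Compared with classification and detection, segmentation is finer-grained, because every pixel has to be classified.

In the street-scene segmentation example (figure in the original post), every pixel is classified, so object outlines are traced precisely instead of being approximated by bounding boxes as in detection.

Image segmentation can be divided into three subfields: semantic segmentation, instance segmentation, and panoptic segmentation.

As the comparison figure in the original post shows, semantic segmentation recognizes an image at the pixel level and assigns a class label to every pixel; it is widely used in medical imaging and autonomous driving. Instance segmentation is more challenging: it must detect the objects in the image correctly and also segment each instance precisely. Panoptic segmentation combines the two tasks and requires every pixel to be assigned both a semantic label and an instance id.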
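To make the difference between the three outputs concrete, here is a minimal sketch on a toy 1 x 6 "image". The class ids (0 = background, 1 = person) and the array names are assumptions chosen for illustration only; real datasets store these maps in their own formats.

```python
import numpy as np

# Toy 1 x 6 "image": two background pixels, then two people side by side.
# Hypothetical class ids: 0 = background, 1 = person.
semantic = np.array([[0, 0, 1, 1, 1, 1]])

# Instance segmentation tells the two people apart; 0 means "no instance".
instance = np.array([[0, 0, 1, 1, 2, 2]])

# Panoptic segmentation gives every pixel a (semantic label, instance id) pair.
panoptic = np.stack([semantic, instance], axis=-1)

print(panoptic.shape)                    # (1, 6, 2)
print(panoptic[0, 2], panoptic[0, 4])    # same class, different instance ids
```

The semantic map alone cannot separate the two people, which is exactly the information the instance id adds.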

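01 Key Steps in Semantic Segmentation

When training a network, the semantic label image or the instance segmentation image often needs preprocessing. For a color label image, for instance, a color map tells us which class each color represents, and the image is then converted into the corresponding mask or one-hot encoding for training. This section explains the key steps.

First, taking semantic segmentation as the example, we introduce the different representations of a label.

1.1 Semantic label image

A semantic segmentation dataset contains the original images and the semantic label images. The two have the same size, and both are RGB images.

In a label image, white and black represent object borders and background respectively, and each of the other colors represents a different class.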

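1.2 Single-channel mask

Because each RGB value in the label corresponds to one annotated class, the class index of every pixel can be looked up easily, which yields a single-channel mask.

In the example figure of the original post, the annotated classes are Person, Purse, Plants, Sidewalk, and Building. After the semantic label image is converted into a single-channel mask (the right-hand image), the size is unchanged but the number of channels drops from 3 to 1.

Pixel positions correspond one to one.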
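The lookup from RGB label to single-channel mask can be sketched as follows. This is an illustrative helper, not code from the original post: `label_to_mask` and `toy_colormap` are hypothetical names, and the sketch assumes that the position of a color in the list is its class index.

```python
import numpy as np

# Hypothetical color table: position in the list = class index.
toy_colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0]]

def label_to_mask(label, colormap):
    """Convert an (H, W, 3) RGB label image into an (H, W) class-index mask."""
    mask = np.zeros(label.shape[:2], dtype=np.uint8)
    for index, colour in enumerate(colormap):
        # True where all three channels equal this class color.
        matches = np.all(label == np.array(colour), axis=-1)
        mask[matches] = index
    # Pixels whose color is not in the table keep the initial value 0.
    return mask

toy_label = np.array([[[0, 0, 0], [128, 0, 0]],
                      [[0, 128, 0], [128, 0, 0]]], dtype=np.uint8)
toy_mask = label_to_mask(toy_label, toy_colormap)
print(toy_mask.shape)      # (2, 2): same height and width, one channel
print(toy_mask.tolist())   # [[0, 1], [2, 1]]
```

Note that this simple version silently maps unknown colors to class 0; a stricter variant could raise an error or use a dedicated "ignore" index instead.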

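1.3 One-hot encoding

One-hot encoding can be applied to any single-channel mask.

For the mask above, say the image size is H × W and there are 5 label classes. The mask has to become a 5-channel one-hot output, with one H × W binary map per class: take the pixels whose mask value is 1 and generate a map that is 1 at those positions and 0 elsewhere, then take the pixels whose value is 2 and generate another map in the same way, and so on.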
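The per-class extraction described above can be sketched on a tiny mask. The values are made up for illustration, and the sketch assumes class indices that start at 0 and a channel-first layout (one binary map per class along the first axis).

```python
import numpy as np

# Toy 2 x 3 mask with class indices 0..2 (hypothetical values).
toy_mask = np.array([[0, 1, 1],
                     [2, 2, 0]])
num_classes = 3

# One binary map per class: 1 where the mask equals that class, 0 elsewhere.
channels = [(toy_mask == k).astype(np.uint8) for k in range(num_classes)]
onehot = np.stack(channels, axis=0)   # shape (num_classes, H, W)

print(onehot.shape)          # (3, 2, 3)
print(onehot[1].tolist())    # [[0, 1, 1], [0, 0, 0]]
# Every pixel belongs to exactly one class, so the channels sum to 1.
print(bool((onehot.sum(axis=0) == 1).all()))   # True
```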

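02 Semantic Segmentation in Practice

Next, using the Pascal VOC 2012 semantic segmentation dataset as the example, we show how to convert between these representations.

Pascal VOC 2012 is an important dataset for semantic segmentation. It has 20 object classes, covering people, vehicles, and other categories, and can be used to segment object classes or the background.

Open-source dataset address: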

https://gas.graviti.cn/dataset/yluy/VOC2012Segmentation

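2.1 Reading the dataset

This tutorial uses the Graviti (格物鈦) data platform to read the dataset online. The platform supports many dataset types and provides many public datasets. Some preparation is needed before use:

Fork the dataset: to use a public dataset, first fork it into your own account.
Get an AccessKey: obtain the key that the SDK needs to interact with the Graviti data platform, at https://gas.graviti.cn/tensorbay/developer
Understand segments: a segment is a further subdivision of a dataset; the VOC dataset, for example, is split into "train" and "test".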

from tensorbay import GAS
from tensorbay.dataset import Dataset
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

ACCESS_KEY = "<YOUR_ACCESSKEY>"
gas = GAS(ACCESS_KEY)


def read_voc_images(is_train=True, index=0):
    """
    read voc image using tensorbay
    """
    dataset = Dataset("VOC2012Segmentation", gas)
    if is_train:
        segment = dataset["train"]
    else:
        segment = dataset["test"]

    data = segment[index]
    feature = Image.open(data.open()).convert("RGB")
    label = Image.open(data.label.semantic_mask.open()).convert("RGB")
    visualize(feature, label)

    return feature, label  # PIL Image


def visualize(feature, label):
    """
    visualize feature and label
    """
    fig = plt.figure()
    ax = fig.add_subplot(121)  # first subplot: the original image
    ax.imshow(feature)
    ax2 = fig.add_subplot(122)  # second subplot: the label image
    ax2.imshow(label)
    plt.show()

train_feature, train_label = read_voc_images(is_train=True, index=10)
train_label = np.array(train_label) # (375, 500, 3)

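2.2 Color map

Once the color semantic label image is available, we can build a color map that lists each RGB color in the label together with the class it annotates.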

def colormap_voc():
    """
    create a colormap
    """
    colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                    [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                    [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                    [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                    [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                    [0, 64, 128]]

    classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
                   'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
                   'diningtable', 'dog', 'horse', 'motorbike', 'person',
                   'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']

    return colormap, classes
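
The 21 colors in this table are not arbitrary: they match the bit-interleaving scheme commonly used to generate the VOC palette, in which the bits of the class index are spread over the R, G, and B channels. The sketch below regenerates them; `voc_colormap` is a hypothetical helper name and is not part of the original post.

```python
import numpy as np

def voc_colormap(n=21):
    """Generate the first n palette colors by spreading index bits over R, G, B."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Bits 0, 1, 2 of c feed R, G, B, from the most significant bit down.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = [r, g, b]
    return cmap

cmap = voc_colormap(21)
print(cmap[:4].tolist())   # [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0]]
print(cmap[15].tolist())   # [192, 128, 128], the 'person' entry in the table above
print(cmap[20].tolist())   # [0, 64, 128], the 'tv/monitor' entry
```

Generating the table this way avoids typing errors in the hard-coded list, while the explicit list remains easier to read.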

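2.3 Converting between label and one-hot

Using the color map, we implement the conversion between the semantic label image and the one-hot encoding in both directions: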

def label_to_onehot(label, colormap):
    """
    Converts a segmentation label (H, W, C) to (H, W, K) where the last dim is a one
    hot encoding vector, C is usually 1 or 3, and K is the number of class.
    """
    semantic_map = []
    for colour in colormap:
        equality = np.equal(label, colour)
        class_map = np.all(equality, axis=-1)
        semantic_map.append(class_map)
    semantic_map = np.stack(semantic_map, axis=-1).astype(np.float32)
    return semantic_map

def onehot_to_label(semantic_map, colormap):
    """
    Converts a mask (H, W, K) to (H, W, C)
    """
    x = np.argmax(semantic_map, axis=-1)
    colour_codes = np.array(colormap)
    label = np.uint8(colour_codes[x.astype(np.uint8)])
    return label

colormap, classes = colormap_voc()
semantic_map = label_to_onehot(train_label, colormap)
print(semantic_map.shape)  # [H, W, K] = [375, 500, 21], 21 classes including background

label = onehot_to_label(semantic_map, colormap)
print(label.shape) # [H, W, C] = [375, 500, 3]
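
One caveat is worth checking: a pixel whose color is missing from the color map (for example the white border mentioned in section 1.1) gets an all-zero one-hot vector, and `np.argmax` then decodes it as class 0. The sketch below shows this on a toy label. The 2-class table and the pure-white border color are assumptions for illustration; the exact border color in a real label image may differ.

```python
import numpy as np

toy_colormap = [[0, 0, 0], [128, 0, 0]]   # hypothetical 2-class table
# Assumed border color (white); it is deliberately absent from the table.
toy_label = np.array([[[0, 0, 0], [128, 0, 0]],
                      [[255, 255, 255], [128, 0, 0]]], dtype=np.uint8)

# Same comparison as in label_to_onehot: one boolean map per color.
onehot = np.stack(
    [np.all(toy_label == np.array(c), axis=-1) for c in toy_colormap],
    axis=-1,
).astype(np.float32)

# Pixels with an unknown color have an all-zero one-hot vector.
unmatched = onehot.sum(axis=-1) == 0
print(unmatched.tolist())   # [[False, False], [True, False]]

# argmax over an all-zero vector returns 0, so the border decodes as background.
decoded = np.array(toy_colormap, dtype=np.uint8)[np.argmax(onehot, axis=-1)]
print(decoded[1, 0].tolist())                # [0, 0, 0]
print(np.array_equal(decoded, toy_label))    # False: the round trip loses the border
```

If border pixels should be excluded from training, the `unmatched` map can serve as an ignore mask.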

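2.4 Converting between one-hot and mask

Similarly, with the number of classes taken from the color map, we implement the conversion between the single-channel mask and the one-hot encoding in both directions: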

def onehot2mask(semantic_map):
    """
    Converts a one-hot map (K, H, W) to a class-index mask (H, W)
    """
    _mask = np.argmax(semantic_map, axis=0).astype(np.uint8)
    return _mask


def mask2onehot(mask, num_classes):
    """
    Converts a segmentation mask (H, W) to (K, H, W), where the first dim
    indexes the K classes and each (H, W) slice is a binary map
    """
    semantic_map = [mask == i for i in range(num_classes)]
    return np.array(semantic_map).astype(np.uint8)

mask = onehot2mask(semantic_map.transpose(2,0,1))
print(np.unique(mask)) # [ 0  1 15] indices for background, aeroplane, person
print(mask.shape) # (375, 500)

semantic_map = mask2onehot(mask, len(colormap))
print(semantic_map.shape) # (21, 375, 500)
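
As an alternative to the per-class comparison, the one-hot encoding can also be built by indexing an identity matrix with the mask. This is a sketch of that idea, not the author's code; `mask_to_onehot_eye` is a hypothetical name, and the toy mask reuses the class indices 0, 1, and 15 from the output above.

```python
import numpy as np

def mask_to_onehot_eye(mask, num_classes):
    """(H, W) integer mask -> (K, H, W) one-hot array via identity-matrix lookup."""
    # Row i of the identity matrix is the one-hot vector for class i.
    onehot_hwk = np.eye(num_classes, dtype=np.uint8)[mask]   # (H, W, K)
    return onehot_hwk.transpose(2, 0, 1)                      # (K, H, W)

toy_mask = np.array([[0, 1], [15, 0]], dtype=np.uint8)
toy_onehot = mask_to_onehot_eye(toy_mask, 21)
print(toy_onehot.shape)   # (21, 2, 2)

# Round trip: argmax over the class axis recovers the mask.
recovered = np.argmax(toy_onehot, axis=0).astype(np.uint8)
print(np.array_equal(recovered, toy_mask))   # True
```

The identity-matrix lookup avoids the Python-level loop over classes, at the cost of requiring every mask value to be smaller than `num_classes`.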

About the author: 游璐穎 (You Luying), Fuzhou University, Datawhale member. Personal blog: https://sonatau.github.io
