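Abstract

Recent work has shown that adding an attention mechanism can effectively improve the performance of deep convolutional neural networks. This paper proposes a new lightweight and effective attention method, the Pyramid Split Attention (PSA) module. Replacing the 3x3 convolution in the ResNet bottleneck block with the PSA module yields a new representational block called Efficient Pyramid Split Attention (EPSA). The EPSA block can easily be added as a plug-and-play component to a well-established backbone network and brings a significant improvement in model performance. Stacking these ResNet-style EPSA blocks gives a simple and efficient backbone architecture named EPSANet. EPSANet offers stronger multi-scale representation ability for a range of computer vision tasks, including but not limited to image classification, object detection, and instance segmentation. Without bells and whistles, the proposed EPSANet outperforms most state-of-the-art channel attention methods. Compared with SENet-50, Top-1 accuracy on the ImageNet dataset improves by 1.93%; with Mask-RCNN on the MS-COCO dataset, box AP for object detection improves by +2.7 and mask AP for instance segmentation improves by +1.7.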

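Background

Attention mechanisms are widely used in many areas of computer vision, such as image classification, object detection, instance segmentation, semantic segmentation, scene parsing, and action localization. Attention methods fall into two types: channel attention and spatial attention. Recent studies have shown that using channel attention, spatial attention, or both can bring significant performance gains. The most commonly used channel attention method is SENet, which improves performance noticeably at a fairly low cost. Its drawback is that it ignores the importance of spatial information. The BAM and CBAM modules were therefore proposed to enrich the attention map by effectively combining spatial and channel attention. Two important and challenging problems nevertheless remain. The first is how to efficiently capture and exploit the spatial information of feature maps at different scales to enrich the feature space. The second is that channel or spatial attention can only capture local information effectively and cannot establish long-range channel dependencies.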

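Main idea of the paper

PSA module:

The input feature map is split into multi-scale feature maps by convolutions with different kernel sizes, and the SE channel attention module is applied separately at each scale so that the network attends to features at different scales. The features from the different scales are then concatenated, and Softmax is used to recalibrate the channel-wise attention vectors, giving the weights of the multi-scale channels. Finally, these multi-scale weights are applied to the corresponding feature maps, producing a refined feature map with richer multi-scale information as the output.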
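
The Softmax recalibration step can be illustrated without any deep-learning framework. The following is a minimal sketch in plain Python, assuming each branch has already produced one SE weight per channel. The function name `recalibrate_across_scales` and the toy numbers are made up for illustration and do not come from the paper or its source code.

```python
import math


def recalibrate_across_scales(se_weights):
    """Softmax over the scale (branch) axis, independently for each channel.

    se_weights: list of S lists, each holding C per-channel SE weights.
    Returns a list of the same shape whose entries sum to 1 across the
    S branches for every channel index.
    """
    num_scales = len(se_weights)
    num_channels = len(se_weights[0])
    out = [[0.0] * num_channels for _ in range(num_scales)]
    for c in range(num_channels):
        column = [se_weights[s][c] for s in range(num_scales)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for s in range(num_scales):
            out[s][c] = exps[s] / total
    return out


# Toy input (hypothetical values): 4 branches, 2 channels per branch.
toy_se = [[0.9, 0.1], [0.5, 0.1], [0.2, 0.1], [0.1, 0.1]]
toy_attention = recalibrate_across_scales(toy_se)
```

In this toy input the second channel has the same SE weight in all four branches, so each branch receives a weight of 0.25 for it. For the first channel, the branch with the largest SE weight receives the largest share, and the four shares sum to 1.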

Keras implementation

Below is a Keras version implemented from the paper and the PyTorch source code (it supports TensorFlow 1.x).


# Assumes standalone Keras 2.x; the tensorflow.keras equivalents should work the same way.
from keras import backend as K
from keras.layers import (Concatenate, Conv2D, GlobalAvgPool2D, Lambda,
                          Multiply, Reshape, Softmax)


def _PSA(inputs, filters, name, conv_kernels=(3, 5, 7, 9), conv_groups=(1, 4, 8, 16)):
    assert len(conv_kernels) == len(conv_groups)
    split_num = len(conv_kernels)
    channels_last = K.image_data_format() != 'channels_first'
    channel_axis = -1 if channels_last else 1
    in_dim = K.int_shape(inputs)  # full static shape of the input, batch axis included
    # Spatial size. This assumes static H/W; the stride-1 'same' convs below preserve it.
    height, width = (in_dim[1], in_dim[2]) if channels_last else (in_dim[2], in_dim[3])
    split_channel = filters // split_num  # output channels per branch
    mult_scals = []  # multi-scale feature maps, one per kernel size
    feature_se = []  # SE channel-attention weights, one per branch
    for i, (kernel, group) in enumerate(zip(conv_kernels, conv_groups)):
        feature = _group_conv(inputs, split_channel, kernel, 1, group)  # group convolution
        mult_scals.append(feature)
        feature_se.append(_SE(feature, name + '_' + str(i)))
    feature_attention = Concatenate(axis=channel_axis)(feature_se)
    features = Concatenate(axis=channel_axis)(mult_scals)
    # The concatenation is branch-major, so the branch axis must come before
    # the per-branch channel axis when the channel dimension is unfolded.
    if channels_last:
        attention_vectors = Reshape((1, 1, split_num, split_channel))(feature_attention)
        features = Reshape((height, width, split_num, split_channel))(features)
        attention_vectors = Softmax(axis=3)(attention_vectors)  # softmax across branches
        feats_weight = Multiply()([attention_vectors, features])
        feats_weight = Reshape((height, width, filters))(feats_weight)
    else:
        attention_vectors = Reshape((split_num, split_channel, 1, 1))(feature_attention)
        features = Reshape((split_num, split_channel, height, width))(features)
        attention_vectors = Softmax(axis=1)(attention_vectors)  # softmax across branches
        feats_weight = Multiply()([attention_vectors, features])
        feats_weight = Reshape((filters, height, width))(feats_weight)
    return feats_weight

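def _SE(inputs, name, reduction=16):
    # Returns only the per-channel excitation weights; the caller applies them.
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    out_dim = K.int_shape(inputs)[channel_axis]  # number of channels in the input feature map
    temp_dim = max(out_dim // reduction, reduction)  # bottleneck width, never below `reduction`

    # Squeeze: global average pooling collapses each channel to a single value.
    squeeze = GlobalAvgPool2D(name=name+'_GlobalAvgPool2D')(inputs)
    if channel_axis == -1:
        excitation = Reshape((1, 1, out_dim), name=name + '_Reshape')(squeeze)
    else:
        excitation = Reshape((out_dim, 1, 1), name=name + '_Reshape')(squeeze)
    # Excitation: two 1x1 convolutions acting as a bottleneck MLP.
    excitation = Conv2D(temp_dim, kernel_size=1, strides=1, activation='relu', name=name + '_Conv2D_1')(excitation)
    excitation = Conv2D(out_dim, kernel_size=1, strides=1, activation='sigmoid', name=name + '_Conv2D_2')(excitation)
    return excitation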


def _group_conv(x, filters, kernel, stride, groups, padding='same'):
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(x)[channel_axis]  # number of channels in the input feature map
    assert in_channels % groups == 0
    assert filters % groups == 0
    assert filters > groups
    nb_ig = in_channels // groups  # input channels per group
    nb_og = filters // groups  # output channels per group
    gc_list = []
    for i in range(groups):
        start, stop = i * nb_ig, (i + 1) * nb_ig
        # Bind start/stop as default arguments so every Lambda keeps its own slice
        # instead of sharing the loop variable through a late-binding closure.
        if channel_axis == -1:
            x_group = Lambda(lambda z, s=start, e=stop: z[:, :, :, s:e])(x)
        else:
            x_group = Lambda(lambda z, s=start, e=stop: z[:, s:e, :, :])(x)
        gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
                              padding=padding, use_bias=False)(x_group))
    return Concatenate(axis=channel_axis)(gc_list) if groups != 1 else gc_list[0]
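
To see why the larger kernels are paired with more groups, it can help to count weights. The sketch below is plain Python and only does arithmetic. The helper names `group_channel_slices` and `group_conv_weight_count`, and the channel counts in the example, are hypothetical and are not part of the implementation above.

```python
def group_channel_slices(in_channels, groups):
    """Return the (start, stop) input-channel range that each group reads."""
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    per_group = in_channels // groups
    return [(g * per_group, (g + 1) * per_group) for g in range(groups)]


def group_conv_weight_count(in_channels, filters, kernel, groups):
    """Number of kernel weights in a bias-free grouped 2D convolution."""
    if in_channels % groups != 0 or filters % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    per_group_in = in_channels // groups
    per_group_out = filters // groups
    return groups * per_group_in * per_group_out * kernel * kernel


# Hypothetical sizes: 64 input channels, 32 output channels, 5x5 kernel.
slices = group_channel_slices(64, 4)
dense = group_conv_weight_count(64, 32, 5, 1)    # ordinary convolution
grouped = group_conv_weight_count(64, 32, 5, 4)  # same layer with 4 groups
```

With these sizes the ordinary convolution has 51200 weights and the 4-group version has 12800, a reduction by a factor equal to the number of groups. This is presumably why the default `conv_groups` grow along with `conv_kernels`: the extra groups offset part of the cost of the larger kernels.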


2021 CVPR | EPSANet: a pyramid split attention mechanism (Keras implementation)
