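PointPillars is a model that came out of industry. Its core idea borrows the image-processing pipeline: the point cloud is divided, from a top-down (bird's-eye) view, into individual pillars (vertical columns), which yields image-like data. A 2D detection framework then extracts features and makes dense box predictions to produce the detections, which gives the model a good balance between speed and accuracy.
Overview of the PointPillars network structure:

Speed and accuracy comparison:

Note: PP stands for PointPillars, M for MV3D, A for AVOD, C for ContFuse, V for VoxelNet, F for Frustum PointNet, S for SECOND, and P+ for PIXOR++.
Using the OpenPCDet code base, this article walks through the PointPillars implementation line by line and explains the reasoning behind it.
Readers can download OpenPCDet and follow along with the article.
My knowledge is limited, so the analysis will inevitably have shortcomings. Corrections and discussion are welcome, and suggestions can be left in the comments. Thank you!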
PointPillars paper:
https://arxiv.org/pdf/1812.05784.pdf
Reference code for this analysis:
https://github.com/open-mmlab/OpenPCDet
Annotated code repository:
https://github.com/Nathansong/OpenPCDdet-annotated
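1: Overview
3D detection algorithms usually take one of the following forms:
(1) Partition the point cloud into voxels, forming a regular, densely distributed voxel grid, as in VoxelNet and SECOND.
(2) Project the point cloud onto the front view and the bird's-eye view to obtain pseudo-images; common models are MV3D and AVOD.
(3) Map the point cloud directly onto the bird's-eye view, then apply a 2D detection pipeline for feature extraction and the RPN to perform 3D detection, as in PIXOR and PointPillars, the subject of this article.
(4) Use PointNet to extract features directly from the points and generate proposals, then refine those proposals, as in PointRCNN.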
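2: Point Cloud Data Processing in PP

Here the 3D point cloud is consumed directly in top-down form. Suppose the cloud contains N points with 3 coordinates each (N*3), all expressed in the KITTI LiDAR coordinate frame xyz (in meters, with x forward, y left, and z up). Every point is assigned to one of the equally sized columns that tile the x-y plane; each such column is called a pillar, as shown in the figures below.
Left camera front view

Top-down view of the point cloud (left); the point cloud distributed into uniform pillars (right)

(Note: as a shortcut, the point cloud was not cropped to the camera FOV; the screenshot is taken directly from the top-down view of the full 3D point cloud and only serves to illustrate pillars.)
KITTI point clouds are 4-dimensional, (x, y, z, r), where x, y, z are the point's coordinates and r is its reflectance intensity (which depends on the object's material, the laser's incidence angle, and so on). When points are placed into pillars, height does not need to be considered the way it is for voxels; a pillar can be understood as all the voxels along the z axis stacked together.
PointPillars then augments (decorates) the points in each pillar: every point gains 5 extra dimensions, x_c, y_c, z_c, x_p, and y_p. The subscript c denotes the offset from the point to the mean of all points in its pillar, and the subscript p denotes the x, y offset from the point to the center of its pillar. After this step each point has 9 dimensions: x, y, z, r, x_c, y_c, z_c, x_p, and y_p. (Note: the OpenPCDet implementation uses 10, adding z_p, the point's offset along z from the z center of its pillar.)
After these operations the original point cloud, stored as (N, 4) with the reflectance included, is converted into a (D, P, N) tensor. D is the feature dimension of each point (the 9 features above), P is the number of non-empty pillars, and N (now with a different meaning than above) is the maximum number of points per pillar.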
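The decoration step can be sketched for a single pillar in plain Python. This is only an illustration of the 9-dimensional feature described above, not the OpenPCDet code; the function name `decorate_pillar` and the sample points are hypothetical.

```python
def decorate_pillar(points, pillar_center_xy):
    """Turn the (x, y, z, r) points of one pillar into 9-D features.

    points: list of (x, y, z, r) tuples that fall into the same pillar.
    pillar_center_xy: (x, y) of the pillar's geometric center.
    """
    n = len(points)
    # Arithmetic mean of the points in this pillar (used for x_c, y_c, z_c).
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cx, cy = pillar_center_xy
    decorated = []
    for x, y, z, r in points:
        decorated.append((
            x, y, z, r,                              # original features
            x - mean[0], y - mean[1], z - mean[2],   # x_c, y_c, z_c
            x - cx, y - cy,                          # x_p, y_p
        ))
    return decorated


pts = [(1.0, 2.0, 0.0, 0.5), (3.0, 4.0, 2.0, 0.1)]
feats = decorate_pillar(pts, (2.0, 2.0))
print(len(feats[0]))  # 9 features per point
```

Adding a tenth value `z - cz` per point would give the OpenPCDet variant mentioned in the note.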
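Notes:
1. In the implementation each pillar is 0.16 m long and wide. The pcdet implementation keeps only the front-view part of the cloud for training, because KITTI annotations are made with respect to camera 2: there are no labels in the negative x direction (behind the car), so that data is cropped away. To keep detection reliable, points that are too far away are also cropped, because the cloud becomes too sparse there. In pcdet the selected point cloud range therefore has minimum xyz [0, -39.68, -3] and maximum xyz [69.12, 39.68, 1].
2. Each pillar holds at most 32 points. If a pillar contains more than 32 points, 32 of them are randomly sampled; if it contains fewer than 32, it is padded with zeros.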
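The two notes can be made concrete with a small pure-Python sketch: it derives the grid size from the range and pillar size, bins points into pillars, and applies the 32-point cap with zero padding. It follows the description in this article; the spconv generator that OpenPCDet actually calls may choose which points to keep differently, and the names `grid_size` and `pillarize` are hypothetical.

```python
import random

POINT_CLOUD_RANGE = [0.0, -39.68, -3.0, 69.12, 39.68, 1.0]  # x/y/z min, x/y/z max
PILLAR_SIZE = [0.16, 0.16, 4.0]
MAX_POINTS = 32


def grid_size():
    """Number of pillars along x, y, z."""
    return [round((POINT_CLOUD_RANGE[i + 3] - POINT_CLOUD_RANGE[i]) / PILLAR_SIZE[i])
            for i in range(3)]


def pillarize(points, rng=random):
    """Group (x, y, z, r) points by pillar; returns {(iy, ix): (padded_points, num_valid)}."""
    nx, ny, _ = grid_size()
    buckets = {}
    for p in points:
        # Drop points outside the cropped range.
        if not all(POINT_CLOUD_RANGE[i] <= p[i] < POINT_CLOUD_RANGE[i + 3] for i in range(3)):
            continue
        ix = min(int((p[0] - POINT_CLOUD_RANGE[0]) / PILLAR_SIZE[0]), nx - 1)
        iy = min(int((p[1] - POINT_CLOUD_RANGE[1]) / PILLAR_SIZE[1]), ny - 1)
        buckets.setdefault((iy, ix), []).append(p)

    pillars = {}
    for key, pts in buckets.items():
        if len(pts) > MAX_POINTS:
            pts = rng.sample(pts, MAX_POINTS)  # random subsample down to 32 points
        num_valid = len(pts)
        padded = list(pts) + [(0.0, 0.0, 0.0, 0.0)] * (MAX_POINTS - num_valid)
        pillars[key] = (padded, num_valid)
    return pillars


cloud = [(1.0, 0.05, 0.0, 0.5)] * 40 + [(10.0, -5.0, 0.0, 0.2), (-1.0, 0.0, 0.0, 0.3)]
pillars = pillarize(cloud)
print(grid_size(), len(pillars))
```

Only non-empty pillars appear in the result, which is why P in the (D, P, N) tensor is far smaller than 432 * 496.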
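After this mapping we have a (D, P, N) tensor. A simplified PointNet then extracts features from the points (an MLP raises the dimensionality, followed by a BN layer and a ReLU activation), producing a tensor of shape (C, P, N). A max-pool over the points of each pillar then keeps, per channel, the strongest response, so the output goes from (C, P, N) to (C, P). Finally, the encoded pillars are placed back at their original x, y pillar locations to form the pseudo-image.
The code for this part follows:
Preprocessing code: pcdet/datasets/processor/data_processor.py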
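As a rough illustration of the "simplified PointNet" step, the sketch below applies a bias-free linear layer and ReLU to each point of one pillar and then takes the channel-wise maximum over the points. Batch normalization is left out for brevity, and the function name `simple_pfn` and the tiny weight matrix are made up for the example.

```python
def simple_pfn(pillar_points, weight):
    """pillar_points: N points with D features each; weight: D x C matrix.

    Returns one C-dimensional feature vector for the whole pillar.
    """
    num_out = len(weight[0])
    activated = []
    for point in pillar_points:
        # Linear layer without bias, followed by ReLU.
        row = [max(0.0, sum(point[d] * weight[d][c] for d in range(len(point))))
               for c in range(num_out)]
        activated.append(row)
    # Channel-wise max over the points of the pillar: (N, C) -> (C,)
    return [max(channel) for channel in zip(*activated)]


pillar = [[1.0, 2.0], [3.0, -1.0], [0.0, 0.0]]   # last row is zero padding
weight = [[1.0, 0.0, -1.0],
          [0.0, 1.0, 1.0]]
print(simple_pfn(pillar, weight))
```

Each output channel may come from a different point, so the pooled vector summarizes the pillar rather than selecting a single point.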
def transform_points_to_voxels(self, data_dict=None, config=None):
    """
    Convert the point cloud into pillars using spconv's VoxelGeneratorV2.
    A pillar can be viewed as the set of all voxels along the z axis, so it is
    enough to set the voxel height to the full height of the KITTI point cloud range.
    """
    # Initialize the parameters needed to convert the point cloud into pillars
    if data_dict is None:
        # The cropped KITTI point cloud range is [0, -39.68, -3, 69.12, 39.68, 1]
        # [69.12, 79.36, 4] / [0.16, 0.16, 4] = [432, 496, 1]
        grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)
        self.grid_size = np.round(grid_size).astype(np.int64)
        self.voxel_size = config.VOXEL_SIZE
        # just bind the config, we will create the VoxelGeneratorWrapper later,
        # to avoid pickling issues in multiprocess spawn
        return partial(self.transform_points_to_voxels, config=config)

    if self.voxel_generator is None:
        self.voxel_generator = VoxelGeneratorWrapper(
            # Size of each pillar: [0.16, 0.16, 4]
            vsize_xyz=config.VOXEL_SIZE,
            # Point cloud range: [0, -39.68, -3, 69.12, 39.68, 1]
            coors_range_xyz=self.point_cloud_range,
            # Feature dimension of each point; here x, y, z, r, where r is the LiDAR reflectance
            num_point_features=self.num_point_features,
            # Maximum number of points per pillar: 32
            max_num_points_per_voxel=config.MAX_POINTS_PER_VOXEL,
            # Maximum number of pillars to keep. Many generated pillars contain no points
            # (see the visualization above), so only the non-empty pillars are needed
            max_num_voxels=config.MAX_NUMBER_OF_VOXELS[self.mode],  # 16000
        )

    points = data_dict['points']
    # Generate the pillars
    voxel_output = self.voxel_generator.generate(points)
    # For a point cloud of shape N*4, pillar generation returns three arrays:
    # voxels holds the generated pillar data, shape [M, 32, 4]
    # coordinates holds the (z, y, x) grid coordinates of each pillar, shape [M, 3]; z is always 0
    # num_points holds the number of valid points in each pillar, shape [M,];
    # pillars with fewer than 32 points are zero-padded
    voxels, coordinates, num_points = voxel_output

    if not data_dict['use_lead_xyz']:
        voxels = voxels[..., 3:]  # remove xyz in voxels(N, 3)

    data_dict['voxels'] = voxels
    data_dict['voxel_coords'] = coordinates
    data_dict['voxel_num_points'] = num_points
    return data_dict
# The code below generates the pillars with spconv
class VoxelGeneratorWrapper():
    def __init__(self, vsize_xyz, coors_range_xyz, num_point_features, max_num_points_per_voxel, max_num_voxels):
        try:
            from spconv.utils import VoxelGeneratorV2 as VoxelGenerator
            self.spconv_ver = 1
        except:
            try:
                from spconv.utils import VoxelGenerator
                self.spconv_ver = 1
            except:
                from spconv.utils import Point2VoxelCPU3d as VoxelGenerator
                self.spconv_ver = 2

        if self.spconv_ver == 1:
            self._voxel_generator = VoxelGenerator(
                voxel_size=vsize_xyz,
                point_cloud_range=coors_range_xyz,
                max_num_points=max_num_points_per_voxel,
                max_voxels=max_num_voxels
            )
        else:
            self._voxel_generator = VoxelGenerator(
                vsize_xyz=vsize_xyz,
                coors_range_xyz=coors_range_xyz,
                num_point_features=num_point_features,
                max_num_points_per_voxel=max_num_points_per_voxel,
                max_num_voxels=max_num_voxels
            )

    def generate(self, points):
        if self.spconv_ver == 1:
            voxel_output = self._voxel_generator.generate(points)
            if isinstance(voxel_output, dict):
                voxels, coordinates, num_points = \
                    voxel_output['voxels'], voxel_output['coordinates'], voxel_output['num_points_per_voxel']
            else:
                voxels, coordinates, num_points = voxel_output
        else:
            # `tv` is cumm.tensorview, imported at module level in data_processor.py
            # (it stays None when cumm is not available)
            assert tv is not None, f"Unexpected error, library: 'cumm' wasn't imported properly."
            voxel_output = self._voxel_generator.point_to_voxel(tv.from_numpy(points))
            tv_voxels, tv_coordinates, tv_num_points = voxel_output
            # make copy with numpy(), since numpy_view() will disappear as soon as the generator is deleted
            voxels = tv_voxels.numpy()
            coordinates = tv_coordinates.numpy()
            num_points = tv_num_points.numpy()
        return voxels, coordinates, num_points
After the preprocessing above, the simplified PointNet is used to extract features from the data in each pillar.
The code is in pcdet/models/backbones_3d/vfe/pillar_vfe.py
import torch
import torch.nn as nn
import torch.nn.functional as F

from .vfe_template import VFETemplate


class PFNLayer(nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 use_norm=True,
                 last_layer=False):
        super().__init__()

        self.last_vfe = last_layer
        self.use_norm = use_norm
        if not self.last_vfe:
            out_channels = out_channels // 2

        if self.use_norm:
            # Initialization of the simplified PointNet layer described in the paper.
            # The paper notes that this linear layer can be formulated as a 1x1 convolution,
            # which in theory is faster to compute.
            # The input channels are the decorated point features (10 per point),
            # the output channels are 64.
            self.linear = nn.Linear(in_channels, out_channels, bias=False)
            # 1D batch normalization layer
            self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, out_channels, bias=True)

        self.part = 50000

    def forward(self, inputs):
        if inputs.shape[0] > self.part:
            # nn.Linear performs randomly when batch size is too large
            num_parts = inputs.shape[0] // self.part
            part_linear_out = [self.linear(inputs[num_part * self.part:(num_part + 1) * self.part])
                               for num_part in range(num_parts + 1)]
            x = torch.cat(part_linear_out, dim=0)
        else:
            # x goes from (M, 32, 10) to (M, 32, 64)
            x = self.linear(inputs)
        torch.backends.cudnn.enabled = False
        # BatchNorm1d normalizes over the channel dimension and expects it at index 1
        # (for images the default layout is [N, C, H*W]), so:
        # (pillars, num_points, channels) -> (pillars, channels, num_points), i.e. (M, 32, 64) -> (M, 64, 32),
        # normalize, then permute back to (M, 32, 64)
        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1) if self.use_norm else x
        torch.backends.cudnn.enabled = True
        x = F.relu(x)
        # PointNet max pooling: channel-wise maximum over the points of each pillar
        # x_max shape: (M, 1, 64)
        x_max = torch.max(x, dim=1, keepdim=True)[0]

        if self.last_vfe:
            # Return the pillar features produced by the simplified PointNet
            return x_max
        else:
            x_repeat = x_max.repeat(1, inputs.shape[1], 1)
            x_concatenated = torch.cat([x, x_repeat], dim=2)
            return x_concatenated
class PillarVFE(VFETemplate):
    """
    model_cfg: NAME: PillarVFE
                    WITH_DISTANCE: False
                    USE_ABSLOTE_XYZ: True
                    USE_NORM: True
                    NUM_FILTERS: [64]
    num_point_features: 4
    voxel_size: [0.16 0.16 4]
    POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
    """

    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range, **kwargs):
        super().__init__(model_cfg=model_cfg)

        self.use_norm = self.model_cfg.USE_NORM
        self.with_distance = self.model_cfg.WITH_DISTANCE
        self.use_absolute_xyz = self.model_cfg.USE_ABSLOTE_XYZ
        # 4 raw features + 6 offsets (xc, yc, zc, xp, yp, zp) = 10
        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        # Register the linear layers that map the 10-D features to 64-D features
        self.pfn_layers = nn.ModuleList(pfn_layers)

        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        """
        Compute the padding indicator.
        Args:
            actual_num: actual number of points in each voxel, shape (M,)
            max_num: maximum number of points per voxel (32)
        Returns:
            paddings_indicator: marks which entries of a pillar are real data and which are zero padding
        """
        # Add a dimension: (M,) -> (M, 1)
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        # [1, 1]
        max_num_shape = [1] * len(actual_num.shape)
        # [1, -1]
        max_num_shape[axis + 1] = -1
        # (1, 32)
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        # (M, 32)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator
    def forward(self, batch_dict, **kwargs):
        """
        batch_dict:
            points: (N, 5) --> (batch_index, x, y, z, r); batch_index is the index of the frame within the batch
            frame_id: (4,) --> (003877, 001908, 006616, 005355) frame IDs
            gt_boxes: (4, 40, 8) --> (x, y, z, dx, dy, dz, ry, class)
            use_lead_xyz: (4,) --> (1, 1, 1, 1)
            voxels: (M, 32, 4) --> (x, y, z, r)
            voxel_coords: (M, 4) --> (batch_index, z, y, x); batch_index is the index of the frame within the batch
            voxel_num_points: (M,)
            image_shape: (4, 2) resolution of the camera-2 image for each frame
            batch_size: 4
        """
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict[
            'voxel_coords']
        # Sum the xyz of all points in each pillar: (M, 32, 3) -> (M, 1, 3); keepdim=True keeps the dimension.
        # Dividing by the number of valid points in each pillar gives the mean. points_mean shape: (M, 1, 3)
        points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(
            -1, 1, 1)
        # Subtract the pillar mean from every point to get xc, yc, zc
        f_cluster = voxel_features[:, :, :3] - points_mean

        # Allocate the offsets from each point to its pillar center: xp, yp, zp
        f_center = torch.zeros_like(voxel_features[:, :, :3])
        # coords holds the grid indices on the [432, 496, 1] grid; multiply by the pillar size to get meters,
        # then add half a pillar (plus the range minimum) to reach the pillar center.
        # Subtracting that center from each point's x, y, z gives the offset to the pillar center.
        f_center[:, :, 0] = voxel_features[:, :, 0] - (
                coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, 1] - (
                coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        # This extra z offset is not in the paper
        f_center[:, :, 2] = voxel_features[:, :, 2] - (
                coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

        # With absolute coordinates, concatenate everything directly
        if self.use_absolute_xyz:
            features = [voxel_features, f_cluster, f_center]
        # Otherwise drop the first 3 dimensions (xyz) of voxel_features before concatenating
        else:
            features = [voxel_features[..., 3:], f_cluster, f_center]

        # Optionally add the distance of each point to the origin
        if self.with_distance:
            # In torch.norm the first 2 selects the L2 norm, the second 2 is the dimension it is taken over
            points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
            features.append(points_dist)
        # Concatenate along the last dimension to get a (M, 32, 10) tensor
        features = torch.cat(features, dim=-1)
        # Maximum number of points per pillar
        voxel_count = features.shape[1]
        """
        Pillars with fewer than 32 points contain zero-padded entries.
        The computations above give those padded entries non-zero xc, yc, zc and xp, yp, zp values,
        so they have to be reset to 0.
        get_paddings_indicator tells which entries of features are real data to keep
        and which are padding to be zeroed.
        """
        # mask has shape (M, 32)
        # and marks the entries of each pillar that must be kept
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        # (M, 32) -> (M, 32, 1)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        # Zero all features of the padded entries
        features *= mask

        for pfn in self.pfn_layers:
            features = pfn(features)
        # (M, 64): one 64-D feature vector per pillar
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict
Once the simplified PointNet has extracted a feature vector for each pillar, the pillars are placed back at their original grid coordinates to form the pseudo-image.
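The scatter step boils down to one index computation: a pillar at grid position (z, y, x) lands at flat index `z + y * nx + x` of a canvas with `nx * ny` cells, which is then reshaped to (C, ny, nx). The sketch below shows this with nested lists instead of tensors; `flat_index` and `scatter_to_canvas` are hypothetical helper names.

```python
def flat_index(coord, nx):
    """coord is (z, y, x) with z always 0 for pillars."""
    z, y, x = coord
    return z + y * nx + x


def scatter_to_canvas(pillar_features, coords, nx, ny):
    """Place each pillar's C-dim feature at its (y, x) cell; returns canvas[c][y][x]."""
    num_channels = len(pillar_features[0])
    flat = [[0.0] * (nx * ny) for _ in range(num_channels)]
    for feature, coord in zip(pillar_features, coords):
        index = flat_index(coord, nx)
        for c in range(num_channels):
            flat[c][index] = feature[c]
    # Reshape every channel from (ny * nx,) to (ny, nx).
    return [[channel[y * nx:(y + 1) * nx] for y in range(ny)] for channel in flat]


features = [[1.0, 2.0], [3.0, 4.0]]      # two pillars, two channels each
coords = [(0, 1, 2), (0, 2, 0)]          # (z, y, x) of each pillar
canvas = scatter_to_canvas(features, coords, nx=4, ny=3)
print(canvas[0][1][2], canvas[1][2][0])
```

With the real grid (nx = 432, ny = 496) the last cell (0, 495, 431) maps to index 214271, the final slot of the 214272-element canvas.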
The code is in pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py
import torch
import torch.nn as nn


class PointPillarScatter(nn.Module):
    """
    Corresponds to the step in the paper that turns the stacked pillars back into a pseudo-image:
    the pillar features are restored to their original grid positions by coordinate index.
    """

    def __init__(self, model_cfg, grid_size, **kwargs):
        super().__init__()

        self.model_cfg = model_cfg
        self.num_bev_features = self.model_cfg.NUM_BEV_FEATURES  # 64
        self.nx, self.ny, self.nz = grid_size  # [432, 496, 1]
        assert self.nz == 1

    def forward(self, batch_dict, **kwargs):
        """
        Args:
            pillar_features: (M, 64)
            coords: (M, 4), the first column is batch_index, the remaining columns are (z, y, x)
        Returns:
            batch_spatial_features: (batch_size, 64, 496, 432)
        """
        # Fetch the pillar features produced by the PointNet and the grid coordinates of each pillar
        # pillar_features shape: (M, 64)
        # coords shape: (M, 4)
        pillar_features, coords = batch_dict['pillar_features'], batch_dict['voxel_coords']

        # List that collects the pseudo-image of every frame
        batch_spatial_features = []
        batch_size = coords[:, 0].max().int().item() + 1

        # Process every frame of the batch independently
        for batch_idx in range(batch_size):
            # Allocate an empty canvas that receives the pillar features
            # self.num_bev_features is 64
            # self.nz * self.nx * self.ny is the number of grid cells, 1 * 432 * 496
            # spatial_feature shape: (64, 214272)
            spatial_feature = torch.zeros(
                self.num_bev_features,
                self.nz * self.nx * self.ny,
                dtype=pillar_features.dtype,
                device=pillar_features.device)  # (64, 214272), 1 x 432 x 496 = 214272

            # Mask of the pillars that belong to this batch_idx
            batch_mask = coords[:, 0] == batch_idx
            # Select their coordinates
            this_coords = coords[batch_mask, :]
            # this_coords is stored as (batch_index, z, y, x) with a single z layer,
            # so the flat index is computed as below:
            # count all cells in the rows before the pillar, then add its column
            """
            The canvas is flattened to one dimension, like an image of height 496 and width 432
            flattened to 496 * 432 = 214272 entries.
            this_coords stores a position in the plane (the z axis can be ignored), so to place
            a pillar into the flattened canvas we add the number of cells in all preceding rows
            (y * nx) to the column of the pillar (x).
            """
            # Flat pseudo-image index of every non-empty pillar
            indices = this_coords[:, 1] + this_coords[:, 2] * self.nx + this_coords[:, 3]
            # Convert the dtype
            indices = indices.type(torch.long)
            # Select the pillar features of this frame
            pillars = pillar_features[batch_mask, :]
            pillars = pillars.t()
            # Write the pillars at their indices
            spatial_feature[:, indices] = pillars
            # Collect the canvas; every element has shape (64, 214272)
            batch_spatial_features.append(spatial_feature)

        # Stack all frames along dimension 0
        batch_spatial_features = torch.stack(batch_spatial_features, 0)
        # Reshape back to the spatial layout (pseudo-image): (4, 64, 214272) --> (4, 64, 496, 432)
        batch_spatial_features = batch_spatial_features.view(batch_size, self.num_bev_features * self.nz, self.ny,
                                                             self.nx)
        # Store the result in batch_dict
        batch_dict['spatial_features'] = batch_spatial_features
        return batch_dict
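3. Feature Extraction with the 2D Backbone
After the mapping above, where each pillar's max-pooled feature is placed back at its grid coordinates, we obtain image-like data. Only the cells of non-empty pillars carry extracted features; everything else is zero, so the resulting (batch_size, 64, 496, 432) tensor is still very sparse.
The figure below shows 2D-style feature extraction applied to this tensor: multi-scale feature extraction followed by concatenation and fusion.
There is not much to analyze here: these are ordinary convolutions followed by a concatenation; just pay attention to how the dimensions change.
After the upsampling layers we have features from 3 scales, each of shape (batch_size, 128, 248, 216). Concatenating them along dimension 1 (the channel dimension) gives x with shape (batch_size, 384, 248, 216).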
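The shapes quoted above can be checked with the standard output-size formulas for convolution and transposed convolution. The sketch assumes the usual PointPillars configuration, which is not spelled out in the text: downsampling strides [2, 2, 2], filters [64, 128, 256], upsampling strides [1, 2, 4], and 128 upsampling filters per scale. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride):
    """Output size of a transposed convolution (no padding) along one axis."""
    return (size - 1) * stride + kernel


def backbone_shapes(height, width, strides, filters, up_strides, up_filters):
    """Return (block_shape, upsampled_shape) per scale as (C, H, W) tuples."""
    shapes = []
    for stride, f, up_stride, up_f in zip(strides, filters, up_strides, up_filters):
        # First conv of each block: ZeroPad2d(1) + 3x3 conv with the given stride.
        height = conv_out(height, 3, stride, 1)
        width = conv_out(width, 3, stride, 1)
        # The remaining 3x3 convs use stride 1 and padding 1, so the size is unchanged.
        up = (up_f, deconv_out(height, up_stride, up_stride), deconv_out(width, up_stride, up_stride))
        shapes.append(((f, height, width), up))
    return shapes


shapes = backbone_shapes(496, 432, [2, 2, 2], [64, 128, 256], [1, 2, 4], [128, 128, 128])
for block, up in shapes:
    print(block, "->", up)
concat_channels = sum(up[0] for _, up in shapes)
print(concat_channels)
```

All three upsampled maps end up at 248 x 216, which is what makes the channel-wise concatenation possible.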
The code is in pcdet/models/backbones_2d/base_bev_backbone.py
import numpy as np
import torch
import torch.nn as nn


class BaseBEVBackbone(nn.Module):
    def __init__(self, model_cfg, input_channels):
        super().__init__()
        self.model_cfg = model_cfg
        # Read the parameters of the downsampling layers
        if self.model_cfg.get('LAYER_NUMS', None) is not None:
            assert len(self.model_cfg.LAYER_NUMS) == len(self.model_cfg.LAYER_STRIDES) == len(
                self.model_cfg.NUM_FILTERS)
            layer_nums = self.model_cfg.LAYER_NUMS
            layer_strides = self.model_cfg.LAYER_STRIDES
            num_filters = self.model_cfg.NUM_FILTERS
        else:
            layer_nums = layer_strides = num_filters = []
        # Read the parameters of the upsampling layers
        if self.model_cfg.get('UPSAMPLE_STRIDES', None) is not None:
            assert len(self.model_cfg.UPSAMPLE_STRIDES) == len(self.model_cfg.NUM_UPSAMPLE_FILTERS)
            num_upsample_filters = self.model_cfg.NUM_UPSAMPLE_FILTERS
            upsample_strides = self.model_cfg.UPSAMPLE_STRIDES
        else:
            upsample_strides = num_upsample_filters = []

        num_levels = len(layer_nums)  # 3
        # [64, 64, 128]: input_channels is 64 and num_filters[:-1] is [64, 128]
        c_in_list = [input_channels, *num_filters[:-1]]
        self.blocks = nn.ModuleList()
        self.deblocks = nn.ModuleList()
        for idx in range(num_levels):
            # Channels per level: (64, 64) --> (64, 128) --> (128, 256)
            # This is the first layer of cur_layers, with stride 2
            cur_layers = [
                nn.ZeroPad2d(1),
                nn.Conv2d(
                    c_in_list[idx], num_filters[idx], kernel_size=3,
                    stride=layer_strides[idx], padding=0, bias=False
                ),
                nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                nn.ReLU()
            ]
            for k in range(layer_nums[idx]):  # stack conv layers according to layer_nums
                cur_layers.extend([
                    nn.Conv2d(num_filters[idx], num_filters[idx], kernel_size=3, padding=1, bias=False),
                    nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                    nn.ReLU()
                ])
            # Add this level to blocks.
            # The * operator unpacks the list into separate positional arguments;
            # the related ** operator unpacks a dict into keyword arguments.
            self.blocks.append(nn.Sequential(*cur_layers))
            if len(upsample_strides) > 0:  # build the upsampling layers, strides (1, 2, 4)
                stride = upsample_strides[idx]
                if stride >= 1:
                    self.deblocks.append(nn.Sequential(
                        nn.ConvTranspose2d(
                            num_filters[idx], num_upsample_filters[idx],
                            upsample_strides[idx],
                            stride=upsample_strides[idx], bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))
                else:
                    # np.int was removed in NumPy 1.24, so convert with the built-in int
                    stride = int(np.round(1 / stride))
                    self.deblocks.append(nn.Sequential(
                        nn.Conv2d(
                            num_filters[idx], num_upsample_filters[idx],
                            stride,
                            stride=stride, bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))

        c_in = sum(num_upsample_filters)  # 384
        if len(upsample_strides) > num_levels:
            self.deblocks.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_in, upsample_strides[-1], stride=upsample_strides[-1], bias=False),
                nn.BatchNorm2d(c_in, eps=1e-3, momentum=0.01),
                nn.ReLU(),
            ))

        self.num_bev_features = c_in
    def forward(self, data_dict):
        """
        Args:
            data_dict:
                spatial_features: (4, 64, 496, 432)
        Returns:
        """
        spatial_features = data_dict['spatial_features']
        ups = []
        ret_dict = {}
        x = spatial_features
        for i in range(len(self.blocks)):
            x = self.blocks[i](x)

            stride = int(spatial_features.shape[2] / x.shape[2])
            ret_dict['spatial_features_%dx' % stride] = x
            # Block outputs: (4, 64, 248, 216) --> (4, 128, 124, 108) --> (4, 256, 62, 54)
            if len(self.deblocks) > 0:
                ups.append(self.deblocks[i](x))
            else:
                ups.append(x)

        # If there are upsampling layers, concatenate their outputs
        if len(ups) > 1:
            """
            The upsampling layers produce features from 3 scales,
            each of shape (batch_size, 128, 248, 216);
            concatenating along dimension 1 (channels) gives x with shape (batch_size, 384, 248, 216)
            """
            x = torch.cat(ups, dim=1)
        elif len(ups) == 1:
            x = ups[0]

        # False for PointPillars
        if len(self.deblocks) > len(self.blocks):
            x = self.deblocks[-1](x)

        # Store the result in spatial_features_2d and return
        data_dict['spatial_features_2d'] = x
        return data_dict
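4. Detection Head Implementation
The PointPillars detection head uses an SSD-like design. The OpenPCDet implementation trains a single network for the three classes car, pedestrian, and cyclist, rather than using separate networks for cars and for people as the original paper does. The head therefore defines anchors for three classes. Every anchor has two orientations, 0 and 90 degrees in the BEV, and each class has a single anchor size: car [3.9, 1.6, 1.56], pedestrian [0.8, 0.6, 1.73], and cyclist [1.76, 0.6, 1.73] (in meters).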
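To get a feel for how many anchors this produces, here is a plain-Python sketch that lays out the anchors of one class on the 216 x 248 feature map. It assumes the non-center-aligned layout and the bottom heights (-1.78 for cars) that appear in the anchor generator code later in this article; the function name `generate_class_anchors` is hypothetical and the loop order is chosen for readability, not to match the tensor layout in OpenPCDet.

```python
ANCHOR_RANGE = [0.0, -39.68, -3.0, 69.12, 39.68, 1.0]


def generate_class_anchors(size, bottom_z, rotations, nx=216, ny=248):
    """Return a list of (x, y, z, l, w, h, theta) anchors for one class."""
    length, width, height = size
    # Without center alignment the first and last anchors sit on the range borders.
    x_stride = (ANCHOR_RANGE[3] - ANCHOR_RANGE[0]) / (nx - 1)
    y_stride = (ANCHOR_RANGE[4] - ANCHOR_RANGE[1]) / (ny - 1)
    z_center = bottom_z + height / 2  # move from the box bottom to the box center
    anchors = []
    for iy in range(ny):
        for ix in range(nx):
            x = ANCHOR_RANGE[0] + ix * x_stride
            y = ANCHOR_RANGE[1] + iy * y_stride
            for theta in rotations:
                anchors.append((x, y, z_center, length, width, height, theta))
    return anchors


car_anchors = generate_class_anchors([3.9, 1.6, 1.56], -1.78, [0.0, 1.57])
print(len(car_anchors))        # anchors for one class
print(len(car_anchors) * 3)    # all three classes have the same count
```

Two anchors per class per location and three classes give 6 anchors per feature-map cell, which is where the 18, 42, and 12 output channels of the head come from (6 x 3 classes, 6 x 7 box parameters, 6 x 2 direction bins).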
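Anchors are matched to GT boxes with a 2D IoU computed directly on the feature map, that is, in the BEV; height is not considered. There are two reasons: 1. in the KITTI dataset all objects lie on the same ground plane in 3D space, and there is never one car on top of another; 2. object heights do not vary much, so regressing them directly with SmoothL1 already gives good results. The IoU thresholds that make an anchor a positive or negative sample are:
Car: IoU greater than or equal to 0.6 is positive, below 0.45 is negative, and anchors in between do not contribute to the loss.
Pedestrian: IoU greater than or equal to 0.5 is positive, below 0.35 is negative, and anchors in between do not contribute to the loss.
Cyclist: IoU greater than or equal to 0.5 is positive, below 0.35 is negative, and anchors in between do not contribute to the loss.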
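The threshold rule can be sketched as follows. The IoU here is a simple axis-aligned BEV IoU on (x1, y1, x2, y2) rectangles, and the example uses the pedestrian thresholds 0.5 and 0.35. The function names are hypothetical, and the real assigner has extra steps that are left out, for example also keeping the best-overlapping anchor of every GT box as a positive.

```python
def bev_iou(box_a, box_b):
    """IoU of two axis-aligned BEV rectangles given as (x1, y1, x2, y2)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def label_anchor(max_iou, matched_threshold, unmatched_threshold):
    """Return 1 (positive), 0 (negative) or -1 (ignored, no loss)."""
    if max_iou >= matched_threshold:
        return 1
    if max_iou < unmatched_threshold:
        return 0
    return -1


iou = bev_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
print(iou, label_anchor(iou, 0.5, 0.35))
```

Anchors that land in the gap between the two thresholds are neither pulled toward a GT box nor pushed toward background.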
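Each anchor predicts 7 parameters, (x, y, z, w, l, h, θ): x, y, z give the position of the box center in the point cloud, w, l, h give the width, length, and height of the box, and θ gives its rotation angle.
Because the angle regression cannot tell apart two boxes that point in exactly opposite directions, the PointPillars head also adds a direction prediction for each anchor: a softmax-based direction classifier that distinguishes the two possible headings of a box.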
The code is in pcdet/models/dense_heads/anchor_head_single.py
import numpy as np
import torch.nn as nn

from .anchor_head_template import AnchorHeadTemplate


class AnchorHeadSingle(AnchorHeadTemplate):
    """
    Args:
        model_cfg: configuration of AnchorHeadSingle
        input_channels: 384, number of input channels
        num_class: 3
        class_names: ['Car','Pedestrian','Cyclist']
        grid_size: (432, 496, 1)
        point_cloud_range: (0, -39.68, -3, 69.12, 39.68, 1)
        predict_boxes_when_training: False
    """

    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range,
                 predict_boxes_when_training=True, **kwargs):
        super().__init__(
            model_cfg=model_cfg, num_class=num_class, class_names=class_names, grid_size=grid_size,
            point_cloud_range=point_cloud_range,
            predict_boxes_when_training=predict_boxes_when_training
        )
        # Each location has anchors for 3 classes, each with two orientations (0 and 90 degrees)
        # num_anchors_per_location: [2, 2, 2]
        self.num_anchors_per_location = sum(self.num_anchors_per_location)  # sum([2, 2, 2]) = 6
        # Conv2d(384, 18, kernel_size=(1, 1), stride=(1, 1))
        self.conv_cls = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.num_class,
            kernel_size=1
        )
        # Conv2d(384, 42, kernel_size=(1, 1), stride=(1, 1))
        self.conv_box = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.box_coder.code_size,
            kernel_size=1
        )
        # If a direction loss is used, add the direction conv layer
        # Conv2d(384, 12, kernel_size=(1, 1), stride=(1, 1))
        if self.model_cfg.get('USE_DIRECTION_CLASSIFIER', None) is not None:
            self.conv_dir_cls = nn.Conv2d(
                input_channels,
                self.num_anchors_per_location * self.model_cfg.NUM_DIR_BINS,
                kernel_size=1
            )
        else:
            self.conv_dir_cls = None
        self.init_weights()

    # Parameter initialization
    def init_weights(self):
        pi = 0.01
        # Initialize the bias of the classification conv
        nn.init.constant_(self.conv_cls.bias, -np.log((1 - pi) / pi))
        # Initialize the weights of the box regression conv
        nn.init.normal_(self.conv_box.weight, mean=0, std=0.001)
    def forward(self, data_dict):
        # Fetch the features produced by the backbone
        # spatial_features_2d shape: (batch_size, 384, 248, 216)
        spatial_features_2d = data_dict['spatial_features_2d']
        # Class prediction for the 6 anchors at each location --> (batch_size, 18, 248, 216)
        cls_preds = self.conv_cls(spatial_features_2d)
        # Box parameters for the 6 anchors at each location --> (batch_size, 42, 248, 216)
        # Each anchor predicts 7 parameters: (x, y, z, w, l, h, θ)
        box_preds = self.conv_box(spatial_features_2d)
        # Move the class channel to the last dimension: [N, H, W, C] --> (batch_size, 248, 216, 18)
        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()
        # Move the box channel to the last dimension: [N, H, W, C] --> (batch_size, 248, 216, 42)
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()
        # Store the class and box predictions in the forward dict
        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds
        # Direction classification
        if self.conv_dir_cls is not None:
            # Each anchor is classified into one of two directions --> (batch_size, 12, 248, 216)
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            # Move the direction channel to the last dimension: [N, H, W, C] --> (batch_size, 248, 216, 12)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            # Store the direction predictions in the forward dict
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

        """
        In training mode every anchor needs a GT assignment to compute the loss
        """
        if self.training:
            # targets_dict = {
            #     'box_cls_labels': cls_labels,  # (4, 321408)
            #     'box_reg_targets': bbox_targets,  # (4, 321408, 7)
            #     'reg_weights': reg_weights  # (4, 321408)
            # }
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']  # (4, 39, 8)
            )
            # Store the assignment result in the forward dict
            self.forward_ret_dict.update(targets_dict)

        # Outside training mode, generate the box predictions directly
        if not self.training or self.predict_boxes_when_training:
            # Decode the predictions into the final boxes
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds  # (1, 321408, 3), 107136 * 3 = 321408
            data_dict['batch_box_preds'] = batch_box_preds  # (1, 321408, 7)
            data_dict['cls_preds_normalized'] = False

        return data_dict
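5. Loss Computation
PointPillars computes its loss the same way SECOND does. Every GT box has the 7 parameters (x, y, z, w, l, h, θ).
1. Loss computation in theory
1) The regression residuals for the localization task are defined as:
$$\Delta x = \frac{x^{gt} - x^{a}}{d^{a}}, \quad \Delta y = \frac{y^{gt} - y^{a}}{d^{a}}, \quad \Delta z = \frac{z^{gt} - z^{a}}{h^{a}},$$
$$\Delta w = \log\frac{w^{gt}}{w^{a}}, \quad \Delta l = \log\frac{l^{gt}}{l^{a}}, \quad \Delta h = \log\frac{h^{gt}}{h^{a}}, \quad \Delta\theta = \sin\left(\theta^{gt} - \theta^{a}\right)$$
Here x^{gt} is the x coordinate of the ground-truth box and x^{a} is the x coordinate of the anchor (likewise for the other parameters), and d^{a} is the diagonal of the anchor's width and length, defined as:
$$d^{a} = \sqrt{(w^{a})^{2} + (l^{a})^{2}}$$
The total regression loss is therefore:
$$\mathcal{L}_{loc} = \sum_{b \in (x, y, z, w, l, h, \theta)} \mathrm{SmoothL1}(\Delta b)$$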
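The residual encoding from the paper (offsets in x and y divided by the anchor diagonal, the z offset divided by the anchor height, log ratios for the sizes, and the sine of the angle difference) can be written out directly. This is a sketch of the paper's formulas rather than OpenPCDet's ResidualCoder, which differs in details; `encode_residuals` and `smooth_l1` are hypothetical names, and `beta=1.0` is an assumed default.

```python
import math


def encode_residuals(gt, anchor):
    """gt and anchor are (x, y, z, w, l, h, theta); returns the 7 regression targets."""
    xg, yg, zg, wg, lg, hg, tg = gt
    xa, ya, za, wa, la, ha, ta = anchor
    diagonal = math.hypot(wa, la)  # d^a = sqrt(w^2 + l^2)
    return (
        (xg - xa) / diagonal,
        (yg - ya) / diagonal,
        (zg - za) / ha,
        math.log(wg / wa),
        math.log(lg / la),
        math.log(hg / ha),
        math.sin(tg - ta),
    )


def smooth_l1(value, beta=1.0):
    """Quadratic near zero, linear for large residuals."""
    magnitude = abs(value)
    if magnitude < beta:
        return 0.5 * magnitude * magnitude / beta
    return magnitude - 0.5 * beta


anchor = (10.0, 0.0, -1.0, 1.6, 3.9, 1.56, 0.0)
flipped = (10.0, 0.0, -1.0, 1.6, 3.9, 1.56, math.pi)  # same box, opposite heading
residuals = encode_residuals(flipped, anchor)
loc_loss = sum(smooth_l1(r) for r in residuals)
print(residuals[6], loc_loss)
```

The flipped box yields an angle residual of practically zero, which is exactly the ambiguity that the separate direction classifier resolves.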
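2) Object classification
For the object class of each anchor, PointPillars uses the focal loss to balance positive and negative samples and to mine hard examples. It is defined as:
$$\mathcal{L}_{cls} = -\alpha_{a}\,(1 - p^{a})^{\gamma}\,\log p^{a}$$
where p^{a} is the predicted class probability of an anchor, and α and γ use the same settings as RetinaNet, 0.25 and 2.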
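A scalar version of the focal loss with α = 0.25 and γ = 2 shows how confidently classified anchors are down-weighted. This is a minimal sketch for a single binary prediction; the function name `focal_loss` is hypothetical, and production code would work on logits and whole tensors.

```python
import math


def focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """prob: predicted probability of the positive class; target: 1 or 0."""
    p_t = prob if target == 1 else 1.0 - prob          # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha    # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_positive = focal_loss(0.9, 1)   # confident and correct
hard_positive = focal_loss(0.1, 1)   # confident and wrong
easy_negative = focal_loss(0.1, 0)
print(easy_positive, hard_positive, easy_negative)
```

The factor (1 - p_t)^γ shrinks the loss of easy examples by orders of magnitude, so the many easy background anchors do not dominate training.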
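3) Anchor direction classification
Because the angle regression cannot fully distinguish two predicted boxes with exactly opposite directions, the authors add a direction classification for each anchor in the implementation, using a softmax to predict the direction class.
The total loss is therefore defined as:
$$\mathcal{L} = \frac{1}{N_{pos}}\left(\beta_{loc}\,\mathcal{L}_{loc} + \beta_{cls}\,\mathcal{L}_{cls} + \beta_{dir}\,\mathcal{L}_{dir}\right)$$
where N_{pos} is the number of positive anchors and the coefficients are β_{loc} = 2, β_{cls} = 1, and β_{dir} = 0.2.
2. Loss computation in code
The loss implementation involves a fair amount of code, so the analysis is split into the following three parts:
1) Anchor generation
2) Matching GT boxes to anchors
3) Loss computation
1) Anchor generation
The code is in pcdet/models/dense_heads/target_assigner/anchor_generator.py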
import torch


class AnchorGenerator(object):
    def __init__(self, anchor_range, anchor_generator_config):
        super().__init__()
        self.anchor_generator_cfg = anchor_generator_config  # list: 3
        # Range over which anchors are placed in the point cloud: [0, -39.68, -3, 69.12, 39.68, 1]
        self.anchor_range = anchor_range
        # Length, width, height of the anchors of every class from the config
        # list: 3 --> car, pedestrian, cyclist [[[3.9, 1.6, 1.56]], [[0.8, 0.6, 1.73]], [[1.76, 0.6, 1.73]]]
        self.anchor_sizes = [config['anchor_sizes'] for config in anchor_generator_config]
        # Anchor rotation angles in radians, i.e. 0 and 90 degrees
        # list: 3 --> [[0, 1.57], [0, 1.57], [0, 1.57]]
        self.anchor_rotations = [config['anchor_rotations'] for config in anchor_generator_config]
        # Initial z position (bottom) of each anchor; the KITTI point cloud z range is -3 m to 1 m
        # list: 3 --> [[-1.78], [-0.6], [-0.6]]
        self.anchor_heights = [config['anchor_bottom_heights'] for config in anchor_generator_config]
        # Whether each anchor should be placed at the center of its grid cell,
        # e.g. for the cell [1, 1], center alignment adds 0.5 to give [1.5, 1.5]
        # Defaults to False
        # list: 3 --> [False, False, False]
        self.align_center = [config.get('align_center', False) for config in anchor_generator_config]

        assert len(self.anchor_sizes) == len(self.anchor_rotations) == len(self.anchor_heights)
        self.num_of_anchor_sets = len(self.anchor_sizes)  # 3
    def generate_anchors(self, grid_sizes):
        assert len(grid_sizes) == self.num_of_anchor_sets
        # 1. Initialization
        all_anchors = []
        num_anchors_per_location = []
        # 2. Generate the anchors of the three classes one class at a time
        for grid_size, anchor_size, anchor_rotation, anchor_height, align_center in zip(
                grid_sizes, self.anchor_sizes, self.anchor_rotations, self.anchor_heights, self.align_center):
            # 2 = 2x1x1 --> 2 anchors per location; the 2 stands for the two orientations
            num_anchors_per_location.append(len(anchor_rotation) * len(anchor_size) * len(anchor_height))
            # Anchors are not center-aligned by default (align_center is False)
            if align_center:
                x_stride = (self.anchor_range[3] - self.anchor_range[0]) / grid_size[0]
                y_stride = (self.anchor_range[4] - self.anchor_range[1]) / grid_size[1]
                # Center alignment: shift by half a cell
                x_offset, y_offset = x_stride / 2, y_stride / 2
            else:
                # 2.1 Compute the real-world size of one feature-map cell,
                # used to map every anchor back to point cloud coordinates
                # (69.12 - 0) / (216 - 1) = 0.3214883848678234 m
                x_stride = (self.anchor_range[3] - self.anchor_range[0]) / (grid_size[0] - 1)
                # (39.68 - (-39.68)) / (248 - 1) = 0.3212955490297634 m
                y_stride = (self.anchor_range[4] - self.anchor_range[1]) / (grid_size[1] - 1)
                # Without center alignment the offset relative to the cell corner is 0
                x_offset, y_offset = 0, 0

            # 2.2 Generate the 1-D x_shifts, y_shifts and z_shifts
            # Step x_stride from self.anchor_range[0] + x_offset to self.anchor_range[3] + 1e-5
            # x coordinates --> 216 points in [0, 69.12]
            x_shifts = torch.arange(
                self.anchor_range[0] + x_offset, self.anchor_range[3] + 1e-5, step=x_stride, dtype=torch.float32,
            ).cuda()
            # y coordinates --> 248 points in [-39.68, 39.68]
            y_shifts = torch.arange(
                self.anchor_range[1] + y_offset, self.anchor_range[4] + 1e-5, step=y_stride, dtype=torch.float32,
            ).cuda()
            """
            new_tensor returns a new tensor that shares the properties of the source tensor,
            such as dtype and device,
            and fills it with the values of anchor_height
            """
            # [-1.78]
            z_shifts = x_shifts.new_tensor(anchor_height)
            # num_anchor_size = 1
            # num_anchor_rotation = 2
            num_anchor_size, num_anchor_rotation = anchor_size.__len__(), anchor_rotation.__len__()  # 1, 2
            # [0, 1.57] in radians
            anchor_rotation = x_shifts.new_tensor(anchor_rotation)
            # [[3.9, 1.6, 1.56]]
            anchor_size = x_shifts.new_tensor(anchor_size)

            # 2.3 Build the grid coordinates with meshgrid
            x_shifts, y_shifts, z_shifts = torch.meshgrid([
                x_shifts, y_shifts, z_shifts
            ])
            # meshgrid can be understood as expanding the original dimensions, e.g.
            # x: (216,) --> (216, 1, 1) --> (216, 248, 1)
            # y: (248,) --> (1, 248, 1) --> (216, 248, 1)
            # z: (1,) --> (1, 1, 1) --> (216, 248, 1)

            # 2.4 Stack and combine the anchor dimensions;
            # after the permute in 2.5 the final anchors have shape (1, 248, 216, 1, 2, 7)

            # 2.4.1 Stack the anchor positions
            # [x, y, z, 3] --> (216, 248, 1, 3): the position of every anchor;
            # the last dimension holds the (x, y, z) values
            anchors = torch.stack((x_shifts, y_shifts, z_shifts), dim=-1)
            # 2.4.2 Combine position and size: expand both to the same shape
            # (except for the last dimension) and concatenate
            # (216, 248, 1, 3) --> (216, 248, 1, 1, 3)
            # The dimensions are: x, y, z, number of anchor sizes of this class, anchor position
            anchors = anchors[:, :, :, None, :].repeat(1, 1, 1, anchor_size.shape[0], 1)
            # (1, 1, 1, 1, 3) --> (216, 248, 1, 1, 3)
            anchor_size = anchor_size.view(1, 1, 1, -1, 3).repeat([*anchors.shape[0:3], 1, 1])
            # The anchors now carry position and size --> (216, 248, 1, 1, 6)
            # The last dimension is (x, y, z, l, w, h)
            anchors = torch.cat((anchors, anchor_size), dim=-1)
            # 2.4.3 Combine position, size and rotation
            # Insert a dimension before the last one and repeat it once per rotation
            # (216, 248, 1, 1, 2, 6): x, y, z, number of anchor sizes, number of rotations, 6 anchor parameters
            anchors = anchors[:, :, :, :, None, :].repeat(1, 1, 1, 1, num_anchor_rotation, 1)
            # (216, 248, 1, 1, 2, 1): rotation angles of the two orientations
            anchor_rotation = anchor_rotation.view(1, 1, 1, 1, -1, 1).repeat(
                [*anchors.shape[0:3], num_anchor_size, 1, 1])
            # [x, y, z, num_size, num_rot, 7] --> (216, 248, 1, 1, 2, 7)
            # The last dimension is position + size + rotation: (x, y, z, l, w, h, theta)
            anchors = torch.cat((anchors, anchor_rotation), dim=-1)  # [x, y, z, num_size, num_rot, 7]

            # 2.5 Permute the anchor dimensions
            # [x, y, z, num_anchor_size, num_rot, 7] --> [z, y, x, num_anchor_size, num_rot, 7]
            # i.e. (216, 248, 1, 1, 2, 7) --> (1, 248, 216, 1, 2, 7)
            # The last dimension is [x, y, z, dx, dy, dz, rot]
            anchors = anchors.permute(2, 1, 0, 3, 4, 5).contiguous()
            # Move the z coordinate of every anchor from the bottom of the box to its center
            # Car: -1.78 + 1.56 / 2 = -1.0
            # Pedestrian, cyclist: -0.6 + 1.73 / 2 = 0.265
            anchors[..., 2] += anchors[..., 5] / 2
            all_anchors.append(anchors)
        # all_anchors: [(1, 248, 216, 1, 2, 7), (1, 248, 216, 1, 2, 7), (1, 248, 216, 1, 2, 7)]
        # num_anchors_per_location: [2, 2, 2]
        return all_anchors, num_anchors_per_location
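2) Matching GT boxes to anchors (target assignment)
The code comments here are already detailed; follow them to see how the matching between the GT boxes and all anchors is computed.
assign_targets assigns positive and negative samples for all classes and anchors of a frame.
assign_targets_single assigns positive and negative samples between the GT boxes and the anchors of one class within a frame.
So within a batch, anchors and GT boxes are matched frame by frame and class by class, which differs slightly from image object detection.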
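The frame-by-frame, class-by-class structure can be sketched without any tensors: strip the zero rows that pad a frame's GT list to the batch maximum, then group the remaining boxes by class so that each group is matched only against the anchors of its own class. The helper names and the sample frame below are hypothetical.

```python
CLASS_NAMES = ['Car', 'Pedestrian', 'Cyclist']


def strip_padding(gt_boxes):
    """Drop trailing all-zero rows that pad a frame to the batch-wide GT count."""
    last = len(gt_boxes) - 1
    while last > 0 and sum(gt_boxes[last][:7]) == 0:
        last -= 1
    return gt_boxes[:last + 1]


def split_by_class(gt_boxes, class_names=CLASS_NAMES):
    """Group the (x, y, z, dx, dy, dz, ry, class) boxes of one frame by class name."""
    per_class = {name: [] for name in class_names}
    for box in strip_padding(gt_boxes):
        class_id = int(box[-1])  # class ids start at 1; 0 marks a padded row
        if class_id > 0:
            per_class[class_names[class_id - 1]].append(box[:-1])
    return per_class


frame = [
    [12.0, 1.5, -0.8, 3.9, 1.6, 1.5, 0.1, 1],   # a car
    [8.0, -3.0, -0.6, 0.8, 0.6, 1.7, 0.0, 2],   # a pedestrian
    [0, 0, 0, 0, 0, 0, 0, 0],                   # padding
    [0, 0, 0, 0, 0, 0, 0, 0],                   # padding
]
groups = split_by_class(frame)
print({name: len(boxes) for name, boxes in groups.items()})
```

Each per-class group would then go through its own IoU matching against that class's anchors, and the per-class results are merged back into one target tensor per frame.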
The code is in pcdet/models/dense_heads/target_assigner/axis_aligned_target_assigner.py
import numpy as np
import torch

from ....ops.iou3d_nms import iou3d_nms_utils
from ....utils import box_utils


class AxisAlignedTargetAssigner(object):
    def __init__(self, model_cfg, class_names, box_coder, match_height=False):
        super().__init__()
        # Anchor generation config
        anchor_generator_cfg = model_cfg.ANCHOR_GENERATOR_CONFIG
        # Config for assigning anchors to GT boxes
        anchor_target_cfg = model_cfg.TARGET_ASSIGNER_CONFIG
        # Encodes the 7 box residuals (x, y, z, w, l, h, θ) --> pcdet.utils.box_coder_utils.ResidualCoder
        self.box_coder = box_coder
        # PointPillars decides positives and negatives from the BEV IoU between GT boxes and anchors,
        # without matching the height along z. The reasoning is:
        # 1. objects in the point cloud all lie on the same plane and never overlap along z
        # 2. heights within a class do not vary much, so SmoothL1 regresses the height well
        self.match_height = match_height
        # Class names ['Car', 'Pedestrian', 'Cyclist']
        self.class_names = np.array(class_names)
        # ['Car', 'Pedestrian', 'Cyclist']
        self.anchor_class_names = [config['class_name'] for config in anchor_generator_cfg]
        # anchor_target_cfg.POS_FRACTION = -1 < 0 --> None
        # Foreground/background sampling ratio; not used by PointPillars
        self.pos_fraction = anchor_target_cfg.POS_FRACTION if anchor_target_cfg.POS_FRACTION >= 0 else None
        # Total number of samples; not used by PointPillars
        self.sample_size = anchor_target_cfg.SAMPLE_SIZE  # 512
        # False; whether foreground weights are normalized by 1 / number of foreground anchors.
        # Not used by PointPillars
        self.norm_by_num_examples = anchor_target_cfg.NORM_BY_NUM_EXAMPLES
        # Per-class IoU threshold for positives {'Car': 0.6, 'Pedestrian': 0.5, 'Cyclist': 0.5}
        self.matched_thresholds = {}
        # Per-class IoU threshold for negatives {'Car': 0.45, 'Pedestrian': 0.35, 'Cyclist': 0.35}
        self.unmatched_thresholds = {}
        for config in anchor_generator_cfg:
            self.matched_thresholds[config['class_name']] = config['matched_threshold']
            self.unmatched_thresholds[config['class_name']] = config['unmatched_threshold']

        self.use_multihead = model_cfg.get('USE_MULTIHEAD', False)  # False
        # self.separate_multihead = model_cfg.get('SEPARATE_MULTIHEAD', False)
        # if self.seperate_multihead:
        #     rpn_head_cfgs = model_cfg.RPN_HEAD_CFGS
        #     self.gt_remapping = {}
        #     for rpn_head_cfg in rpn_head_cfgs:
        #         for idx, name in enumerate(rpn_head_cfg['HEAD_CLS_NAME']):
        #             self.gt_remapping[name] = idx + 1
    def assign_targets(self, all_anchors, gt_boxes_with_classes):
        """
        Process the anchors and gt_boxes of every point cloud in a batch:
        decide whether each anchor is foreground or background, then assign
        a class, box regression residuals and a regression weight to each foreground anchor.
        Args:
            all_anchors: [(N, 7), ...]
            gt_boxes_with_classes: (B, M, 8)  # last dimension is (x, y, z, l, w, h, θ, class)
        Returns:
            all_targets_dict = {
                # class label of every anchor
                'box_cls_labels': cls_labels,  # (batch_size, num_of_anchors)
                # regression residuals of every anchor --> (Δx, Δy, Δz, Δl, Δw, Δh, Δθ)
                'box_reg_targets': bbox_targets,  # (batch_size, num_of_anchors, 7)
                # regression weight of every anchor
                'reg_weights': reg_weights  # (batch_size, num_of_anchors)
            }
        """
        # 1. Initialize the result lists and split the GT boxes from their classes
        bbox_targets = []
        cls_labels = []
        reg_weights = []
        # Batch size
        batch_size = gt_boxes_with_classes.shape[0]  # 4
        # Classes of all GTs
        gt_classes = gt_boxes_with_classes[:, :, -1]  # (4, num_of_gt)
        # The 7 box parameters of all GTs
        gt_boxes = gt_boxes_with_classes[:, :, :-1]  # (4, num_of_gt, 7)
        # 2. Match foreground and background anchors frame by frame
        for k in range(batch_size):
            cur_gt = gt_boxes[k]  # gt_boxes of the current frame (num_of_gt, 7)
            """
            During preprocessing, OpenPCDet takes the frame with the most GTs in the batch
            as the reference and zero-pads the other frames so the batch forms one tensor.
            Example (class column only):
                [
                    [1,1,2,2,3,2],
                    [2,2,3,1,0,0],
                    [3,1,2,0,0,0]
                ]
            The loop below therefore scans each frame backwards from its last row and stops at
            the last non-zero box, which yields the real GTs of the current frame.
            """
            cnt = cur_gt.__len__() - 1  # index of the last row (max number of GTs in the batch, minus 1)
            # Walk back to the last non-zero box; padded rows are all zeros
            while cnt > 0 and cur_gt[cnt].sum() == 0:
                cnt -= 1
            # 2.1 Keep the non-zero boxes and classes of the current frame
            cur_gt = cur_gt[:cnt + 1]
            # cur_gt_classes e.g.: tensor([1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3], device='cuda:0', dtype=torch.int32)
            cur_gt_classes = gt_classes[k][:cnt + 1].int()
            target_list = []
            # 2.2 Split anchors and GTs by class and compute foreground/background per class.
            # Each class of anchors is matched independently, unlike SSD, which computes
            # the IoU over all classes together and takes the maximum.
            for anchor_class_name, anchors in zip(self.anchor_class_names, all_anchors):
                # anchor_class_name : Car | Pedestrian | Cyclist
                # anchors : (1, 248, 216, 1, 2, 7)  7 --> (x, y, z, l, w, h, θ)
                if cur_gt_classes.shape[0] > 1:
                    # self.class_names : ['Car', 'Pedestrian', 'Cyclist']
                    # Subtract 1 because class ids start at 1 while list indices start at 0.
                    # The result is a mask of the GTs that belong to the class being processed.
                    mask = torch.from_numpy(self.class_names[cur_gt_classes.cpu() - 1] == anchor_class_name)
                else:
                    mask = torch.tensor([self.class_names[c - 1] == anchor_class_name
                                         for c in cur_gt_classes], dtype=torch.bool)
                # True if the detection head is multi-head; defaults to False
                if self.use_multihead:  # False
                    anchors = anchors.permute(3, 4, 0, 1, 2, 5).contiguous().view(-1, anchors.shape[-1])
                    # if self.seperate_multihead:
                    #     selected_classes = cur_gt_classes[mask].clone()
                    #     if len(selected_classes) > 0:
                    #         new_cls_id = self.gt_remapping[anchor_class_name]
                    #         selected_classes[:] = new_cls_id
                    # else:
                    #     selected_classes = cur_gt_classes[mask]
                    selected_classes = cur_gt_classes[mask]
                else:
                    # 2.2.1 Size of the feature map
                    feature_map_size = anchors.shape[:3]  # (1, 248, 216)
                    # Flatten all anchors, shape: (1, 248, 216, 1, 2, 7) --> (107136, 7)
                    anchors = anchors.view(-1, anchors.shape[-1])
                    # Class ids of the GTs selected by the class mask --> Car | Pedestrian | Cyclist
                    selected_classes = cur_gt_classes[mask]
                # 2.2.2 Use assign_targets_single to assign gt_boxes to the anchors of one class
                # and to set the box encoding and regression weight of foreground/background anchors
                single_target = self.assign_targets_single(
                    anchors,  # all anchors of this class
                    cur_gt[mask],  # GT boxes, shape: (num_of_GT_box, 7)
                    gt_classes=selected_classes,  # classes of the selected GTs
                    matched_threshold=self.matched_thresholds[anchor_class_name],  # IoU threshold for positives of this class
                    unmatched_threshold=self.unmatched_thresholds[anchor_class_name]  # IoU threshold for negatives of this class
                )
                target_list.append(single_target)
                # At this point one class of this frame has its foreground/background anchors assigned
            if self.use_multihead:
                target_dict = {
                    'box_cls_labels': [t['box_cls_labels'].view(-1) for t in target_list],
                    'box_reg_targets': [t['box_reg_targets'].view(-1, self.box_coder.code_size) for t in target_list],
                    'reg_weights': [t['reg_weights'].view(-1) for t in target_list]
                }
                target_dict['box_reg_targets'] = torch.cat(target_dict['box_reg_targets'], dim=0)
                target_dict['box_cls_labels'] = torch.cat(target_dict['box_cls_labels'], dim=0).view(-1)
                target_dict['reg_weights'] = torch.cat(target_dict['reg_weights'], dim=0).view(-1)
            else:
                target_dict = {
                    # feature_map_size: (1, 248, 216) --> each entry (1, 248, 216, 2)
                    'box_cls_labels': [t['box_cls_labels'].view(*feature_map_size, -1) for t in target_list],
                    # (1, 248, 216, 2, 7)
                    'box_reg_targets': [t['box_reg_targets'].view(*feature_map_size, -1, self.box_coder.code_size)
                                        for t in target_list],
                    # (1, 248, 216, 2)
                    'reg_weights': [t['reg_weights'].view(*feature_map_size, -1) for t in target_list]
                }
                # list of 3: (1, 248, 216, 2, 7) --> (1, 248, 216, 6, 7) --> (321408, 7)
                target_dict['box_reg_targets'] = torch.cat(
                    target_dict['box_reg_targets'], dim=-2
                ).view(-1, self.box_coder.code_size)
                # list of 3: (1, 248, 216, 2) --> (1, 248, 216, 6) --> (1*248*216*6,)
                target_dict['box_cls_labels'] = torch.cat(target_dict['box_cls_labels'], dim=-1).view(-1)
                # list of 3: (1, 248, 216, 2) --> (1, 248, 216, 6) --> (1*248*216*6,)
                target_dict['reg_weights'] = torch.cat(target_dict['reg_weights'], dim=-1).view(-1)
            # Store the results of this frame
            bbox_targets.append(target_dict['box_reg_targets'])
            cls_labels.append(target_dict['box_cls_labels'])
            reg_weights.append(target_dict['reg_weights'])
            # Here all point clouds processed so far in the batch are done
        # 3. Stack the results and return them
        bbox_targets = torch.stack(bbox_targets, dim=0)  # (batch_size, 321408, 7)
        cls_labels = torch.stack(cls_labels, dim=0)  # (batch_size, 321408)
        reg_weights = torch.stack(reg_weights, dim=0)  # (batch_size, 321408)
        all_targets_dict = {
            'box_cls_labels': cls_labels,  # (batch_size, 321408)
            'box_reg_targets': bbox_targets,  # (batch_size, 321408, 7)
            'reg_weights': reg_weights  # (batch_size, 321408)
        }
        return all_targets_dict
    def assign_targets_single(self, anchors, gt_boxes, gt_classes, matched_threshold=0.6, unmatched_threshold=0.45):
        """
        For the anchors and gt_boxes of one class, compute the class label of foreground and
        background anchors, the box encoding and the regression weight.
        Args:
            anchors: (107136, 7)
            gt_boxes: (number of GTs of this class in the frame, 7)
            gt_classes: (number of GTs of this class in the frame,)
            matched_threshold: 0.6
            unmatched_threshold: 0.45
        Returns:
            ret_dict = {
                'box_cls_labels': labels,  # (107136,)
                'box_reg_targets': bbox_targets,  # (107136, 7)
                'reg_weights': reg_weights,  # (107136,)
            }
        """
        # ----------------------------1. Initialization-------------------------------#
        num_anchors = anchors.shape[0]  # 216 * 248 * 2 = 107136
        num_gt = gt_boxes.shape[0]  # number of GTs of this class in the frame
        # Initialize the label and gt_id of every anchor to -1.
        # -1 means the anchor is ignored by the loss; background anchors get class 0.
        labels = torch.ones((num_anchors,), dtype=torch.int32, device=anchors.device) * -1
        gt_ids = torch.ones((num_anchors,), dtype=torch.int32, device=anchors.device) * -1
        # ---------------------2. Foreground/background anchors of this class------------------------#
        if len(gt_boxes) > 0 and anchors.shape[0] > 0:
            # 1. IoU (Jaccard index) between the GTs of this class and the matching anchors
            # anchor_by_gt_overlap shape: (107136, num_gt)
            # It holds the IoU of every anchor of this class with every GT of this class
            anchor_by_gt_overlap = iou3d_nms_utils.boxes_iou3d_gpu(anchors[:, 0:7], gt_boxes[:, 0:7]) \
                if self.match_height else box_utils.boxes3d_nearest_bev_iou(anchors[:, 0:7], gt_boxes[:, 0:7])
            # NOTE: The speed of these two versions depends the environment and the number of anchors
            # anchor_to_gt_argmax = torch.from_numpy(anchor_by_gt_overlap.cpu().numpy().argmax(axis=1)).cuda()
            # 2. For every anchor, find the GT with the largest IoU
            # anchor_to_gt_argmax has one entry per anchor; each entry is a GT index
            anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(dim=1)
            # anchor_to_gt_max is the IoU of every anchor with its best GT
            anchor_to_gt_max = anchor_by_gt_overlap[
                torch.arange(num_anchors, device=anchors.device), anchor_to_gt_argmax]
            # gt_to_anchor_argmax = torch.from_numpy(anchor_by_gt_overlap.cpu().numpy().argmax(axis=0)).cuda()
            # 3. For every GT, find the index and IoU of its best anchor
            # (num_of_gt,) index of the best anchor of every GT
            gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(dim=0)
            # (num_of_gt,) IoU of every GT with its best anchor
            gt_to_anchor_max = anchor_by_gt_overlap[gt_to_anchor_argmax, torch.arange(num_gt, device=anchors.device)]
            # 4. Set the best IoU of GTs that overlap no anchor to -1
            empty_gt_mask = gt_to_anchor_max == 0  # mask of GTs whose best IoU is 0
            gt_to_anchor_max[empty_gt_mask] = -1  # -1 can never equal an IoU, so these GTs match nothing below
            # 5. Find the anchors that attain the maximum IoU of some GT; these are forced foreground anchors
            """
            The previous step only found one best-anchor index per GT, and argmax returns just a single
            index even when several anchors tie.
            During matching, one GT can have the same maximum IoU with several anchors,
            so here we collect every anchor whose IoU with a GT equals that GT's maximum IoU.
            """
            # Work per GT: if the first GT has a maximum IoU of 0.9, find all anchors whose IoU with it is 0.9
            # nonzero returns the positions (indices) of the non-zero elements of an array/tensor
            """
            Matrix comparison example:
                anchor_by_gt_overlap = torch.tensor([[0.78, 0.1, 0.9, 0],
                                                     [0.0, 0.5, 0, 0],
                                                     [0.0, 0, 0.9, 0.8],
                                                     [0.78, 0.1, 0.0, 0]])
                gt_to_anchor_max = torch.tensor([0.78, 0.5, 0.9, 0.8])
                anchors_with_max_overlap = anchor_by_gt_overlap == gt_to_anchor_max
                # The result marks every anchor that shares the maximum IoU of a GT
                anchors_with_max_overlap = tensor([[ True, False,  True, False],
                                                   [False,  True, False, False],
                                                   [False, False,  True,  True],
                                                   [ True, False, False, False]])
            In torch, nonzero returns the positions of the non-zero elements as rows of indices;
            in numpy it returns a tuple of a row-index array and a column-index array.
            torch result: tensor([[0, 0],
                                  [0, 2],
                                  [1, 1],
                                  [2, 2],
                                  [2, 3],
                                  [3, 0]])
            numpy result: (array([0, 0, 1, 2, 2, 3]), array([0, 2, 1, 2, 3, 0]))
            So the first GT is matched best by both the first anchor and the last anchor.
            """
            """In a real batch the result can look like this:
                tensor([[33382,     9],
                        [43852,    10],
                        [47284,     5],
                        [50370,     4],
                        [58498,     8],
                        [58500,     8],
                        [58502,     8],
                        [59139,     2],
                        [60751,     1],
                        [61183,     1],
                        [61420,    11],
                        [62389,     0],
                        [63216,    13],
                        [63218,    13],
                        [65046,    12],
                        [65048,    12],
                        [65478,    12],
                        [65480,    12],
                        [71924,     3],
                        [78046,     7],
                        [80150,     6]], device='cuda:0')
            Rows that share the same GT index in the second column (dim 1) show that this GT
            has several equally good best anchors among the anchors of this class.
            """
            # (num_of_multiple_best_matching_for_per_GT,)
            anchors_with_max_overlap = (anchor_by_gt_overlap == gt_to_anchor_max).nonzero()[:, 0]
            # GT index that each of these best anchors corresponds to.
            # This usually equals (anchor_by_gt_overlap == gt_to_anchor_max).nonzero()[:, 1]; it can differ
            # when an anchor is the best anchor of one GT but has an even larger IoU with another GT.
            gt_inds_force = anchor_to_gt_argmax[anchors_with_max_overlap]  # (35,)
            # Write the GT class into the label of the matching anchor
            labels[anchors_with_max_overlap] = gt_classes[gt_inds_force]
            # Write the GT index into gt_ids of the matching anchor
            gt_ids[anchors_with_max_overlap] = gt_inds_force.int()
            # 6. Use matched_threshold, unmatched_threshold and anchor_to_gt_max to find foreground and
            # background indices, and update labels and gt_ids
            """Note: the thresholded assignment below does not necessarily cover anchors_with_max_overlap,
            because the best anchor of a GT can still have an IoU below matched_threshold."""
            # Mask of anchors whose best IoU reaches the threshold  # (107136,)
            pos_inds = anchor_to_gt_max >= matched_threshold
            # GT index of those anchors  # (105,)
            gt_inds_over_thresh = anchor_to_gt_argmax[pos_inds]
            # Write the GT class of each positive anchor into its label
            labels[pos_inds] = gt_classes[gt_inds_over_thresh]
            # Write the GT index of each positive anchor into its gt_id
            gt_ids[pos_inds] = gt_inds_over_thresh.int()
            bg_inds = (anchor_to_gt_max < unmatched_threshold).nonzero()[:, 0]  # indices of background anchors
        else:
            bg_inds = torch.arange(num_anchors, device=anchors.device)
        # Indices of the foreground anchors --> (num_of_foreground_anchor,)
        # 106879 + 119 = 106998 < 107136, so some anchors are neither background nor foreground:
        # their IoU lies between unmatched_threshold and matched_threshold
        fg_inds = (labels > 0).nonzero()[:, 0]
        # At this point we know which anchors are foreground and which are background
        # ------------------3. Filter and label foreground/background anchors--------------------#
        # If a foreground sampling ratio is set, sample foreground and background anchors separately.
        # PointPillars does no such sampling; it balances foreground and background with focal loss.
        if self.pos_fraction is not None:  # anchor_target_cfg.POS_FRACTION = -1 < 0 --> None
            num_fg = int(self.pos_fraction * self.sample_size)  # self.sample_size=512
            # If there are more foreground anchors than the sampled foreground count
            if len(fg_inds) > num_fg:
                # Number of foreground anchors to drop
                num_disabled = len(fg_inds) - num_fg
                # Draw a random permutation over the foreground anchors and take the first num_disabled entries
                # e.g. torch.randperm(4)
                # output: tensor([ 2, 1, 0, 3])
                disable_inds = torch.randperm(len(fg_inds))[:num_disabled]
                # Set the label of the dropped anchors to -1.
                # Note: disable_inds are positions within fg_inds, yet they index labels directly here;
                # this branch is not executed for PointPillars.
                labels[disable_inds] = -1
                # Update the foreground indices
                fg_inds = (labels > 0).nonzero()[:, 0]
            # Number of background anchors needed
            num_bg = self.sample_size - (labels > 0).sum()
            # If there are more background anchors than needed
            if len(bg_inds) > num_bg:
                # torch.randint draws num_bg random integers in [0, len(bg_inds))
                enable_inds = bg_inds[torch.randint(0, len(bg_inds), size=(num_bg,))]
                # Set the label of enable_inds to 0
                labels[enable_inds] = 0
            # bg_inds = torch.nonzero(labels == 0)[:, 0]
        else:
            # If this class has no GT, set every label to 0, i.e. all anchors are background
            if len(gt_boxes) == 0 or anchors.shape[0] == 0:
                labels[:] = 0
            else:
                # Anchors whose IoU with every GT is below unmatched_threshold become background
                labels[bg_inds] = 0
                # Re-assign the class of the forced foreground anchors
                """
                Labels are assigned through both anchors_with_max_overlap and
                anchor_to_gt_max >= matched_threshold because the two sets differ:
                the best anchor of a GT may have an IoU below matched_threshold, or even below
                unmatched_threshold. The forced match guarantees that every GT with a non-zero
                overlap keeps at least one positive anchor. Since labels[bg_inds] = 0 above may
                have overwritten such anchors, their class is written again here.
                """
                labels[anchors_with_max_overlap] = gt_classes[gt_inds_force]
        # ------------------4. Compute bbox_targets and reg_weights--------------------#
        # Initialize the 7 regression parameters of every anchor to 0
        bbox_targets = anchors.new_zeros((num_anchors, self.box_coder.code_size))  # (107136, 7)
        # If the frame contains GTs of this class, encode the anchors labelled as positive
        if len(gt_boxes) > 0 and anchors.shape[0] > 0:
            # Index gt_boxes with anchor_to_gt_argmax[fg_inds] to get the GT box of every foreground anchor
            fg_gt_boxes = gt_boxes[anchor_to_gt_argmax[fg_inds], :]
            # All foreground anchors
            fg_anchors = anchors[fg_inds, :]
            """
            PointPillars encodes each GT against its foreground anchor and writes the result into bbox_targets.
            The 7 parameters are encoded as
                Δx = (x^gt - x^a) / d^a,  Δy = (y^gt - y^a) / d^a,  Δz = (z^gt - z^a) / h^a
                Δw = log(w^gt / w^a),  Δl = log(l^gt / l^a),  Δh = log(h^gt / h^a)
                Δθ = sin(θ^gt - θ^a)
            (encode_torch stores θ^gt - θ^a; the sine is applied later in the loss)
            """
            bbox_targets[fg_inds, :] = self.box_coder.encode_torch(fg_gt_boxes, fg_anchors)
        # Initialize the regression weights to 0
        reg_weights = anchors.new_zeros((num_anchors,))  # (107136,)
        if self.norm_by_num_examples:  # PointPillars does not use norm_by_num_examples for regression weights
            num_examples = (labels >= 0).sum()
            num_examples = num_examples if num_examples > 1.0 else 1.0
            reg_weights[labels > 0] = 1.0 / num_examples
        else:
            reg_weights[labels > 0] = 1.0  # regression weight of foreground anchors is 1
        ret_dict = {
            'box_cls_labels': labels,  # (107136,)
            'box_reg_targets': bbox_targets,  # (107136, 7) encoded result
            'reg_weights': reg_weights,  # (107136,)
        }
        return ret_dict
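The matching rule in assign_targets_single can be tried on a toy IoU matrix. The sketch below is plain Python rather than the torch code in OpenPCDet; `assign_labels` and its inputs are hypothetical names used only for illustration, and the sampling branch is left out.

```python
def assign_labels(iou, gt_classes, matched_thr, unmatched_thr):
    """Toy anchor labelling: iou[a][g] is the IoU of anchor a with GT g.

    Returns one label per anchor: class id (> 0) for foreground,
    0 for background, -1 for anchors ignored by the loss.
    """
    num_anchors, num_gt = len(iou), len(gt_classes)
    if num_gt == 0:
        return [0] * num_anchors  # no GT of this class: everything is background
    labels = [-1] * num_anchors

    # Best GT (index and IoU) of every anchor
    best_gt = [max(range(num_gt), key=lambda g: row[g]) for row in iou]
    best_iou = [row[g] for row, g in zip(iou, best_gt)]

    for a in range(num_anchors):
        if best_iou[a] < unmatched_thr:
            labels[a] = 0  # background
        if best_iou[a] >= matched_thr:
            labels[a] = gt_classes[best_gt[a]]  # foreground by threshold

    # Forced match: every GT keeps the anchor(s) attaining its maximum IoU,
    # unless that maximum is 0 (no overlap at all)
    for g in range(num_gt):
        gt_max = max(iou[a][g] for a in range(num_anchors))
        if gt_max == 0:
            continue
        for a in range(num_anchors):
            if iou[a][g] == gt_max:
                labels[a] = gt_classes[best_gt[a]]
    return labels


iou = [
    [0.70, 0.00],  # anchor 0: above the matched threshold for GT 0
    [0.50, 0.00],  # anchor 1: between the two thresholds -> ignored
    [0.10, 0.30],  # anchor 2: best anchor of GT 1, but only IoU 0.30
    [0.00, 0.00],  # anchor 3: background
]
labels = assign_labels(iou, gt_classes=[1, 1], matched_thr=0.6, unmatched_thr=0.45)
print(labels)  # [1, -1, 1, 0]
```

Anchor 2 shows why the forced match matters: its IoU of 0.30 is below even the unmatched threshold, yet it stays positive because it is the best anchor GT 1 has.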
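3. Box encoding implementation
Following the formulas in the paper, the 7 regression parameters of every anchor box matched as a positive sample are encoded against its corresponding GT box.
Encoding formulas:
$$\Delta x = \frac{x^{gt} - x^{a}}{d^{a}}, \quad \Delta y = \frac{y^{gt} - y^{a}}{d^{a}}, \quad \Delta z = \frac{z^{gt} - z^{a}}{h^{a}}$$
$$\Delta w = \log\frac{w^{gt}}{w^{a}}, \quad \Delta l = \log\frac{l^{gt}}{l^{a}}, \quad \Delta h = \log\frac{h^{gt}}{h^{a}}$$
$$\Delta \theta = \sin\left(\theta^{gt} - \theta^{a}\right)$$
Here x^gt is the x coordinate of the center of the labelled (GT) box and x^a is the x coordinate of the center of the anchor box; the other symbols follow the same pattern. d^a is the diagonal of the anchor's length and width, defined as:
$$d^{a} = \sqrt{(w^{a})^{2} + (l^{a})^{2}}$$
Code: pcdet/utils/box_coder_utils.py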
class ResidualCoder(object):
    def __init__(self, code_size=7, encode_angle_by_sincos=False, **kwargs):
        """
        Encoding and decoding of anchors and GTs for the loss.
        The 7 parameters are encoded as
            Δx = (x^gt - x^a) / d^a,  Δy = (y^gt - y^a) / d^a,  Δz = (z^gt - z^a) / h^a
            Δw = log(w^gt / w^a),  Δl = log(l^gt / l^a),  Δh = log(h^gt / h^a)
            Δθ = sin(θ^gt - θ^a)
        """
        super().__init__()
        self.code_size = code_size
        self.encode_angle_by_sincos = encode_angle_by_sincos
        if self.encode_angle_by_sincos:
            self.code_size += 1

    def encode_torch(self, boxes, anchors):
        """
        Args:
            boxes: (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]
            anchors: (N, 7 + C) [x, y, z, dx, dy, dz, heading or *[cos, sin], ...]
        Returns:
        """
        # Clamp [dx, dy, dz] of the anchors: any l, w, h below 1e-5 becomes 1e-5
        anchors[:, 3:6] = torch.clamp_min(anchors[:, 3:6], min=1e-5)
        # Clamp [dx, dy, dz] of the GT boxes: any l, w, h below 1e-5 becomes 1e-5
        boxes[:, 3:6] = torch.clamp_min(boxes[:, 3:6], min=1e-5)
        # If split_size_or_sections is an integer type, then tensor will be split into equally sized chunks (if possible).
        # Last chunk will be smaller if the tensor size along the given dimension dim is not divisible by split_size.
        # This refers to the second argument of torch.split(tensor, split_size, dim=): split_size is the size of
        # each chunk, not the number of chunks. Any extra columns are collected by *cas and *cgs.
        xa, ya, za, dxa, dya, dza, ra, *cas = torch.split(anchors, 1, dim=-1)
        xg, yg, zg, dxg, dyg, dzg, rg, *cgs = torch.split(boxes, 1, dim=-1)
        # Diagonal of the anchor in the x-y plane
        diagonal = torch.sqrt(dxa ** 2 + dya ** 2)
        # Encoding formulas: Δx, Δy, Δz, Δl, Δw, Δh, Δθ
        # Δx = (x^gt - x^a) / d^a
        xt = (xg - xa) / diagonal
        # Δy = (y^gt - y^a) / d^a
        yt = (yg - ya) / diagonal
        # Δz = (z^gt - z^a) / h^a
        zt = (zg - za) / dza
        # Δl = log(l^gt / l^a)
        dxt = torch.log(dxg / dxa)
        # Δw = log(w^gt / w^a)
        dyt = torch.log(dyg / dya)
        # Δh = log(h^gt / h^a)
        dzt = torch.log(dzg / dza)
        # False
        if self.encode_angle_by_sincos:
            rt_cos = torch.cos(rg) - torch.cos(ra)
            rt_sin = torch.sin(rg) - torch.sin(ra)
            rts = [rt_cos, rt_sin]
        else:
            rts = [rg - ra]  # Δθ (the sine is applied later, in the loss)
        cts = [g - a for g, a in zip(cgs, cas)]
        return torch.cat([xt, yt, zt, dxt, dyt, dzt, *rts, *cts], dim=-1)
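To check the encoding by hand, here is a minimal scalar sketch of the same residual formulas with the inverse mapping. It is plain Python, not the torch implementation; `encode`, `decode` and the box values are illustrative assumptions, and the anchor size is only an example.

```python
import math


def encode(box, anchor):
    """Residual-encode a GT box against an anchor; both are (x, y, z, dx, dy, dz, heading)."""
    xg, yg, zg, dxg, dyg, dzg, rg = box
    xa, ya, za, dxa, dya, dza, ra = anchor
    diagonal = math.sqrt(dxa ** 2 + dya ** 2)
    return [
        (xg - xa) / diagonal, (yg - ya) / diagonal, (zg - za) / dza,
        math.log(dxg / dxa), math.log(dyg / dya), math.log(dzg / dza),
        rg - ra,
    ]


def decode(code, anchor):
    """Invert encode(): recover the box from the residuals and the anchor."""
    xt, yt, zt, dxt, dyt, dzt, rt = code
    xa, ya, za, dxa, dya, dza, ra = anchor
    diagonal = math.sqrt(dxa ** 2 + dya ** 2)
    return [
        xt * diagonal + xa, yt * diagonal + ya, zt * dza + za,
        math.exp(dxt) * dxa, math.exp(dyt) * dya, math.exp(dzt) * dza,
        rt + ra,
    ]


anchor = [10.0, 2.0, -1.0, 3.9, 1.6, 1.56, 0.0]  # example anchor, values chosen for illustration
gt_box = [10.5, 2.3, -0.9, 4.2, 1.7, 1.50, 0.1]
code = encode(gt_box, anchor)
restored = decode(code, anchor)
```

Dividing the center offsets by the anchor diagonal and taking the log of the size ratios keeps all seven targets on a comparable scale, so one Smooth L1 loss can be applied to all of them.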
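4. Loss computation implementation
The PointPillars loss has three parts: a classification loss between each anchor and its GT class, a regression loss over the 7 box parameters, and a classification loss on the predicted heading direction.
1. Classification loss:
Code: pcdet/models/dense_heads/anchor_head_template.py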
def get_cls_layer_loss(self):
    # (batch_size, 248, 216, 18) class predictions of the network
    cls_preds = self.forward_ret_dict['cls_preds']
    # (batch_size, 321408) class label of every anchor
    box_cls_labels = self.forward_ret_dict['box_cls_labels']
    batch_size = int(cls_preds.shape[0])
    # [batch_size, num_anchors] --> (batch_size, 321408)
    # Anchors we care about: foreground and background. Anchors with IoU between 0.45 and 0.6
    # keep the label -1 and do not take part in the loss.
    cared = box_cls_labels >= 0
    # (batch_size, 321408) foreground anchors
    positives = box_cls_labels > 0
    # (batch_size, 321408) background anchors
    negatives = box_cls_labels == 0
    # Weight of background anchors
    negative_cls_weights = negatives * 1.0
    # Classification weight is 1 for every foreground and background anchor
    cls_weights = (negative_cls_weights + 1.0 * positives).float()
    # Regression weight is 1 for every positive anchor
    reg_weights = positives.float()
    # If there is only one class
    if self.num_class == 1:
        # class agnostic
        box_cls_labels[positives] = 1
    # Normalizer: number of positives per sample, shape (batch, 1)
    pos_normalizer = positives.sum(1, keepdim=True).float()  # (4, 1) e.g. [[162.],[166.],[155.],[108.]]
    # Normalize the regression weights --> (batch_size, 321408); the divisor is at least 1.
    # As described in the paper, the regression loss is normalized by the number of positives.
    reg_weights /= torch.clamp(pos_normalizer, min=1.0)
    # Normalize the classification weights --> (batch_size, 321408), also by the number of positives
    cls_weights /= torch.clamp(pos_normalizer, min=1.0)
    # cared covers background and foreground anchors, but only the foreground class ids are needed here.
    # cared.type_as(box_cls_labels) turns the anchors that are excluded from the loss (cared == False) into 0.
    # After the element-wise product, background anchors and anchors with IoU between
    # unmatched_threshold and matched_threshold all have target 0.
    cls_targets = box_cls_labels * cared.type_as(box_cls_labels)
    # Add a last dimension and remove it again (the two calls cancel out)
    cls_targets = cls_targets.unsqueeze(dim=-1)
    cls_targets = cls_targets.squeeze(dim=-1)
    one_hot_targets = torch.zeros(
        *list(cls_targets.shape), self.num_class + 1, dtype=cls_preds.dtype, device=cls_targets.device
    )  # (batch_size, 321408, 4); the +1 accounts for the background class
    # target.scatter(dim, index, src)
    # A typical use of scatter_ in classification is turning target labels into one-hot encoding
    # https://blog.csdn.net/guofei_fly/article/details/104308528
    # Here it sets, along the last dimension, the position indexed by cls_targets.unsqueeze(dim=-1) to 1
    """
        dim=1: fill along columns
        index=batch_data.label: use the values in batch_data.label as indices and fill the matching
        position (a column here; with dim=0 it would be a row)
        src=1: the value written is 1
    """
    # (batch_size, 321408, 4)
    one_hot_targets.scatter_(-1, cls_targets.unsqueeze(dim=-1).long(), 1.0)
    # (batch_size, 248, 216, 18) --> (batch_size, 321408, 3)
    cls_preds = cls_preds.view(batch_size, -1, self.num_class)
    # (batch_size, 321408, 3) drop the background column; background anchors keep an all-zero target
    one_hot_targets = one_hot_targets[..., 1:]
    # Classification loss  # [N, M]  # (batch_size, 321408, 3)
    cls_loss_src = self.cls_loss_func(cls_preds, one_hot_targets, weights=cls_weights)
    # Sum and divide by the batch size
    cls_loss = cls_loss_src.sum() / batch_size
    # Multiply by the classification loss weight --> cls_weight=1.0
    cls_loss = cls_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']
    tb_dict = {
        'rpn_loss_cls': cls_loss.item()
    }
    return cls_loss, tb_dict
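The way labels become one-hot targets and weights is easier to see on four anchors. This is a small sketch in plain Python; `build_cls_targets` is a hypothetical helper, not an OpenPCDet function, and it assumes the same label convention as above (-1 ignored, 0 background, k > 0 class k).

```python
def build_cls_targets(labels, num_class):
    """Return (one_hot, weights) for one sample.

    one_hot[i] has num_class entries and no background column;
    weights[i] is 1 / num_positives for cared anchors and 0 for ignored ones.
    """
    num_pos = max(sum(1 for label in labels if label > 0), 1)  # clamp to at least 1
    one_hot, weights = [], []
    for label in labels:
        row = [0.0] * num_class
        if label > 0:
            row[label - 1] = 1.0  # class k lands in column k - 1
        one_hot.append(row)
        weights.append(1.0 / num_pos if label >= 0 else 0.0)
    return one_hot, weights


# Anchors: class 2, background, ignored, class 1
one_hot, weights = build_cls_targets([2, 0, -1, 1], num_class=3)
print(one_hot)  # [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(weights)  # [0.5, 0.5, 0.0, 0.5]
```

The background anchor and the ignored anchor end up with the same all-zero target row; only the weight tells them apart.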
The corresponding focal loss used for classification is implemented in: pcdet/utils/loss_utils.py
class SigmoidFocalClassificationLoss(nn.Module):
    """
    Multi-class classification
    Sigmoid focal cross entropy loss.
    """

    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        """
        Args:
            gamma: Weighting parameter to balance loss for hard and easy examples.
            alpha: Weighting parameter to balance loss for positive and negative examples.
        """
        super(SigmoidFocalClassificationLoss, self).__init__()
        self.alpha = alpha  # 0.25
        self.gamma = gamma  # 2.0

    @staticmethod
    def sigmoid_cross_entropy_with_logits(input: torch.Tensor, target: torch.Tensor):
        """ PyTorch Implementation for tf.nn.sigmoid_cross_entropy_with_logits:
            max(x, 0) - x * z + log(1 + exp(-abs(x))) in
            https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
        Args:
            input: (B, #anchors, #classes) float tensor.
                Predicted logits for each class
            target: (B, #anchors, #classes) float tensor.
                One-hot encoded classification targets
        Returns:
            loss: (B, #anchors, #classes) float tensor.
                Sigmoid cross entropy loss without reduction
        """
        loss = torch.clamp(input, min=0) - input * target + \
               torch.log1p(torch.exp(-torch.abs(input)))
        return loss

    def forward(self, input: torch.Tensor, target: torch.Tensor, weights: torch.Tensor):
        """
        Args:
            input: (B, #anchors, #classes) float tensor. e.g. (4, 321408, 3)
                Predicted logits for each class: every anchor predicts three classes
            target: (B, #anchors, #classes) float tensor. e.g. (4, 321408, 3)
                One-hot encoded classification targets: the ground truth
            weights: (B, #anchors) float tensor. e.g. (4, 321408)
                Anchor-wise weights.
        Returns:
            weighted_loss: (B, #anchors, #classes) float tensor after weighting.
        """
        pred_sigmoid = torch.sigmoid(input)  # (batch_size, 321408, 3) f(x) = 1 / (1 + e^(-x))
        # The alpha weighting addresses the positive/negative imbalance: positives get 0.25, negatives 0.75
        # Cross entropy is related to the KL divergence, which measures how similar two distributions are.
        # For binary classification:
        # combined form: L = -(y * log(y^) + (1 - y) * log(1 - y^)) <-->
        # piecewise form: y = 1, L = -y * log(y^); y = 0, L = -(1 - y) * log(1 - y^)
        # The two forms are equivalent for any 0/1 classification problem; focal loss is handled the same way.
        # Cross entropy = entropy + relative entropy (KL divergence). For a sigmoid output the cross entropy
        # is convex in the logit, with gradient (sigma(s) - y)x  https://zhuanlan.zhihu.com/p/35709485
        alpha_weight = target * self.alpha + (1 - target) * (1 - self.alpha)  # (4, 321408, 3)
        # pt here is the probability mass on the wrong side, i.e. 1 - p_t in the focal loss paper's notation
        pt = target * (1.0 - pred_sigmoid) + (1.0 - target) * pred_sigmoid
        focal_weight = alpha_weight * torch.pow(pt, self.gamma)
        # (batch_size, 321408, 3) a rewritten form of cross entropy; see the links above for the derivation
        bce_loss = self.sigmoid_cross_entropy_with_logits(input, target)
        loss = focal_weight * bce_loss  # (batch_size, 321408, 3)
        if weights.shape.__len__() == 2 or \
                (weights.shape.__len__() == 1 and target.shape.__len__() == 2):
            weights = weights.unsqueeze(-1)
        assert weights.shape.__len__() == loss.shape.__len__()
        # weights averages over the number of positive anchors, so the loss of a sample
        # does not depend on how many objects it contains
        return loss * weights
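For a single logit the focal loss above reduces to a few lines, which makes the effect of the modulating factor easy to inspect. This is a scalar sketch with the standard library only; `sigmoid_focal_loss` is an illustrative name, and the default alpha and gamma are the ones shown in the class.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss of one logit against a 0/1 target."""
    p = 1.0 / (1.0 + math.exp(-logit))
    alpha_weight = target * alpha + (1 - target) * (1 - alpha)
    pt = target * (1 - p) + (1 - target) * p  # probability on the wrong side
    # Numerically stable binary cross entropy with logits
    bce = max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
    return alpha_weight * pt ** gamma * bce


easy_negative = sigmoid_focal_loss(-4.0, 0.0)  # confident and correct
hard_negative = sigmoid_focal_loss(2.0, 0.0)   # confident and wrong
print(easy_negative < hard_negative)  # True
```

Because almost all of the 321408 anchors are easy background, the factor `pt ** gamma` shrinks their contribution, which is why no foreground/background sampling is needed.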
2. Box regression Smooth L1 loss and direction classification loss:
Code: pcdet/models/dense_heads/anchor_head_template.py
def get_box_reg_layer_loss(self):
    # (batch_size, 248, 216, 42) the 7 regression parameters of every anchor box
    box_preds = self.forward_ret_dict['box_preds']
    # (batch_size, 248, 216, 12) direction prediction of every anchor box
    box_dir_cls_preds = self.forward_ret_dict.get('dir_cls_preds', None)
    # (batch_size, 321408, 7) encoding of every anchor against its GT
    box_reg_targets = self.forward_ret_dict['box_reg_targets']
    # (batch_size, 321408)
    box_cls_labels = self.forward_ret_dict['box_cls_labels']
    batch_size = int(box_preds.shape[0])
    # Mask of the foreground anchors, shape: (batch_size, 321408)
    positives = box_cls_labels > 0
    # Regression weight is 1 for positives: [True, False] * 1. = [1., 0.]
    reg_weights = positives.float()  # (batch_size, 321408) only labels > 0 are kept
    # Same normalization as in the classification loss
    pos_normalizer = positives.sum(1,
                                   keepdim=True).float()  # (batch_size, 1) e.g. [[162.],[166.],[155.],[108.]]
    reg_weights /= torch.clamp(pos_normalizer, min=1.0)  # (batch_size, 321408)
    if isinstance(self.anchors, list):
        if self.use_multihead:
            anchors = torch.cat(
                [anchor.permute(3, 4, 0, 1, 2, 5).contiguous().view(-1, anchor.shape[-1]) for anchor in
                 self.anchors], dim=0)
        else:
            anchors = torch.cat(self.anchors, dim=-3)  # (1, 248, 216, 3, 2, 7)
    else:
        anchors = self.anchors
    # (1, 248*216*6, 7) --> (batch_size, 248*216*6, 7)
    anchors = anchors.view(1, -1, anchors.shape[-1]).repeat(batch_size, 1, 1)
    # (batch_size, 248*216*6, 7)
    box_preds = box_preds.view(batch_size, -1,
                               box_preds.shape[-1] // self.num_anchors_per_location if not self.use_multihead else
                               box_preds.shape[-1])
    # sin(a - b) = sin(a)cos(b) - cos(a)sin(b)
    # (batch_size, 321408, 7)
    box_preds_sin, reg_targets_sin = self.add_sin_difference(box_preds, box_reg_targets)
    loc_loss_src = self.reg_loss_func(box_preds_sin, reg_targets_sin, weights=reg_weights)
    loc_loss = loc_loss_src.sum() / batch_size
    loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']  # loc_weight = 2.0, regression loss weight
    box_loss = loc_loss
    tb_dict = {
        # item() in PyTorch returns the value of a one-element tensor; it is unrelated to dict.items() in Python
        'rpn_loss_loc': loc_loss.item()
    }
    # If a direction prediction exists, add the direction loss
    if box_dir_cls_preds is not None:
        # (batch_size, 321408, 2)
        dir_targets = self.get_direction_target(
            anchors, box_reg_targets,
            dir_offset=self.model_cfg.DIR_OFFSET,  # direction offset 0.78539 = π/4
            num_bins=self.model_cfg.NUM_DIR_BINS  # number of direction bins = 2
        )
        # Direction logits (batch_size, 321408, 2)
        dir_logits = box_dir_cls_preds.view(batch_size, -1, self.model_cfg.NUM_DIR_BINS)
        # Only positive anchors contribute to the direction loss (batch_size, 321408)
        weights = positives.type_as(dir_logits)
        # (batch_size, 321408) divide by the number of positives so the loss of a sample
        # does not depend on how many objects it contains
        weights /= torch.clamp(weights.sum(-1, keepdim=True), min=1.0)
        # Direction loss
        dir_loss = self.dir_loss_func(dir_logits, dir_targets, weights=weights)
        dir_loss = dir_loss.sum() / batch_size
        # Loss weight, dir_weight: 0.2
        dir_loss = dir_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['dir_weight']
        # Add the direction loss to the box loss
        box_loss += dir_loss
        tb_dict['rpn_loss_dir'] = dir_loss.item()
    return box_loss, tb_dict
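Two helpers are called above without being shown: add_sin_difference and get_direction_target. The sketch below is a scalar approximation of what they compute, written from the identity in the comment and from the DIR_OFFSET / NUM_DIR_BINS values. `direction_bin` in particular is an assumption about the binning, not a copy of the OpenPCDet code.

```python
import math


def add_sin_difference(pred_angle, target_angle):
    """Rewrite (pred, target) so that their difference equals sin(pred - target)."""
    pred_enc = math.sin(pred_angle) * math.cos(target_angle)
    target_enc = math.cos(pred_angle) * math.sin(target_angle)
    return pred_enc, target_enc


def direction_bin(rot_gt, dir_offset=math.pi / 4, num_bins=2):
    """Assumed binning: shift by dir_offset, wrap into [0, 2*pi), split into equal bins."""
    offset_rot = (rot_gt - dir_offset) % (2 * math.pi)
    bin_index = int(offset_rot // (2 * math.pi / num_bins))
    return min(max(bin_index, 0), num_bins - 1)


pred_enc, target_enc = add_sin_difference(0.3, 0.1)
# A prediction that is off by exactly pi yields a zero sine difference
flipped_pred, flipped_target = add_sin_difference(0.1 + math.pi, 0.1)
```

A box rotated by π produces no regression error under the sine encoding, so the regression loss alone cannot tell front from back. The extra two-bin direction classifier is what resolves that ambiguity.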
Smooth L1 computation
Code: pcdet/utils/loss_utils.py
class WeightedSmoothL1Loss(nn.Module):
    """
    Code-wise Weighted Smooth L1 Loss modified based on fvcore.nn.smooth_l1_loss
    https://github.com/facebookresearch/fvcore/blob/master/fvcore/nn/smooth_l1_loss.py
                  | 0.5 * x ** 2 / beta   if abs(x) < beta
    smoothl1(x) = |
                  | abs(x) - 0.5 * beta   otherwise,
    where x = input - target.
    """

    def __init__(self, beta: float = 1.0 / 9.0, code_weights: list = None):
        """
        Args:
            beta: Scalar float.
                L1 to L2 change point.
                For beta values < 1e-5, L1 loss is computed.
            code_weights: (#codes) float list if not None.
                Code-wise weights.
        """
        super(WeightedSmoothL1Loss, self).__init__()
        self.beta = beta  # default 1/9 = 0.111
        if code_weights is not None:
            self.code_weights = np.array(code_weights, dtype=np.float32)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            self.code_weights = torch.from_numpy(self.code_weights).cuda()  # move the weights to the GPU

    @staticmethod
    def smooth_l1_loss(diff, beta):
        # If beta is very small, use the plain absolute value; otherwise use the regular Smooth L1 loss
        if beta < 1e-5:
            loss = torch.abs(diff)
        else:
            n = torch.abs(diff)  # (batch_size, 321408, 7)
            # Smooth L1 formula as shown above --> (batch_size, 321408, 7)
            loss = torch.where(n < beta, 0.5 * n ** 2 / beta, n - 0.5 * beta)
        return loss

    def forward(self, input: torch.Tensor, target: torch.Tensor, weights: torch.Tensor = None):
        """
        Args:
            input: (B, #anchors, #codes) float tensor.
                Encoded predicted locations of objects.
            target: (B, #anchors, #codes) float tensor.
                Regression targets.
            weights: (B, #anchors) float tensor if not None.
        Returns:
            loss: (B, #anchors) float tensor.
                Weighted smooth l1 loss without reduction.
        """
        # Where target is NaN use input instead, so the difference there is 0
        target = torch.where(torch.isnan(target), input, target)  # ignore nan targets  # (batch_size, 321408, 7)
        diff = input - target  # (batch_size, 321408, 7)
        # code-wise weighting
        if self.code_weights is not None:
            diff = diff * self.code_weights.view(1, 1, -1)  # (batch_size, 321408, 7) weight of each box parameter
        loss = self.smooth_l1_loss(diff, self.beta)
        # anchor-wise weighting
        if weights is not None:
            assert weights.shape[0] == loss.shape[0] and weights.shape[1] == loss.shape[1]
            # weights averages over the number of positive anchors, so the loss of a sample
            # does not depend on how many objects it contains
            loss = loss * weights.unsqueeze(-1)
        return loss
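The piecewise definition is easy to verify on scalars. This is a minimal sketch of the same formula in plain Python; the function name `smooth_l1` is chosen for illustration, and the default beta is the one shown in the class.

```python
def smooth_l1(diff, beta=1.0 / 9.0):
    """Smooth L1 of one residual: quadratic below beta, linear above."""
    n = abs(diff)
    if beta < 1e-5:
        return n  # degenerate case: plain L1
    return 0.5 * n ** 2 / beta if n < beta else n - 0.5 * beta


beta = 1.0 / 9.0
small = smooth_l1(0.05)     # quadratic branch
large = smooth_l1(1.0)      # linear branch
at_switch = smooth_l1(beta)  # both branches meet at 0.5 * beta
```

The quadratic part keeps the gradient small for nearly correct boxes, while the linear part stops a few badly matched anchors from dominating the regression loss.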
Direction classification loss:
Code: pcdet/utils/loss_utils.py
class WeightedCrossEntropyLoss(nn.Module):
    """
    Binary classification
    Transform input to fit the formation of PyTorch official cross entropy loss
    with anchor-wise weighting.
    """

    def __init__(self):
        super(WeightedCrossEntropyLoss, self).__init__()

    def forward(self, input: torch.Tensor, target: torch.Tensor, weights: torch.Tensor):
        """
        Args:
            input: (B, #anchors, #classes) float tensor.
                Predicted logits for each class.
            target: (B, #anchors, #classes) float tensor.
                One-hot classification targets.
            weights: (B, #anchors) float tensor.
                Anchor-wise weights.
        Returns:
            loss: (B, #anchors) float tensor.
                Weighted cross entropy loss without reduction
        """
        input = input.permute(0, 2, 1)  # (batch_size, 2, 321408)
        target = target.argmax(dim=-1)  # (batch_size, 321408)
        # cross_entropy = log_softmax + nll_loss
        # Apply softmax to input, take the log, pick the log-probability of the target class and negate it;
        # with reduction='none' no averaging is done
        # Cross entropy loss multiplied by the weights (batch_size, 321408)
        loss = F.cross_entropy(input, target, reduction='none') * weights
        return loss
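For one anchor, the loss above amounts to a log-softmax over the two direction bins followed by picking the target bin. The sketch below does that with the standard library; `weighted_cross_entropy` is an illustrative helper and not part of OpenPCDet.

```python
import math


def weighted_cross_entropy(logits, target_one_hot, weight):
    """Cross entropy of one anchor from raw logits, scaled by its anchor weight."""
    target = max(range(len(target_one_hot)), key=lambda i: target_one_hot[i])  # argmax
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return (log_sum_exp - logits[target]) * weight


uncertain = weighted_cross_entropy([0.0, 0.0], [1, 0], weight=1.0)  # equals log(2)
confident = weighted_cross_entropy([5.0, -5.0], [1, 0], weight=1.0)
ignored = weighted_cross_entropy([5.0, -5.0], [0, 1], weight=0.0)   # non-positive anchor
```

Anchors that are not positive carry a weight of 0, so their direction prediction never contributes to the loss regardless of the logits.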
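VI. Data augmentation used by PointPillars
I: Data augmentation
PointPillars uses data augmentation similar to that of the SECOND network:
1. Build a lookup table of the GT boxes of every class, then randomly place 15, 0 and 8 samples of cars, pedestrians and cyclists respectively into each point cloud frame.
2. Every GT box in a frame is rotated randomly (angle drawn from [-π/20, π/20)) and translated randomly along the x, y and z axes; the x, y, z offsets are drawn from a normal distribution with mean 0 and variance 0.25.
3. The whole point cloud is flipped along the x axis, rotated globally and scaled randomly. A global translation along x, y, z simulates localization noise; the x, y, z offsets are drawn from a normal distribution with mean 0 and variance 0.2.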
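Step 3 can be sketched as one function applied to every point of a frame. This is plain Python under stated assumptions: `global_augment`, the rotation range and the scale range are hypothetical choices for illustration, and the global translation is left out for brevity.

```python
import math
import random


def global_augment(points, rng, rot_range=(-math.pi / 4, math.pi / 4), scale_range=(0.95, 1.05)):
    """Flip, rotate and scale a list of (x, y, z, r) points; ranges are assumed values.

    Returns the augmented points and the sampled (flip, angle, scale).
    """
    flip = rng.random() < 0.5
    angle = rng.uniform(*rot_range)
    scale = rng.uniform(*scale_range)
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z, r in points:
        if flip:
            y = -y  # mirror along the x axis
        x, y = c * x - s * y, s * x + c * y  # rotate around the z axis
        out.append((x * scale, y * scale, z * scale, r))  # reflectance is untouched
    return out, (flip, angle, scale)


points = [(3.0, 4.0, 1.0, 0.5)]
augmented, (flip, angle, scale) = global_augment(points, random.Random(0))
```

The same flip, rotation and scale must also be applied to the GT boxes of the frame, including their heading angle, otherwise the labels no longer match the points.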
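VII. PointPillars test results and ablation studies
I. Test results
Test results on the KITTI dataset from the PointPillars paper

Test results of PointPillars in OpenPCDet on the KITTI dataset (only the moderate accuracy on the KITTI validation set is shown)

II. Ablation studies
1. Spatial resolution
In the implementation, the length and width of every pillar are set to 0.16 m. Increasing this value speeds up inference: larger pillars leave fewer non-empty pillars in the point cloud and also make the pseudo-image smaller, which speeds up both the PointNet encoder and the CNN feature extraction. Smaller pillars, on the other hand, let the network learn finer features and localize more accurately.
In short, larger pillars give higher speed, and smaller pillars give higher accuracy.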
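The effect of the pillar size on the tensor shapes can be worked out directly from the point cloud range given earlier. In the sketch below, `grid_shape` is a hypothetical helper. The overall stride of 2 and the 6 anchors per location (3 classes × 2 rotations) are assumptions read off the shapes in the code comments above.

```python
def grid_shape(pc_range, pillar_size, stride=2, anchors_per_location=6):
    """Pseudo-image size, head feature-map size and anchor count for a given pillar size."""
    x_min, y_min, _, x_max, y_max, _ = pc_range
    nx = round((x_max - x_min) / pillar_size)  # pillars along x
    ny = round((y_max - y_min) / pillar_size)  # pillars along y
    fx, fy = nx // stride, ny // stride
    return {
        "pseudo_image": (ny, nx),
        "feature_map": (fy, fx),
        "anchors": fy * fx * anchors_per_location,
    }


kitti_range = [0, -39.68, -3, 69.12, 39.68, 1]
print(grid_shape(kitti_range, 0.16))
# {'pseudo_image': (496, 432), 'feature_map': (248, 216), 'anchors': 321408}
print(grid_shape(kitti_range, 0.32))
# {'pseudo_image': (248, 216), 'feature_map': (124, 108), 'anchors': 80352}
```

Doubling the pillar size cuts the pseudo-image area and the number of anchors to a quarter, which is where the speed-up comes from.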
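2. Per-box data augmentation
Although both VoxelNet and SECOND recommend heavy augmentation of every individual GT box, the PointPillars experiments found that this degrades pedestrian detection considerably, and that less per-box augmentation works better. A likely reason is that inserting GT samples drawn from the sample database into every frame reduces the need for strong per-box augmentation.
3. Point decoration
When decorating the x, y, z, r data of each point, PointPillars follows VoxelNet and additionally appends xp and yp, the offsets of the point from the center of its pillar in x and y. This raised the overall detection performance by 0.5 mAP and also made the results in the paper easier to reproduce.
4. Encoder
Compared with a fixed encoder, a learnable encoder is an important part of making the network trainable end to end. This ablation compares the results obtained with different encoders in PointPillars.
[Figure: encoder ablation results]
The next article will cover the inference code of PointPillars in OpenPCDet in detail.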