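The yolox_nano model has a tiny parameter count while keeping reasonable accuracy, which is why it is used here for deploying YOLOX on a mobile phone: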

yolox_nano training command:
python tools/train.py -f exps/example/yolox_voc/yolox_voc_nano.py -d 1 -b 8 -c weights/yolox_nano.pth
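Here yolox_nano.pth is the official COCO weight file downloaded from the YOLOX site, used as the pretrained weights. (I found that a yolox_nano.pth I had trained earlier also works, so this is not the main issue. What really matters is getting yolox_voc_nano.py right: if it is wrong, the ONNX conversion later fails with Missing key(s) & Unexpected key(s) in state_dict….)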
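That error comes from PyTorch's strict Module.load_state_dict: the parameter names stored in the checkpoint do not match the model built from the exp file (a pure shape difference is reported as "size mismatch" instead). A minimal, framework-free sketch of that comparison on {parameter name: shape} dicts can show which keys disagree; compare_state_dicts and the example shapes below are hypothetical, not part of YOLOX:

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_state_dicts(
    ckpt: Dict[str, Shape], model: Dict[str, Shape]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two {parameter name: shape} mappings.

    Returns (missing, unexpected, size_mismatch):
      missing       - names the model expects but the checkpoint lacks
      unexpected    - names in the checkpoint that the model does not have
      size_mismatch - names present in both but with different shapes
    """
    missing = sorted(k for k in model if k not in ckpt)
    unexpected = sorted(k for k in ckpt if k not in model)
    size_mismatch = sorted(
        k for k in model if k in ckpt and tuple(ckpt[k]) != tuple(model[k])
    )
    return missing, unexpected, size_mismatch


if __name__ == "__main__":
    # Hypothetical shapes: a 1-class nano checkpoint vs. an 80-class nano head.
    ckpt = {"head.cls_preds.0.weight": (1, 64, 1, 1),
            "head.stems.0.conv.weight": (64, 64, 1, 1)}
    model = {"head.cls_preds.0.weight": (80, 64, 1, 1),
             "head.stems.0.conv.weight": (64, 64, 1, 1)}
    print(compare_state_dicts(ckpt, model))
    # ([], [], ['head.cls_preds.0.weight'])
```

With PyTorch installed, the two mappings could be built as {k: tuple(v.shape) for k, v in sd.items()}, using the checkpoint's "model" entry for one side and model.state_dict() for the other (an assumption about how you load them, not a step from this post).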
Pay attention to the file passed with -f exps/example/yolox_voc/yolox_voc_nano.py. The code of yolox_voc_nano.py is as follows.
(It is adapted from .\exps\default\nano.py and .\exps/example/yolox_voc/yolox_voc.py: copy the __init__ and get_model functions of nano.py into yolox_voc.py, change the class count self.num_classes and image_sets, and finally save the file as yolox_voc_nano.py.)
```python
# encoding: utf-8
import os

import torch
# This import is required: nn.BatchNorm2d is used in get_model
import torch.nn as nn
import torch.distributed as dist

from yolox.data import get_yolox_datadir
from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        # Network depth and width (nano settings)
        self.depth = 0.33
        self.width = 0.25
        self.input_size = (416, 416)
        self.mosaic_scale = (0.5, 1.5)
        self.random_size = (10, 20)
        self.test_size = (416, 416)
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
        self.enable_mixup = False
        # Number of classes
        self.num_classes = 1

    # Without this get_model override, training did not work correctly
    def get_model(self, sublinear=False):
        def init_yolo(M):
            for m in M.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.eps = 1e-3
                    m.momentum = 0.03

        if "model" not in self.__dict__:
            from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead
            in_channels = [256, 512, 1024]
            # NANO model use depthwise = True, which is main difference.
            backbone = YOLOPAFPN(self.depth, self.width, in_channels=in_channels, depthwise=True)
            head = YOLOXHead(self.num_classes, self.width, in_channels=in_channels, depthwise=True)
            self.model = YOLOX(backbone, head)

        self.model.apply(init_yolo)
        self.model.head.initialize_biases(1e-2)
        return self.model

    def get_data_loader(self, batch_size, is_distributed, no_aug=False, cache_img=False):
        from yolox.data import (
            VOCDetection,
            TrainTransform,
            YoloBatchSampler,
            DataLoader,
            InfiniteSampler,
            MosaicDetection,
            worker_init_reset_seed,
        )
        from yolox.utils import (
            wait_for_the_master,
            get_local_rank,
        )

        local_rank = get_local_rank()

        with wait_for_the_master(local_rank):
            dataset = VOCDetection(
                data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
                # image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
                # Only the VOC2007 dataset is used for training, so this line is changed
                image_sets=[('2007', 'trainval')],
                img_size=self.input_size,
                preproc=TrainTransform(
                    max_labels=50,
                    flip_prob=self.flip_prob,
                    hsv_prob=self.hsv_prob),
                cache=cache_img,
            )

        dataset = MosaicDetection(
            dataset,
            mosaic=not no_aug,
            img_size=self.input_size,
            preproc=TrainTransform(
                max_labels=120,
                flip_prob=self.flip_prob,
                hsv_prob=self.hsv_prob),
            degrees=self.degrees,
            translate=self.translate,
            mosaic_scale=self.mosaic_scale,
            mixup_scale=self.mixup_scale,
            shear=self.shear,
            enable_mixup=self.enable_mixup,
            mosaic_prob=self.mosaic_prob,
            mixup_prob=self.mixup_prob,
        )

        self.dataset = dataset

        if is_distributed:
            batch_size = batch_size // dist.get_world_size()

        sampler = InfiniteSampler(
            len(self.dataset), seed=self.seed if self.seed else 0
        )

        batch_sampler = YoloBatchSampler(
            sampler=sampler,
            batch_size=batch_size,
            drop_last=False,
            mosaic=not no_aug,
        )

        dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
        dataloader_kwargs["batch_sampler"] = batch_sampler

        # Make sure each process has different random seed, especially for 'fork' method
        dataloader_kwargs["worker_init_fn"] = worker_init_reset_seed

        train_loader = DataLoader(self.dataset, **dataloader_kwargs)

        return train_loader

    def get_eval_loader(self, batch_size, is_distributed, testdev=False, legacy=False):
        from yolox.data import VOCDetection, ValTransform

        valdataset = VOCDetection(
            data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
            image_sets=[('2007', 'test')],
            img_size=self.test_size,
            preproc=ValTransform(legacy=legacy),
        )

        if is_distributed:
            batch_size = batch_size // dist.get_world_size()
            sampler = torch.utils.data.distributed.DistributedSampler(
                valdataset, shuffle=False
            )
        else:
            sampler = torch.utils.data.SequentialSampler(valdataset)

        dataloader_kwargs = {
            "num_workers": self.data_num_workers,
            "pin_memory": True,
            "sampler": sampler,
        }
        dataloader_kwargs["batch_size"] = batch_size
        val_loader = torch.utils.data.DataLoader(valdataset, **dataloader_kwargs)

        return val_loader

    def get_evaluator(self, batch_size, is_distributed, testdev=False, legacy=False):
        from yolox.evaluators import VOCEvaluator

        val_loader = self.get_eval_loader(batch_size, is_distributed, testdev, legacy)
        evaluator = VOCEvaluator(
            dataloader=val_loader,
            img_size=self.test_size,
            confthre=self.test_conf,
            nmsthre=self.nmsthre,
            num_classes=self.num_classes,
        )
        return evaluator
```
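The depth and width values in __init__ are multipliers rather than layer counts. As far as I can tell from YOLOX's CSPDarknet and YOLOPAFPN code, the backbone uses base_channels = int(width * 64) and base_depth = max(round(depth * 3), 1), and the features handed to the head are the in_channels list [256, 512, 1024] scaled by width. A small sketch of that arithmetic (yolox_scaled_sizes is a hypothetical helper, not a YOLOX function) shows how much the nano settings shrink the network:

```python
from typing import List, NamedTuple


class ScaledSizes(NamedTuple):
    stem_channels: int            # Focus (stem) output channels
    backbone_channels: List[int]  # dark2..dark5 output channels
    csp_blocks: List[int]         # CSP bottleneck count per stage dark2..dark5
    fpn_channels: List[int]       # feature channels passed on to the head


def yolox_scaled_sizes(depth: float, width: float) -> ScaledSizes:
    """Channel widths and block counts implied by an exp's depth/width multipliers."""
    base_channels = int(width * 64)
    base_depth = max(round(depth * 3), 1)
    return ScaledSizes(
        stem_channels=base_channels,
        backbone_channels=[base_channels * m for m in (2, 4, 8, 16)],
        csp_blocks=[base_depth, base_depth * 3, base_depth * 3, base_depth],
        fpn_channels=[int(c * width) for c in (256, 512, 1024)],
    )


if __name__ == "__main__":
    # depth = 0.33, width = 0.25 as set in __init__ above
    print(yolox_scaled_sizes(0.33, 0.25))
```

With width 0.25 every feature map is a quarter as wide as the 1.0-width reference, and since convolution weights grow with input channels times output channels, the parameter count drops much faster than the width itself.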
After these changes you can start training yolox_nano.
yolox_nano training command:
python tools/train.py -f exps/example/yolox_voc/yolox_voc_nano.py -d 1 -b 8 -c weights/yolox_nano.pth

I did not use mixed-precision training here, for the reasons explained in this blog post:
https://blog.csdn.net/ELSA001/article/details/120918082?spm=1001.2014.3001.5502
After training finishes, run a quick test:
python tools/demo.py image -f exps/example/yolox_voc/yolox_voc_nano.py -c YOLOX_outputs/yolox_voc_nano/best_ckpt.pth --path testfiles/img1.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result --device cpu
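In that command, --conf 0.3 discards low-scoring boxes and --nms 0.65 is the IoU threshold for non-maximum suppression. A simplified, single-class sketch of what the two thresholds do; as I understand it, YOLOX's own postprocess multiplies objectness by class confidence and then calls torchvision's NMS, so the function names here are hypothetical and only illustrate the idea:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def conf_then_nms(boxes: Sequence[Box], scores: Sequence[float],
                  conf: float = 0.3, nms: float = 0.65) -> List[int]:
    """Return indices of surviving boxes, highest score first."""
    # 1) confidence filter (--conf)
    cand = [i for i, s in enumerate(scores) if s >= conf]
    cand.sort(key=lambda i: scores[i], reverse=True)
    # 2) greedy NMS (--nms): drop a box that overlaps a kept box by more than `nms`
    keep: List[int] = []
    for i in cand:
        if all(iou(boxes[i], boxes[j]) <= nms for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(conf_then_nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with IoU 81/119 ≈ 0.68, above 0.65, so it is suppressed; box 3 never reaches NMS because its score is below 0.3.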
The resulting detection:


Next, generate the ONNX file (run this from the YOLOX directory):
python tools/export_onnx.py -n yolox-nano -c weights/yolox_nano.pth --output-name weights/yolox_nano.onnx
The yolox_nano.pth used here is the trained best_ckpt.pth, renamed.
Output:

```
(torch_G) E:\YOLOX>python tools/export_onnx.py -n yolox-nano -c weights/yolox_nano.pth --output-name weights/yolox_nano.onnx
2021-10-28 21:45:50.843 | INFO | __main__:main:59 - args value: Namespace(batch_size=1, ckpt='weights/yolox_nano.pth', dynamic=False, exp_file=None, experiment_name=None, input='images', name='yolox-nano', no_onnxsim=False, opset=11, opts=[], output='output', output_name='weights/yolox_nano.onnx')
2021-10-28 21:45:51.001 | INFO | __main__:main:83 - loading checkpoint done.
2021-10-28 21:45:56.211 | INFO | __main__:main:96 - generated onnx model named weights/yolox_nano.onnx
2021-10-28 21:45:56.773 | INFO | __main__:main:112 - generated simplified onnx model named weights/yolox_nano.onnx
```
The step after "loading checkpoint done" may take a while.
Next, export the ncnn files.
Copy yolox_nano.onnx into E:\ncnn\build-vs2019\tools\onnx,
and cd into the E:\ncnn\build-vs2019\tools\onnx directory on the command line:

Generate the corresponding ncnn param and bin files with:
onnx2ncnn.exe yolox_nano.onnx yolox_nano.param yolox_nano.bin

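Many "Unsupported slice step !" messages are printed here. That is fine and gets fixed in the next step: the warnings appear because ncnn does not support the Focus module.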
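Focus is a space-to-depth step: the input is split into four pixel-interleaved sub-images (stride-2 slices, hence the slice-step warnings) that are stacked along the channel axis, so a 3×H×W image becomes 12×(H/2)×(W/2). A pure-Python sketch on nested lists; the patch order (top-left, bottom-left, top-right, bottom-right) follows my reading of YOLOX's Focus module and is worth double-checking against your YOLOX version:

```python
from typing import List

Channel = List[List[float]]  # one H x W plane


def focus(x: List[Channel]) -> List[Channel]:
    """Space-to-depth: C x H x W -> 4C x H/2 x W/2 (H and W must be even)."""
    def patch(ch: Channel, row_off: int, col_off: int) -> Channel:
        # Every second row starting at row_off, every second column starting at col_off
        return [row[col_off::2] for row in ch[row_off::2]]

    out: List[Channel] = []
    # (row, col) offsets: top-left, bottom-left, top-right, bottom-right
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        out.extend(patch(ch, row_off, col_off) for ch in x)
    return out


if __name__ == "__main__":
    img = [[[0, 1], [2, 3]]]  # 1 channel, 2 x 2
    print(focus(img))  # [[[0]], [[2]], [[1]], [[3]]]
```

No weights are involved, which is why a custom layer can replace the whole slice/concat chain without touching the .bin file.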
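Edit yolox_nano.param:
On the second line, change the layer count 295 to 295 - 9 = 286 (we delete 10 layers and add 1, so the total drops by 9).
Then delete the 10 lines from Split to Concat, but note the second-to-last number on the Concat line: 683.
After Input, add a YoloV5Focus layer (using that number, 683):
YoloV5Focus focus 1 1 images 683
Note that the spaces in these lines have to be typed carefully, one by one. You can use Netron to check the model: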

After the edit, the file looks like this:
```
7767517
286 328
Input images 0 1 images
YoloV5Focus focus 1 1 images 683
Convolution Conv_41 1 1 683 1177 0=16 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=1728
....
```
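The numbers on the Conv_41 line can be checked by hand. Read with ncnn's Convolution parameter IDs (0 = num_output, 1 and 11 = kernel width and height, 5 = bias_term, 6 = weight_data_size), the layer has 16 output channels (int(0.25 * 64) for the nano width), a 3×3 kernel, and 12 input channels coming out of the Focus layer (3 RGB channels × 4 patches), so 16 × 12 × 3 × 3 = 1728 weights. A small sketch of that check; parse_layer and conv_weight_count are hypothetical helpers:

```python
from typing import Dict, List, NamedTuple


class Layer(NamedTuple):
    kind: str
    name: str
    bottoms: List[str]
    tops: List[str]
    params: Dict[str, str]


def parse_layer(line: str) -> Layer:
    """Parse one ncnn .param layer line: type name n_in n_out bottoms... tops... k=v..."""
    t = line.split()
    n_in, n_out = int(t[2]), int(t[3])
    bottoms = t[4:4 + n_in]
    tops = t[4 + n_in:4 + n_in + n_out]
    params = dict(p.split("=", 1) for p in t[4 + n_in + n_out:])
    return Layer(t[0], t[1], bottoms, tops, params)


def conv_weight_count(out_ch: int, in_ch: int, kh: int, kw: int) -> int:
    """Weights in a dense (non-grouped) convolution, excluding the bias."""
    return out_ch * in_ch * kh * kw


if __name__ == "__main__":
    conv = parse_layer("Convolution Conv_41 1 1 683 1177 0=16 1=3 11=3 2=1 12=1 "
                       "3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=1728")
    out_ch = int(conv.params["0"])
    kw, kh = int(conv.params["1"]), int(conv.params["11"])
    in_ch = 3 * 4  # RGB input after Focus: 3 channels x 4 patches
    print(conv.bottoms, conv.tops, conv_weight_count(out_ch, in_ch, kh, kw), conv.params["6"])
```

The bottom blob 683 is the same number that YoloV5Focus now outputs, which is what ties the new layer into the rest of the graph.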

Opening the modified yolox_nano.param in Netron gives:

That means the edit succeeded!
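The manual edit can also be scripted, which avoids retyping the fields by hand. A sketch under these assumptions: the file has the usual ncnn text layout (magic number, then "layer_count blob_count", then one layer per line as type, name, input count, output count, input blobs, output blobs, key=value params); the Focus slices are the lines from the first Split through the next Concat; and, as in the steps above, only the layer count is updated. It writes single spaces between fields, which as far as I know ncnn's loader accepts because it reads whitespace-separated tokens. patch_focus is a hypothetical helper, so check its output in Netron before relying on it:

```python
from typing import List


def _tops(tokens: List[str]) -> List[str]:
    """Output blob names of a layer line: type name n_in n_out bottoms... tops... params..."""
    n_in, n_out = int(tokens[2]), int(tokens[3])
    return tokens[4 + n_in:4 + n_in + n_out]


def patch_focus(param_text: str) -> str:
    """Replace the Split..Concat Focus chain with one YoloV5Focus layer."""
    lines = [l for l in param_text.splitlines() if l.strip()]
    magic, counts, body = lines[0], lines[1].split(), lines[2:]
    layer_count, blob_count = int(counts[0]), counts[1]
    kinds = [l.split()[0] for l in body]

    input_idx = kinds.index("Input")
    split_idx = kinds.index("Split")
    concat_idx = kinds.index("Concat", split_idx)  # first Concat after the Split
    assert input_idx < split_idx, "expected Input before the Focus slices"

    input_blob = _tops(body[input_idx].split())[0]
    focus_out = _tops(body[concat_idx].split())[0]  # 683 in the post

    removed = concat_idx - split_idx + 1            # 10 lines in the post
    kept = body[:split_idx] + body[concat_idx + 1:]
    kept.insert(input_idx + 1, f"YoloV5Focus focus 1 1 {input_blob} {focus_out}")
    new_count = layer_count - removed + 1           # 295 - 10 + 1 = 286
    return "\n".join([magic, f"{new_count} {blob_count}"] + kept) + "\n"
```

Typical use: read yolox_nano.param into a string, pass it to patch_focus, and write the returned text back to the file.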
Optimize the ncnn files with ncnnoptimize,
which produces new param and bin files.
Run this in E:\ncnn\build-vs2019\tools (copy yolox_nano.bin and yolox_nano.param there first):
ncnnoptimize.exe yolox_nano.param yolox_nano.bin yolox_nano.param yolox_nano.bin 65536
```
(torch_G) E:\ncnn\build-vs2019\tools\onnx>cd ..
(torch_G) E:\ncnn\build-vs2019\tools>ncnnoptimize.exe yolox_nano.param yolox_nano.bin yolox_nano.param yolox_nano.bin 65536
create_custom_layer YoloV5Focus
fuse_convolution_activation Conv_314 Sigmoid_330
fuse_convolution_activation Conv_328 Sigmoid_329
fuse_convolution_activation Conv_347 Sigmoid_363
fuse_convolution_activation Conv_361 Sigmoid_362
fuse_convolution_activation Conv_380 Sigmoid_396
fuse_convolution_activation Conv_394 Sigmoid_395
model has custom layer, shape_inference skipped
model has custom layer, estimate_memory_footprint skipped
```

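That completes the ncnn optimization.
Next, use build.gradle in Android Studio to build the YOLOX Android project automatically.
When the build finishes, put the optimized yolox_nano.param and yolox_nano.bin into this assets directory: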
E:\AndroidStudioProjects\ncnn-android-yolox\app\src\main\assets

Finally, only the class_names arrays in yolox.cpp and yoloxncnn.cpp under the cpp directory need to be changed:
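The class_names entries have to follow the same order as the labels used in training (for the VOC pipeline, as far as I know, that order comes from the VOC_CLASSES tuple in yolox/data/datasets/voc_classes.py). A small sketch that prints a C++ initializer to paste in; it assumes the array is declared as static const char* class_names[], and cpp_class_names and the class name "my_class" are placeholders:

```python
from typing import Sequence


def cpp_class_names(names: Sequence[str]) -> str:
    """Render a C++ class_names initializer; order must match the training labels."""
    def quote(s: str) -> str:
        # Escape backslashes and double quotes for a C++ string literal
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    body = ", ".join(quote(n) for n in names)
    return "static const char* class_names[] = {" + body + "};"


if __name__ == "__main__":
    # One class, matching self.num_classes = 1 in the exp file
    print(cpp_class_names(["my_class"]))
```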


Click Run and detection starts:


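There is still a problem, though: it cannot recognize high-resolution images or video, only low-resolution images like the ones I fed to the network during training. Still, it did detect things in the end, which made me very happy.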
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/341991.html
