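1. Detection code
The code comes from the official baseline provided by Datawhale: https://github.com/datawhalechina/team-learning-cv/tree/master/DefectDetection
The data-processing code is already written, which I really appreciate.
The baseline uses yolov5. I only have a single 1080Ti, so I started with yolov5s, training for 50 epochs with the image size set to 512x512.
This part mainly follows https://blog.csdn.net/qq_26751117/article/details/113853150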
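- Data processing: this mainly converts the data format provided by the competition organizers into the format YOLO needs. First run convertTrainLabel.py, then run process_data_yolo.py; the output is stored in the process_data folder. Note that process_data_yolo.py has to be run twice, once as-is and once with every val field changed to train, to produce the validation and training data files respectively.
- Pretrained weights: I tried training without loading pretrained weights and the results were not very good, probably because the dataset is fairly small, so transfer learning is still needed. I managed to download the yolov5s.pt file and loaded it. Because the model is small, a fairly large batch size is possible; I used 16. The weight-loading code needs a small modification, which the article linked above shows in a figure.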

- Running: after fixing a few spots that raised errors, the yolov5s model ran.
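To make the data-processing step above more concrete, here is a minimal sketch of the kind of conversion a YOLO label file needs. It is not the code from convertTrainLabel.py or process_data_yolo.py; it assumes the competition annotations give each box as pixel corners (xmin, ymin, xmax, ymax) together with the image width and height, and the function names are hypothetical.

```python
def to_yolo_box(bbox, img_w, img_h):
    """Convert a pixel corner box (xmin, ymin, xmax, ymax) into YOLO format:
    (x_center, y_center, width, height), each normalized by the image size."""
    xmin, ymin, xmax, ymax = bbox
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one annotation as a line of a YOLO .txt label file:
    '<class> <x_center> <y_center> <width> <height>'."""
    values = to_yolo_box(bbox, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)
```

Each image gets one .txt file with one such line per defect box; the class index here is assumed to be zero-based, which is what yolov5 expects.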
2. Docker submission
My Dockerfile:
# Base Images
FROM registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.4-cuda10.1-py3
ADD . /workspace
WORKDIR /workspace
RUN pip install -r requirements.txt
CMD ["sh", "run.sh"]
Start the build:
(torch16) pdluser@pdluser-System-Product-Name:~/project/tianchi_demo$ sudo docker build -t registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:1.0 .
[sudo] password for pdluser:
Sending build context to Docker daemon 6.778GB
Step 1/5 : FROM registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.4-cuda10.1-py3
---> 76c152fbfd03
Step 2/5 : ADD . /workspace
---> 10ca596f6d20
Step 3/5 : WORKDIR /workspace
---> Running in 37a88d04d2a9
Removing intermediate container 37a88d04d2a9
---> 7f7982fbfaba
Step 4/5 : RUN pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple --ignore-installed PyYAML
---> Running in 877004f83473
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Downloading https://mirrors.aliyun.com/pypi/packages/ec/d6/a82d191ec058314b2b7cbee5635150f754ba1c6ffc05387bc9a57efe48b8/cryptacular-1.5.5.tar.gz
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Collecting zope.sqlalchemy
Downloading https://mirrors.aliyun.com/pypi/packages/fa/83/459decec1dd2c14d60f9a360fff989c128abe545a1554a1da64b054a55d4/zope.sqlalchemy-1.3-py2.py3-none-any.whl
Collecting velruse>=1.0.3
Downloading https://mirrors.aliyun.com/pypi/packages/8f/0b/d47ea894587f3155f8c4520aa74d57c856189d0bbe27e831881d655a3386/PasteDeploy-2.1.1-py2.py3-none-any.whl
Building wheels for collected packages: cryptacular
Building wheel for cryptacular (PEP 517): started
Building wheel for cryptacular (PEP 517): finished with status 'done'
Created wheel for cryptacular: filename=cryptacular-1.5.5-cp37-abi3-manylinux2010_x86_64.whl size=52452 sha256=93037b68313c3d86df4c8cab9d0cc0866d1579cb7399410c7903b56eb2ff0067
Stored in directory: /root/.cache/pip/wheels/dd/c7/11/721f100da8477396b1f8fcfa2d23c801d5bac07d0e2d82dc0d
Successfully built cryptacular
Building wheels for collected packages: apex, velruse, pbkdf2, anykeystore
Building wheel for apex (setup.py): started
Building wheel for apex (setup.py): finished with status 'done'
Created wheel for apex: filename=apex-0.9.10.dev0-cp37-none-any.whl size=46468 sha256=c68745de219dd6169195cfec426e528cd5f5f932bd3cb7ddbc22817a9827cfea
Stored in directory: /root/.cache/pip/wheels/b8/f0/7a/2fc4cf8a70bfc0981f7009a2146685d06ee220398c0b780acf
Building wheel for velruse (setup.py): started
Building wheel for velruse (setup.py): finished with status 'done'
Created wheel for velruse: filename=velruse-1.1.1-cp37-none-any.whl size=50923 sha256=c300b70b745467b6b075bec09d6b2a11ab3524f6de31605431a62308613648e3
Stored in directory:
Successfully built apex velruse pbkdf2 anykeystore
Installing collected packages: PyYAML, Cython, numpy, opencv-python, typing-extensions, torch, pyparsing, kiwisolver, six, cycler, pillow
Removing intermediate container 877004f83473
---> 5c40d92c4bc1
Step 5/5 : CMD ["sh", "run.sh"]
---> Running in 41c2daf77fbc
Removing intermediate container 41c2daf77fbc
---> 603e3fe4452c
Successfully built 603e3fe4452c
Successfully tagged registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:1.0
Once the image is built, start a container from it.
First look up the image ID:
pdluser@pdluser-System-Product-Name:~$ sudo docker images
[sudo] password for pdluser:
REPOSITORY                                                      TAG                IMAGE ID       CREATED         SIZE
registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit   1.0                b773b4e52e7a   4 minutes ago   11.2GB
<none>                                                          <none>             f99df53cc33c   23 hours ago    7.92GB
registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch            1.4-cuda10.1-py3   76c152fbfd03   13 months ago   7.56GB
registry.cn-shanghai.aliyuncs.com/tcc-public/python             3                  a4cc999cf2aa   21 months ago   929MB
Start a container from the first image, using its ID prefix b7:
(torch16) pdluser@pdluser-System-Product-Name:~/project/tianchi_demo$ sudo docker run -it b7 /bin/bash
root@2a128d20af63:/workspace#
Run run.sh here; if the test succeeds, the image is ready to submit.
The next step is to push the image to the registry:
$ sudo docker login --username=[username] registry.cn-shenzhen.aliyuncs.com
$ sudo docker tag [ImageId] registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:[image-version]
$ sudo docker push registry.cn-shenzhen.aliyuncs.com/nine_percent/tianchi_submit:[image-version]

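3. Problems encountered
During the build, the following error came up: ERROR: Double requirement given: PyYAML>=5.3 (from -r requirements.txt (line 10)) (already in PyYAML, name='PyYAML')
Removing the version constraint on PyYAML from requirements.txt made the error go away. The likely cause is that the pip command used in the build (see Step 4/5 in the build log) also names PyYAML with no version, so pip saw two different specifiers for the same package.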
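A quick way to spot this kind of clash before a long build is to scan the requirement lines for packages that appear more than once. The sketch below is only an illustration: the function name is hypothetical, it handles plain `name<op>version` lines rather than the full requirements-file syntax, and anything named on the pip command line has to be passed in alongside the file's lines.

```python
import re
from collections import defaultdict


def find_duplicate_requirements(lines):
    """Return {normalized_name: [requirement lines]} for packages listed more than once."""
    seen = defaultdict(list)
    for raw in lines:
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as "-r other.txt" or "-i <url>".
        if not line or line.startswith("-"):
            continue
        # The package name ends at the first version operator, extra, marker, or space.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        # Package names compare case-insensitively, with '-', '_' and '.' treated alike.
        key = re.sub(r"[-_.]+", "-", name).lower()
        seen[key].append(line)
    return {key: reqs for key, reqs in seen.items() if len(reqs) > 1}
```

For the case above, passing the contents of requirements.txt plus the extra "PyYAML" from the command line would report both entries under the key "pyyaml".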

An error also appeared when using opencv:
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
The fix is to add the following to the Dockerfile:
RUN apt update
RUN apt install libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
However, this runs into a problem during the build:

Changing the Dockerfile as follows avoids interactive prompts:
RUN DEBIAN_FRONTEND=noninteractive apt update -y
RUN DEBIAN_FRONTEND=noninteractive apt install libgl1-mesa-glx -y
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y libglib2.0-0
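Since every failed submission costs a long build and push, it can help to check inside the container that the problem modules actually import before pushing. The following is a minimal sketch of such a smoke test; the function name is hypothetical and the list of modules to check is up to you.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # A missing shared library such as libGL.so.1 surfaces as ImportError too.
            failures[name] = str(exc)
    return failures
```

Inside the container, calling `check_imports(["cv2"])` should come back empty once the system libraries above are installed; a non-empty result carries the same message as the traceback shown earlier.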
The first submission failed:

Ugh, two failures in a row.

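After that I adjusted where the files were stored and the corresponding commands, and the submission finally succeeded. Hooray!
A gripe: Docker is very nice, but it still has a real barrier to entry. I wrote up the points I used most often along the way: https://blog.csdn.net/DD_PP_JJ/article/details/113902874, which may be a useful reference. I was stuck on Docker for quite a long time overall. After going through the material recommended by people in the group chat, I think the cause is my limited understanding of Docker. Every build had to download the image from the remote registry, so each build took a very long time, and I hit many Dockerfile-related problems, each of which meant downloading everything again. Submitting results was also slow, because push takes a long time. In addition, when I ran docker with -v, the mapping did not show up inside the container; I have not solved this problem yet.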
