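I am building an application that uses a webcam to control video games (a bit like Kinect). It uses the webcam (cv2.VideoCapture(0)), AI pose estimation ( 
Which latency-introducing steps do we work with?
Forgive, for the moment, a raw sketch of where we accumulate each of the particular latency-related costs: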
(sketch, stages in order from the camera up to the python code)
CAM -> USB -> H/W -> O/S driver
 -> openCV I/O, as FAST as possible
 -> openCV::MAT NATIVE-object PROCESSING
 -> in-RAM openCV::MAT storage, via the openCV::MAT object-mapper ( openCV {signed|unsigned}-{size}-{N-channels} )
 -> forMAT TRANSFORMER TRANSFORMATIONS: openCV::MAT into python numpy.array
 -> 2x per-call COST: MOVES DATA 1.THERE 2.BACK
 -> python code calling a cv2.<function>()
 -> python code GIL-awaiting ~ 100 [ms] chopping
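What latencies can we (here) fight against?
Hardware latencies: better hardware could help, yet replacing already purchased hardware may get expensive
Software latencies: squeezing more out of already latency-optimised toolboxes is possible, yet harder and harder
Design inefficiencies: the final and most common place where latency can be shaved off
OpenCV ?
There is not much to do here. The problem is with the OpenCV-Python binding details: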
... So when you call a function, say res = equalizeHist(img1,img2) in Python, you pass two numpy arrays and you expect another numpy array as the output. So these numpy arrays are converted to cv::Mat and then calls the equalizeHist() function in C++. Final result, res will be converted back into a Numpy array. So in short, almost all operations are done in C++ which gives us almost same speed as that of C++.
This works fine "outside" a control-loop, but not in our case, where the two transport costs (there and back), the transformation costs and any RAM-allocation costs for new or interim data storage all worsen our control-loop TAT (turn-around time).
So avoid calling OpenCV-native functions from the Python side (behind the bindings' extra latency miles), no matter how tempting or sweet these may look at first sight.
HUNDREDS of [ms] are a rather bitter cost of ignoring this advice.
Python ?
Yes, Python. Using the Python interpreter introduces latency per se, and it adds the problem of GIL-serialised (concurrency-avoiding) processing, no matter how many cores our hardware has (although recent Py3 releases try hard to lower these costs at the interpreter level).
We can test and squeeze the maximum out of the (still unavoidable, in 2022) GIL-lock interleaving: check sys.getswitchinterval() and test increasing this value to get less interleaved python-side processing (the tweaking depends on your other python-application ambitions: GUI, distributed-computing tasks, python network-I/O workloads, python-HW-I/O-s, if applicable, etc.).
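A minimal sketch of how the switch interval could be inspected and raised for a latency-critical stretch of code. sys.getswitchinterval() and sys.setswitchinterval() are standard-library calls; the switch_interval() helper and the 0.05 s value are assumptions made for illustration, not recommended settings.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily change the interpreter's thread switch interval.

    Hypothetical helper: it restores the previous value on exit, so an
    experiment cannot leak into the rest of the application.
    """
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        sys.setswitchinterval(previous)


# CPython documents a default of 0.005 s (5 ms).
print(f"current switch interval: {sys.getswitchinterval() * 1e3:.1f} ms")

# 0.05 s is an arbitrary test value; measure your own loop before keeping it.
with switch_interval(0.05):
    print(f"inside the block:        {sys.getswitchinterval() * 1e3:.1f} ms")
```

A larger interval lets the frame-processing thread run longer before it is asked to release the GIL, at the price of less responsive other threads, so any chosen value should be checked against the measured control-loop time.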
RAM-memory-I/O costs ?
Our next major enemy. Using the least-sufficient image-DATA-format that MediaPipe can work with is the way forward in this segment.
Avoidable losses
All other (our) sins belong to this segment. Avoid any image-DATA-format transformations (see above; the cost may easily grow into HUNDREDS of THOUSANDS of [us] just for converting an already acquired-&-formatted-&-stored numpy.array into just another colourmap).
MediaPipe lists the enumerated formats it can work with:
// ImageFormat
SRGB:    sRGB, interleaved: one byte for R, then one byte for G, then one byte for B for each pixel.
SRGBA:   sRGBA, interleaved: one byte for R, one byte for G, one byte for B, one byte for alpha or unused.
SBGRA:   sBGRA, interleaved: one byte for B, one byte for G, one byte for R, one byte for alpha or unused.
GRAY8:   Grayscale, one byte per pixel.
GRAY16:  Grayscale, one uint16 per pixel.
SRGB48:  sRGB, interleaved, each component is a uint16.
SRGBA64: sRGBA, interleaved, each component is a uint16.
VEC32F1: One float per pixel.
VEC32F2: Two floats per pixel.
So, choose the MVF -- the minimum viable format -- for gesture-recognition to work, and downscale the number of pixels as far as possible (400x600-GRAY8 would be my hot candidate).
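To make the size argument concrete, here is a small sketch that computes the bytes per frame for the formats listed above. The bytes-per-pixel figures follow directly from the format descriptions (VEC32F taken as 32-bit floats); the function names and the 30 fps figure are illustrative assumptions.

```python
# Bytes per pixel, derived from the ImageFormat descriptions above.
BYTES_PER_PIXEL = {
    "SRGB": 3,      # 3 x 1 byte
    "SRGBA": 4,     # 4 x 1 byte
    "SBGRA": 4,     # 4 x 1 byte
    "GRAY8": 1,     # 1 byte
    "GRAY16": 2,    # 1 x uint16
    "SRGB48": 6,    # 3 x uint16
    "SRGBA64": 8,   # 4 x uint16
    "VEC32F1": 4,   # 1 x 32-bit float
    "VEC32F2": 8,   # 2 x 32-bit float
}


def frame_bytes(width, height, image_format):
    """Raw size of one frame in bytes (no row padding assumed)."""
    return width * height * BYTES_PER_PIXEL[image_format]


def stream_mb_per_s(width, height, image_format, fps):
    """Raw data rate in MB/s that has to be moved for every copy made."""
    return frame_bytes(width, height, image_format) * fps / 1e6


for w, h, fmt in [(400, 600, "GRAY8"), (640, 480, "SRGB"), (1920, 1080, "SRGBA")]:
    print(f"{w}x{h} {fmt:6s}: {frame_bytes(w, h, fmt):>9,d} B/frame, "
          f"{stream_mb_per_s(w, h, fmt, fps=30):7.1f} MB/s at an assumed 30 fps")
```

Every avoidable copy or format conversion multiplies these figures, which is why a 400x600 GRAY8 frame (240,000 bytes) is so much cheaper to move around than a full-HD SRGBA one (8,294,400 bytes).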
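Pre-configure (not missing the cv.CAP_PROP_FOURCC details) the native-side OpenCV::VideoCapture processing to do no more than simply store this MVF in a RAW format on the native side of the Acquisition-&-Pre-processing chain, so that no other post-process formatting takes place.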
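A hedged sketch of what such a pre-configuration might look like from Python. The cv2.CAP_PROP_* constants and VideoCapture.set()/get() are existing OpenCV API; fourcc_code() and open_capture() are hypothetical helpers, and the "YUYV" tag, resolution and frame rate are assumed defaults. Whether a given camera, driver and backend honour any of these requests is not guaranteed, so the sketch reads the values back.

```python
def fourcc_code(tag):
    """Pack a four-character code into the integer form OpenCV uses."""
    if len(tag) != 4 or not tag.isascii():
        raise ValueError("a FOURCC tag has exactly four ASCII characters")
    # Little-endian packing: the first character ends up in the lowest byte.
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(tag))


def open_capture(index=0, fourcc="YUYV", width=640, height=480, fps=30):
    """Open a camera and request a raw, small, unconverted stream.

    All defaults are assumptions; pick what your camera actually offers.
    """
    import cv2  # imported lazily so fourcc_code() has no OpenCV dependency

    cap = cv2.VideoCapture(index)
    if not cap.isOpened():
        raise RuntimeError(f"cannot open camera {index}")

    requests = {
        cv2.CAP_PROP_FOURCC: fourcc_code(fourcc),
        cv2.CAP_PROP_FRAME_WIDTH: width,
        cv2.CAP_PROP_FRAME_HEIGHT: height,
        cv2.CAP_PROP_FPS: fps,
        cv2.CAP_PROP_BUFFERSIZE: 1,    # keep stale frames out of the loop
        cv2.CAP_PROP_CONVERT_RGB: 0,   # ask for no BGR conversion (backend dependent)
    }
    for prop, value in requests.items():
        cap.set(prop, value)

    # Drivers may silently ignore a request, so report what is in effect.
    applied = {prop: cap.get(prop) for prop in requests}
    return cap, applied
```

Comparing the returned applied values with the requested ones shows quickly whether the camera delivers the format natively or whether a conversion is still happening somewhere in the chain.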
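If indeed forced to ever touch the python-side numpy.array object, prefer vectorised and striding-tricks powered operations that work on .view()-s or .data-buffers (no copies), so as to avoid any unwanted add-on latency costs increasing the control-loop TAT.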
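The difference between a strided view and a copying operation can be sketched as follows, assuming NumPy is installed. The all-zero array is a stand-in for a GRAY8 frame, not real camera data.

```python
import numpy as np

# Stand-in for a 400x600 GRAY8 frame (600 rows, 400 columns).
frame = np.zeros((600, 400), dtype=np.uint8)

# Strided slicing only changes shape and strides: no pixel is copied.
half = frame[::2, ::2]
print(half.shape, np.shares_memory(frame, half))        # view onto frame

# A dtype conversion allocates a new buffer, here four times larger.
as_float = frame.astype(np.float32)
print(as_float.nbytes, np.shares_memory(frame, as_float))

# A vectorised operation writing into a preallocated buffer avoids
# creating a fresh temporary array on every loop iteration.
inverted = np.empty_like(frame)
np.subtract(255, frame, out=inverted)
```

In a control loop, buffers such as inverted would be allocated once before the loop starts and reused for every frame.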
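Options ?
Eliminate any python-side calls (as these cost you 2x the data-I/O costs plus the transformation costs) by precisely configuring the native-side OpenCV processing to match the needed MediaPipe data-format.
Minimise, best avoid, any blocking. If the control-loop is still too skewed, try distributed processing: move the raw data into another process (not necessarily a Python interpreter) on localhost or within a sub-ms LAN domain (more tips available here).
Try to fit the hot-DATA RAM-footprint to the cache-line size and associativity details of your CPU-cache hierarchy (see this).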
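For the second option above, one possible way to hand a raw frame to another process on localhost is a shared-memory block from the standard library. This is a swapped-in illustration, not necessarily the mechanism the answer had in mind, and it does not cover the LAN case. The function names and the 400x600 GRAY8 frame size are assumptions.

```python
from multiprocessing import shared_memory

WIDTH, HEIGHT = 400, 600          # assumed GRAY8 frame, one byte per pixel
FRAME_BYTES = WIDTH * HEIGHT


def create_frame_buffer():
    """Producer side: allocate one shared block big enough for a frame."""
    return shared_memory.SharedMemory(create=True, size=FRAME_BYTES)


def publish_frame(shm, raw):
    """Producer side: write one raw frame into the shared block."""
    # The OS may round the block up to a page size, hence the explicit slice.
    shm.buf[:FRAME_BYTES] = raw


def attach_and_read(name):
    """Consumer side: attach by name and take a snapshot of the frame."""
    peer = shared_memory.SharedMemory(name=name)
    try:
        # bytes() copies once; a consumer could also work on peer.buf directly.
        return bytes(peer.buf[:FRAME_BYTES])
    finally:
        peer.close()


# Single-process demonstration; a real consumer would be a separate process
# that receives only the block name (shm.name), not the pixel data.
shm = create_frame_buffer()
try:
    publish_frame(shm, bytes(FRAME_BYTES))
    print(len(attach_and_read(shm.name)), "bytes read via", shm.name)
finally:
    shm.close()
    shm.unlink()
```

A real setup would also need some signalling between producer and consumer (for example a frame counter or a semaphore) so that a frame is not read while it is being overwritten.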
