Optimizing the calculation of horizontal and vertical adjacency using NumPy

I have the following cells:

cells = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0],
                  [1, 0, 1],
                  [1, 0, 0],
                  [1, 1, 1]])

I want to calculate the horizontal and vertical adjacency to arrive at this result:

# horizontal adjacency 
array([[3, 2, 1],
       [2, 1, 0],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 0],
       [3, 2, 1]])

# vertical adjacency 
array([[6, 2, 1],
       [5, 1, 0],
       [4, 0, 0],
       [3, 0, 1],
       [2, 0, 0],
       [1, 1, 1]])

My current solution looks like this:

def get_horizontal_adjacency(cells):
    adjacency_horizontal = np.zeros(cells.shape, dtype=int)
    for y in range(cells.shape[0]):
        span = 0
        for x in reversed(range(cells.shape[1])):
            if cells[y, x] > 0:
                span += 1
            else:
                span = 0
            adjacency_horizontal[y, x] = span
    return adjacency_horizontal

def get_vertical_adjacency(cells):
    adjacency_vertical = np.zeros(cells.shape, dtype=int)
    for x in range(cells.shape[1]):
        span = 0
        for y in reversed(range(cells.shape[0])):
            if cells[y, x] > 0:
                span += 1
            else:
                span = 0
            adjacency_vertical[y, x] = span
    return adjacency_vertical

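The algorithm is basically (for horizontal adjacency):

loop over the rows
loop backwards over the columns
if the x, y value of the cell is not zero, add 1 to the current span
if the x, y value of the cell is zero, reset the current span to zero
set the span as the new x, y value of the result array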
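
The steps above can be sketched for a single 1-D line and then applied along either axis. This is only an illustration of the logic, not a faster version: `run_lengths_from_right` is a hypothetical helper name, and `np.apply_along_axis` still calls it once per row or column in Python.

```python
import numpy as np

def run_lengths_from_right(line):
    """Walk a 1-D line backwards, counting consecutive non-zero cells."""
    spans = np.zeros(line.shape, dtype=int)
    span = 0
    for i in reversed(range(line.shape[0])):
        if line[i] > 0:
            span += 1   # extend the current run
        else:
            span = 0    # a zero cell breaks the run
        spans[i] = span
    return spans

cells = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0],
                  [1, 0, 1],
                  [1, 0, 0],
                  [1, 1, 1]])

# axis=1 scans each row (horizontal), axis=0 scans each column (vertical)
horizontal = np.apply_along_axis(run_lengths_from_right, 1, cells)
vertical = np.apply_along_axis(run_lengths_from_right, 0, cells)
```

Both adjacency arrays come from the same scan; only the axis differs.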

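Since I need to loop over all array elements twice, this is slow for larger arrays (e.g. images).

Is there a way to improve the algorithm using vectorization or some other NumPy magic?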

Answer:

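As already pointed out in the comments, this is a perfect example of a function that is easier to rewrite with Cython or Numba. Since Mark has already provided a Numba solution, let me provide a Cython one. First, for a fair comparison, let's time his solution on my machine: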

In [5]: %timeit nb_get_horizontal_adjacency(im, result)
836 μs ± 36 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Assuming the image im is an np.ndarray with dtype=np.uint8, a parallelized Cython solution looks like this:

In [6]: %%cython -f -a -c=-O3 -c=-march=native -c=-fopenmp --link-args=-fopenmp

from cython import boundscheck, wraparound, initializedcheck
from libc.stdint cimport uint8_t, uint32_t
from cython.parallel cimport prange
import numpy as np

@boundscheck(False)
@wraparound(False)
@initializedcheck(False)
def cy_get_horizontal_adjacency(uint8_t[:, ::1] cells):
    cdef int nrows = cells.shape[0]
    cdef int ncols = cells.shape[1]
    cdef uint32_t[:, ::1] adjacency_horizontal = np.zeros((nrows, ncols), dtype=np.uint32)
    cdef int x, y, span
    for y in prange(nrows, nogil=True, schedule="static"):
        span = 0
        for x in reversed(range(ncols)):
            if cells[y, x] > 0:
                span += 1
            else:
                span = 0
            adjacency_horizontal[y, x] = span
    return np.array(adjacency_horizontal, copy=False)

On my machine, this is nearly twice as fast:

In [7]: %timeit cy_get_horizontal_adjacency(im)
431 μs ± 4.38 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Answer:

I had a very quick attempt at this with Numba and have not checked it thoroughly, though the results seem to be correct:

#!/usr/bin/env python3

# https://stackoverflow.com/q/69854335/2836621
# magick -size 1920x1080 xc:black -fill white -draw "circle 960,540 960,1040" -fill black -draw "circle 960,540 960,800" a.png

import cv2
import numpy as np
import numba as nb

def get_horizontal_adjacency(cells):
    adjacency_horizontal = np.zeros(cells.shape, dtype=int)
    for y in range(cells.shape[0]):
        span = 0
        for x in reversed(range(cells.shape[1])):
            if cells[y, x] > 0:
                span += 1
            else:
                span = 0
            adjacency_horizontal[y, x] = span
    return adjacency_horizontal

@nb.jit('void(uint8[:,::1], int32[:,::1])',parallel=True)
def nb_get_horizontal_adjacency(cells, result):
    for y in nb.prange(cells.shape[0]):
        span = 0
        for x in range(cells.shape[1]-1,-1,-1):  # stop at -1 so that column 0 is processed too
            if cells[y, x] > 0:
                span += 1
            else:
                span = 0
            result[y, x] = span
    return 

# Load image
im = cv2.imread('a.png', cv2.IMREAD_GRAYSCALE)

%timeit get_horizontal_adjacency(im)

result = np.zeros((im.shape[0],im.shape[1]),dtype=np.int32)
%timeit nb_get_horizontal_adjacency(im, result)

If it is correct, the timings look good, showing a speed-up of roughly 4000x:

In [15]: %timeit nb_get_horizontal_adjacency(im, result)
695 μs ± 9.12 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [17]: %timeit get_horizontal_adjacency(im)
2.78 s ± 44.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
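
Since the fast version was not checked thoroughly, a small comparison against the plain Python loop is a cheap safeguard. The sketch below is an assumption-laden helper, not part of the original answer: `check_candidate` and the two stand-in candidates are hypothetical names, and the candidate is assumed to follow the `(cells, result)` calling convention used above.

```python
import numpy as np

def reference_horizontal_adjacency(cells):
    """Plain Python loop from the question, used as ground truth."""
    out = np.zeros(cells.shape, dtype=np.int32)
    for y in range(cells.shape[0]):
        span = 0
        for x in reversed(range(cells.shape[1])):
            span = span + 1 if cells[y, x] > 0 else 0
            out[y, x] = span
    return out

def check_candidate(candidate, shape=(32, 48), trials=20, seed=0):
    """Return True if candidate(cells, result) matches the reference on random masks."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        cells = (rng.random(shape) < 0.7).astype(np.uint8)  # roughly 70% non-zero cells
        result = np.zeros(shape, dtype=np.int32)
        candidate(cells, result)
        if not np.array_equal(result, reference_horizontal_adjacency(cells)):
            return False
    return True

# Stand-in candidates (hypothetical): one correct, one that never reaches column 0.
def correct_candidate(cells, result):
    result[:] = reference_horizontal_adjacency(cells)

def skips_first_column(cells, result):
    result[:, 1:] = reference_horizontal_adjacency(cells)[:, 1:]

ok = check_candidate(correct_candidate)
caught = not check_candidate(skips_first_column)
```

In practice one would pass the compiled function instead of a stand-in, e.g. `check_candidate(nb_get_horizontal_adjacency)`.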

Input

The input image was created at 1080p size, i.e. 1920x1080, with ImageMagick using:

magick -size 1920x1080 xc:black -fill white -draw "circle 960,540 960,1040" -fill black -draw "circle 960,540 960,800" a.png
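
If ImageMagick is not at hand, a similar test image can be approximated directly in NumPy. This is a rough sketch only: the radii 500 and 260 are read off the `circle` arguments above (distance from the centre to the given perimeter point), and the result has hard edges, so it will not match ImageMagick's anti-aliased output pixel for pixel.

```python
import numpy as np

height, width = 1080, 1920
cy, cx = 540, 960

# Squared distance of every pixel from the centre (broadcast from open grids)
y, x = np.ogrid[:height, :width]
r2 = (x - cx) ** 2 + (y - cy) ** 2

# White ring: inside the radius-500 disc but outside the radius-260 disc
ring = (r2 <= 500 ** 2) & (r2 > 260 ** 2)
im = np.where(ring, 255, 0).astype(np.uint8)
```

The array `im` has the same shape and dtype as the one returned by `cv2.imread(..., cv2.IMREAD_GRAYSCALE)` in the script above.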

[Image: input image, a white ring on a black background]

Output (contrast-adjusted)

[Image: horizontal adjacency of the input image, contrast-adjusted]
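
On the original question of a loop-free version: one possible pure-NumPy formulation uses a cumulative sum that is "reset" at every zero cell via a running maximum. This is a sketch under the assumption that any value greater than zero counts as occupied; `adjacency` is a hypothetical name, and it has not been benchmarked against the Numba or Cython versions.

```python
import numpy as np

def adjacency(cells, axis):
    """Count consecutive non-zero cells from each position towards the end of `axis`."""
    # Flip so that the backwards scan becomes a forwards cumulative operation
    mask = np.flip(cells > 0, axis=axis)
    running = np.cumsum(mask, axis=axis)
    # At every zero cell remember the running total; carry it forward as the reset offset
    offset = np.maximum.accumulate(np.where(mask, 0, running), axis=axis)
    return np.flip(running - offset, axis=axis)

cells = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0],
                  [1, 0, 1],
                  [1, 0, 0],
                  [1, 1, 1]])

horizontal = adjacency(cells, axis=1)
vertical = adjacency(cells, axis=0)
```

For the 6x3 example this reproduces the two arrays from the question. The trade-off is a few temporary arrays the size of the input, which may matter for large images.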
