Vectorized numpy algorithm to return the indices of contiguous runs in a binary sequence

Given a binary sequence such as 000110111100011, I want to mark every run of consecutive 1s.

000110111100011
0123456789abcdef

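I'll use hexadecimal notation for the positions, since 16 is a convenient length for an example sequence.

This should return (3 5) (6 a) (d f). The first region occupies positions 3 and 4, so (using Python's half-open convention) (3, 5) denotes that region.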

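But the sequence is occasionally slightly corrupted, so I'd like to add a tolerance so that a gap of fewer than k consecutive 0s is ignored.

So, with min_gap=2 in the example above, I'd want to get: (3 a) (d f).

That is, the `5) (6` is removed, because that gap contains only a single zero and therefore gets "patched".

It looks simple, but I'm struggling to fully vectorize a solution in numpy.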

import numpy as np

def get_segments(segment_of_sums, mingap=1):
    nonempty_padded = np.concatenate(( [0], segment_of_sums, [0] ))
    b = nonempty_padded > 0
    edges = b[:-1] ^ b[1:]
    indices = np.argwhere(edges)[:,0]
    # indices[0] will always be the first on-ramp, and indices[-1] will be the last off-ramp
    # if segment ends with a 1, indices[-1] will be len(segment)
    # len(indices) will ALWAYS be an even number
 
    if mingap>1 and len(indices) > 2:
        gap_lengths = indices[2::2] - indices[1::2][:-1]
        gap_keeps = gap_lengths >= mingap

        index_keeps = np.zeros_like(indices, dtype=bool)
        index_keeps[[0, -1]] = True
        for i, flag in enumerate(gap_keeps):
            if flag:
                index_keeps[[2*i+1, 2*i+2]] = True
        
        indices = indices[np.argwhere(index_keeps)[:,0]]

    return indices.reshape((-1,2))

TEST = True
if TEST:
    for L in [
        [0,1,0, 1,1,1, 0,0,1, 1,1,0],
        [0,1,0, 1,1,1, 0,0,1, 1,1,1],
        [1,1,0, 1,1,1, 0,0,1, 1,1,1]
    ]:
        A = np.array(L, dtype=bool)
        print(
            get_segments(A, mingap=2)
        )

This code correctly prints:

[[ 1  6]
 [ 8 11]]
[[ 1  6]
 [ 8 12]]
[[ 0  6]
 [ 8 12]]

But the for loop feels clumsy.

Can anyone see a cleaner technique?
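
One possible way to drop the loop, offered as a sketch rather than a definitive answer: split the edge indices into run starts and run ends, then use a boolean mask over the gaps to decide which starts and ends survive. The function name `get_segments_vectorized` is hypothetical, and the sketch assumes a 1-D input where any value greater than 0 counts as a 1.

```python
import numpy as np

def get_segments_vectorized(bits, mingap=1):
    # Hypothetical helper: same half-open (start, end) output as get_segments.
    b = np.asarray(bits) > 0
    padded = np.concatenate(([False], b, [False]))
    # Edge positions alternate between run starts and run ends.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]
    if starts.size == 0:
        return np.empty((0, 2), dtype=edges.dtype)
    # A gap survives only if it holds at least `mingap` zeros.
    keep = (starts[1:] - ends[:-1]) >= mingap
    # The first start and the last end are always kept.
    starts = starts[np.concatenate(([True], keep))]
    ends = ends[np.concatenate((keep, [True]))]
    return np.column_stack((starts, ends))

print(get_segments_vectorized([0,1,0, 1,1,1, 0,0,1, 1,1,0], mingap=2))
```

For the first test list this prints the same `[[1, 6], [8, 11]]` pairs as the loop version. Dropping a gap removes the end before it and the start after it, which is why the two masks are the same array shifted by one position.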

Reply from a uj5u.com user:

Not sure you can easily vectorize this, but here is a solution using itertools.groupby that should be quite fast:

s = '000110111100011'

from itertools import groupby

[('%x' % (x:=list(g))[0][0], '%x' % (x[-1][0]+1))
 for k,g in groupby(enumerate(s), lambda x: x[1])
 if k == '1']

Output:

[('3', '5'), ('6', 'a'), ('d', 'f')]

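Reply from a uj5u.com user:

If you're open to using another package, pandas provides well-vectorized forward-fill and groupby operators that are ideal for your use case. It also adds little overhead on top of numpy: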

import pandas as pd

d=pd.DataFrame({'a':list('0001101111000110'),  # extra `0` at end
                'b':list('0123456789abcdef')}
              )

thresh = 2
s = d['a'].eq('1')

s = s.where(s).ffill(limit=thresh-1)           # here's where forward fills comes into play

(d.where(s.notna()).groupby(s.isna().cumsum())
  .agg(min_idx=('b','first'), max_idx=('b','last'))
  .dropna()
)

You'll get something like this:

  min_idx max_idx
a                
3       3       a
5       d       f

Reply from a uj5u.com user:

This can be recast as finding relative maxima (i.e. the groups of consecutive 1s), so we can use scipy.signal.find_peaks as follows:

import numpy as np
from scipy.signal import find_peaks

s = "0123456789abcdef"
data = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1])

# concatenate 0 at both ends to find peaks at the end of the arrays
data = np.concatenate(([0], data, [0]))

# find the bases; the start of each plateau will be included in bases
_, bases = find_peaks(data, width=1)

left, right = bases["left_bases"], bases["right_bases"]
result = [(s[l], s[r - 1]) for l, r in zip(bases["left_bases"], bases["right_bases"])]
print(result)

Output:

[('3', '5'), ('6', 'a'), ('d', 'f')]

To compute the gaps, just do the following:

gaps = left[1:] - (right[:-1] - 1)
print(gaps)

Output:

[1 3]

Note that the gap between the first and second runs is 1, and the gap between the second and third runs is 3.
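
The gaps array could then be used to apply the tolerance from the question. The following is only a sketch: `min_gap` is an assumed parameter name, and `left` and `right` are hardcoded to the values implied by the output above so that the snippet runs without scipy.

```python
import numpy as np

s = "0123456789abcdef"
# Bases implied by the find_peaks output above (indices into the padded array).
left = np.array([3, 6, 13])
right = np.array([6, 11, 16])
min_gap = 2  # assumed tolerance: gaps shorter than this are patched

gaps = left[1:] - (right[:-1] - 1)
keep = gaps >= min_gap
# Keep a left base only if the gap before it survives,
# and a right base only if the gap after it survives.
merged_left = left[np.concatenate(([True], keep))]
merged_right = right[np.concatenate((keep, [True]))]
merged = [(s[l], s[r - 1]) for l, r in zip(merged_left, merged_right)]
print(merged)
```

With min_gap = 2 the single-zero gap is patched, so this prints `[('3', 'a'), ('d', 'f')]`.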

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/322021.html
