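NumPy: get the indices of boundaries in an array, where each boundary always starts with a particular number and non-boundary elements are marked by another particular number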

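Problem:

I am looking for the most computationally efficient way to get the indices of boundaries in an array, where the start of a boundary is always marked by one specific number and non-boundary elements are marked by a different specific number.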

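How this question differs from other boundary-based NumPy questions on SO:

Here are some other boundary-based NumPy questions:

Numpy 1D array - find indices of boundaries of subsequences of the same number

Getting the boundary of numpy array shape with a hole

Extracting the boundary of a numpy array

The difference between my question and the other Stack Overflow posts I found while looking for a solution is that in those posts the boundaries are indicated by a jump in value or by a "hole" in the values.

My case seems to be unique in that the start of a boundary is always marked by a specific number.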

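Motivation:

This question is inspired by IOB tagging in natural language processing. In IOB tagging, B [beginning] tags the first character of an entity, I [inside] tags every other character of that entity, and O [outside] tags all non-entity characters.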
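To make the link to IOB tagging concrete, here is a minimal sketch of how a tag sequence could be turned into integer codes of the kind used in this question. The tag list and the `TAG_TO_CODE` mapping are hypothetical and chosen only for illustration; they do not come from any particular NLP library.

```python
import numpy as np

# Hypothetical mapping from IOB tags to integer codes (an assumption for illustration).
TAG_TO_CODE = {"O": 0, "B": 1, "I": 2}

# Hypothetical tag sequence: one entity of length 3 and one of length 1.
tags = ["O", "B", "I", "I", "O", "B", "O"]

# Look up each tag and collect the codes into a 1D integer array.
codes = np.array([TAG_TO_CODE[t] for t in tags])
print(codes)
```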

Example:

import numpy as np

a = np.array(
    [
     0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1
    ]
)

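1 is the start of each boundary. If a boundary is longer than one element, 2s make up the rest of it. 0 marks non-boundary elements.

The entities delimited by these boundaries are [1, 2, 2, 2], [1], [1, 2], [1], [1], [1], [1], [1].

So the desired result, the [start, end] index pair of each boundary in a, is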

desired = [[3, 6], [10, 10], [13, 14], [15, 15], [16, 16], [19, 19], [20, 20], [21, 21]]
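As a plain-Python reference for what the desired result means, the rule can be sketched as a single scan: every 1 opens a boundary, and the boundary extends over any 2s that directly follow it. The function name `boundaries_loop` is made up for this sketch, and the loop is meant only for checking vectorized results on small inputs, not as an efficient solution.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])

def boundaries_loop(arr):
    """Return [start, end] pairs by scanning the array once (reference only)."""
    pairs = []
    n = len(arr)
    i = 0
    while i < n:
        if arr[i] == 1:
            end = i
            # Extend the boundary over any 2s that directly follow the 1.
            while end + 1 < n and arr[end + 1] == 2:
                end += 1
            pairs.append([i, end])
            i = end + 1
        else:
            # Skip non-boundary elements.
            i += 1
    return pairs

print(boundaries_loop(a))
```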

Current Solution:

When flattened, the numbers in the desired result are in non-decreasing order, so the raw index values can be computed in any order, then sorted and reshaped.

I can get the start indices using

starts = np.where(a==1)[0]
starts

array([ 3, 10, 13, 15, 16, 19, 20, 21])

So what's left are the end indices: 6, 10, 14, 15, 16, 19, 20, 21

I can get all but one of these with three conditions that compare each element in a[:-1] with its successor in a[1:], checking the difference between the two and the value of the successor.

first = np.where(a[:-1] - 2 == a[1:])[0]
first

array([6])

second = np.where((a[:-1] - 1 == a[1:]) & 
    ((a[1:]==1) | (a[1:]==0)))[0]
second

array([10, 14, 16])

third = np.where(
    (a[:-1] == a[1:]) &
    (a[1:]==1)
    )[0]
third

array([15, 19, 20])

The last number I need is 21, but since I had to shorten the array by 1 to do the shifted comparisons, the final element is never tested as an end. I'm not sure how to get that value with the same logic, so I just used a simple if statement for it.

Using the rest of the retrieved values for the indices, I can concatenate all the values and reshape them.

if (a[-1] == 1) | (a[-1] == 2):
    pen = np.concatenate((
        starts, first, second, third, np.array([a.shape[0]-1])
    ))
else:
    pen = np.concatenate((
        starts, first, second, third, 
    ))
np.sort(pen).reshape(-1,2)

array([[ 3,  6],
       [10, 10],
       [13, 14],
       [15, 15],
       [16, 16],
       [19, 19],
       [20, 20],
       [21, 21]])

Is this the most computationally efficient solution to my problem? I realize the four where statements can be combined with or operators, but I kept them separate so the reader can see each result in this post. I am wondering whether there is a more efficient solution, since I have not mastered all of NumPy's functions and am unsure of the computational cost of each.
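For reference, combining the three end conditions with or operators might look like the sketch below. It uses the same conditions as above, merged into one boolean mask, so no sort or reshape is needed. The names `prev`, `nxt`, and `ends_mask` are introduced here for illustration, and `np.flatnonzero(mask)` is used as a shorthand for `np.where(mask)[0]`. It assumes a non-empty array containing only the values 0, 1, and 2.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])

prev, nxt = a[:-1], a[1:]  # each element and its successor

ends_mask = (
    (prev - 2 == nxt)                                   # 2 followed by 0
    | ((prev - 1 == nxt) & ((nxt == 1) | (nxt == 0)))   # 1 followed by 0, or 2 followed by 1
    | ((prev == nxt) & (nxt == 1))                      # 1 followed by 1
)
ends = np.flatnonzero(ends_mask)

# The final element has no successor, so it is handled separately.
if a[-1] in (1, 2):
    ends = np.append(ends, a.size - 1)

starts = np.flatnonzero(a == 1)
result = np.column_stack((starts, ends))
print(result)
```

Because the mask is evaluated in index order, `ends` comes out already sorted and lines up with `starts` element by element.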

Answer:

A standard trick for this kind of problem is to pad the input appropriately. In this case, it helps to append a 0 to the end of the array:

In [55]: a1 = np.concatenate((a, [0]))

In [56]: a1
Out[56]: 
array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1,
       0])

Then your starts calculation still works:

In [57]: starts = np.where(a1 == 1)[0]

In [58]: starts
Out[58]: array([ 3, 10, 13, 15, 16, 19, 20, 21])

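The condition for an end is that the value is a 1 or a 2 followed by a value that is not 2. You have already discovered that the "followed by" condition can be handled with a shifted version of the array. To implement the "and" and "or" conditions, use the bitwise binary operators & and |, respectively. In code, it looks like: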

In [61]: ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]

In [62]: ends
Out[62]: array([ 6, 10, 14, 15, 16, 19, 20, 21])

Finally, put starts and ends into a single array:

In [63]: np.column_stack((starts, ends))
Out[63]: 
array([[ 3,  6],
       [10, 10],
       [13, 14],
       [15, 15],
       [16, 16],
       [19, 19],
       [20, 20],
       [21, 21]])
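The steps above can be collected into one small function. This is a sketch under stated assumptions: the name `boundary_indices` is made up here, `np.flatnonzero(mask)` stands in for `np.where(mask)[0]`, and the input is assumed to be well formed, meaning every 2 is directly preceded by a 1 or a 2. For malformed input the numbers of starts and ends can differ, in which case `np.column_stack` raises an error.

```python
import numpy as np

def boundary_indices(a):
    """Return an (n, 2) array of [start, end] indices for each boundary."""
    a = np.asarray(a)
    # Pad with a trailing 0 so the last element also has a successor.
    padded = np.concatenate((a, [0]))
    # A boundary starts wherever the value is 1.
    starts = np.flatnonzero(padded == 1)
    # A boundary ends at a nonzero value whose successor is not 2.
    ends = np.flatnonzero((padded[:-1] != 0) & (padded[1:] != 2))
    return np.column_stack((starts, ends))

a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])
print(boundary_indices(a))

# Edge cases: a boundary that ends on a trailing 2, and an array with no boundaries.
print(boundary_indices([1, 2, 2]))
print(boundary_indices([0, 0, 0]).shape)
```

Padding removes the need for a special case on the last element: the same shifted comparison covers every position, including the final one.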
