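Question:
What is the most computationally efficient way to get the indices of the boundaries in an array, where the start of each boundary is always marked by one specific number and non-boundary elements are marked by a different specific number?
How this question differs from other boundary-based numpy questions on SO:
Here are some other boundary-based numpy questions:
Numpy 1D array - find the boundary indices of subsequences of the same number
Get the boundaries of a numpy array shape with holes
Extract the boundaries of a numpy array
The difference between my question and the other Stack Overflow posts I found while looking for a solution is that, in those posts, boundaries are indicated by jumps in value or by "holes" of values.
My case seems to be unique in that the start of a boundary is always marked by a specific number.
Motivation:
This question is inspired by IOB tagging in natural language processing. In IOB tagging, B [beginning] tags the first character of an entity, I [inside] tags every character of the entity other than the first, and O [outside] tags all non-entity characters.
Example: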
import numpy as np
a = np.array(
[
0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1
]
)
1 marks the start of each boundary. If a boundary is longer than 1, then 2s make up the rest of the boundary. 0 marks non-boundary elements.
The entities inside these boundaries are 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1
So the desired solution, the [start, end] indices of each boundary in a, is
desired = [[3, 6], [10, 10], [13, 14], [15, 15], [16,16], [19,19], [20,20], [21,21]]
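To pin down exactly what is being asked for, here is a minimal plain-Python sketch of the encoding described above. It is only a slow reference to check vectorized solutions against, not a proposed answer; the name boundaries_reference is hypothetical, and the sketch assumes the input is well formed (every 2 is preceded by a 1 or another 2).

```python
def boundaries_reference(tags):
    """Return [start, end] index pairs for each boundary (end is inclusive)."""
    spans = []
    start = None  # index of the currently open boundary, if any
    for i, t in enumerate(tags):
        if t == 1:
            # A 1 always opens a new boundary and closes any open one.
            if start is not None:
                spans.append([start, i - 1])
            start = i
        elif t == 0:
            # A 0 closes the open boundary, if there is one.
            if start is not None:
                spans.append([start, i - 1])
                start = None
        # t == 2 simply extends the open boundary.
    if start is not None:
        # The array ended while a boundary was still open.
        spans.append([start, len(tags) - 1])
    return spans


a = [0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1]
print(boundaries_reference(a))
```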
Current Solution:
If flattened, the numbers in the desired solution are in ascending order, so the raw index values can be computed, sorted, and reshaped afterwards.
I can get the start indices using
starts = np.where(a==1)[0]
starts
array([ 3, 10, 13, 15, 16, 19, 20, 21])
So what's left are the end indices: 6, 10, 14, 15, 16, 19, 20, 21
I can get all but one of them using three different conditionals, each of which compares the array with a copy of itself shifted by one, looking at the drop in value between neighbors and at the value of the following element.
first = np.where(a[:-1] - 2 == a[1:])[0]
first
array([6])
second = np.where((a[:-1] - 1 == a[1:]) &
((a[1:]==1) | (a[1:]==0)))[0]
second
array([10, 14, 16])
third = np.where(
(a[:-1] == a[1:]) &
(a[1:]==1)
)[0]
third
array([15, 19, 20])
The last number I need is 21, but since I needed to shorten the length of the array by 1 to do the shifted comparisons, I'm not sure how to get that particular value using logic, so I just used a simple if statement for that.
Using the rest of the retrieved values for the indices, I can concatenate all the values and reshape them.
if (a[-1] == 1) | (a[-1] == 2):
    pen = np.concatenate((
        starts, first, second, third, np.array([a.shape[0]-1])
    ))
else:
    pen = np.concatenate((
        starts, first, second, third,
    ))
np.sort(pen).reshape(-1,2)
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
Is this the most computationally efficient solution to my problem? I realize the three end-index where statements can be combined with or operators, but I kept them separate so the reader can see each result in this post. I am wondering whether there is a more efficient solution, since I have not mastered all of numpy's functions and am unsure of the computational cost of each.
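For reference, here is a sketch of what combining the three shifted comparisons with or operators might look like. It assumes the same array a as in the example; the names prev, nxt, and mask are introduced here only for illustration, and np.flatnonzero is used as a shorthand for np.where(...)[0] on a 1D array.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])

prev, nxt = a[:-1], a[1:]  # each element paired with the one that follows it

# Union of the three conditions from the post, evaluated in one pass.
mask = (
    (prev - 2 == nxt)                                  # "first":  2 followed by 0
    | ((prev - 1 == nxt) & ((nxt == 1) | (nxt == 0)))  # "second": 1->0 or 2->1
    | ((prev == nxt) & (nxt == 1))                     # "third":  1 followed by 1
)
ends = np.flatnonzero(mask)

# The last element has no successor, so it still needs the explicit check.
if a[-1] != 0:
    ends = np.append(ends, a.size - 1)

starts = np.flatnonzero(a == 1)
result = np.column_stack((starts, ends))
print(result)
```

Because starts and ends are each already sorted, pairing them column-wise avoids the sort-and-reshape step.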
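Answer from a uj5u.com user:
The standard trick for this kind of problem is to pad the input appropriately. In this case, it helps to append a 0 to the end of the array: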
In [55]: a1 = np.concatenate((a, [0]))
In [56]: a1
Out[56]:
array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1,
0])
Then your starts calculation still works:
In [57]: starts = np.where(a1 == 1)[0]
In [58]: starts
Out[58]: array([ 3, 10, 13, 15, 16, 19, 20, 21])
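The condition for an end is that the value is a 1 or a 2 followed by a value that is not a 2. You have already discovered that the "followed by" condition can be handled with a shifted version of the array. To implement the and and or conditions, use the bitwise binary operators & and |, respectively. In code, it looks like: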
In [61]: ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
In [62]: ends
Out[62]: array([ 6, 10, 14, 15, 16, 19, 20, 21])
Finally, put starts and ends into a single array:
In [63]: np.column_stack((starts, ends))
Out[63]:
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
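The steps above can be collected into one function. This is a sketch under assumptions: the name boundary_indices is hypothetical, np.flatnonzero stands in for np.where(...)[0], and the input is assumed to be well formed (a 2 never directly follows a 0 or starts the array), since otherwise starts and ends can differ in length and np.column_stack will raise.

```python
import numpy as np


def boundary_indices(a):
    """Return an (n, 2) array of [start, end] indices, with end inclusive."""
    # Pad with a trailing 0 so a boundary that runs to the end of the
    # array is closed by the same rule as every other boundary.
    a1 = np.concatenate((np.asarray(a), [0]))
    starts = np.flatnonzero(a1 == 1)
    # An end is any nonzero element whose successor is not a 2.
    ends = np.flatnonzero((a1[:-1] != 0) & (a1[1:] != 2))
    return np.column_stack((starts, ends))


a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])
print(boundary_indices(a))
```

The padding removes the need for a special case on the last element, and an input containing no 1s yields an empty array of shape (0, 2).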
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/436268.html
