How to find a sequence of numbers in a column of a numpy array

I have a numpy array (shape: 10x2) that looks like this:

              array
index      label feature
  0          121    a
  1          131    b 
  2          113    c
  3          131    d
  4          223    e
  5          242    f
  6          212    g 
  7          131    h
  8          113    i
  9          131    j

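I would like to find the indices that match a given sequence and then get the items in the feature column that correspond to that sequence.

For example, given the sequence [131,113,131], I would find indices 1 and 7 (the start indices), or the lists of indices corresponding to the sequence ([1,2,3] and [7,8,9]), and then finally get the features for those sequences: [b,c,d] and [h,i,j].

My current solution is below. It gives me the start indices of the sequence, but it does not generalize easily to longer sequences and is somewhat hard to read: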

import numpy as np

v = np.array([[121,1],
         [131,1],
         [113,1],
         [131,1],
         [223,1],
         [242,1],
         [212,1],
         [131,1],
         [113,1],
         [131,1]])

sequence = [131,113,131]

c = [ind for ind, x in enumerate(v[:,0]) if (ind + 1 < len(v[:,0]) and ind + 2 < len(v[:,0])) if (x == sequence[0] and v[:,0][ind + 1] == sequence[1] and v[:,0][ind + 2] == sequence[2])]

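I would prefer a numpy-only solution, since I am restricted to a legacy system that has some outdated custom packages needed by other parts of the script, but solutions in pandas or any other package are welcome too. I think this is a kind of template-matching problem, but I can't seem to find an elegant solution. Thanks in advance!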

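Reply from a uj5u.com user:

I got your result with np.where, and had to use reduce on np.roll to combine the conditions. This finds the first index of each matching sub-sequence. Then, to get the features, just resize the result into a column and add it to a range of the length you are looking for, and that's it: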

from operator import and_
from functools import reduce

import numpy as np

a = np.array([ ... ])
find = [131, 113, 131]

indices = np.where(reduce(and_ , ((np.roll(a['label'], -r) == i) for r, i in enumerate(find))))[0]
result = a['feature'][np.arange(len(find)) + np.resize(indices, (indices.size, 1))]

[['b' 'c' 'd']
 ['h' 'i' 'j']]

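I assumed you are using Python 2, which I don't have on my machine. If so, remove the import of reduce, and you may need to pull the comprehension out and turn it into its own loop. This does work on Python 3.x, though.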
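
For reference, here is a self-contained sketch of this approach. The answer's array is elided, so the structured array `a` below (field names `label` and `feature`, and the dtypes) is an assumption built from the table in the question. One caveat worth noting: `np.roll` is circular, so a match that straddles the end and the start of the array would also be reported; the sketch filters such start positions out.

```python
from functools import reduce
from operator import and_

import numpy as np

# Assumed structured array mirroring the table in the question.
labels = [121, 131, 113, 131, 223, 242, 212, 131, 113, 131]
features = list("abcdefghij")
a = np.array(list(zip(labels, features)),
             dtype=[("label", "i8"), ("feature", "U1")])

find = [131, 113, 131]

# AND together one shifted comparison per element of the sequence.
mask = reduce(and_, ((np.roll(a["label"], -r) == i) for r, i in enumerate(find)))
indices = np.where(mask)[0]

# np.roll wraps around: drop starts whose window would run past the end.
indices = indices[indices <= len(a) - len(find)]

# Broadcast start indices (as a column) against offsets (as a row).
windows = indices[:, None] + np.arange(len(find))
result = a["feature"][windows]

print(indices)  # [1 7]
print(result)
```

Each row of `result` holds the features for one match, so the example data should give the rows b, c, d and h, i, j.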

Reply from a uj5u.com user:

A numpy-only option. The steps and the output explain the process.

import numpy as np

v = np.array([[121,1],
         [131,1],
         [113,1],
         [131,1],
         [223,1],
         [242,1],
         [212,1],
         [131,1],
         [113,1],
         [131,1]])

# converted to np array
sequence = np.array([131,113,131])

print()
print("# Find starting points of seq in array")
print("v[:,0] = ", v[:,0])
print("v[:,0] == sequence[0] = ", v[:,0] == sequence[0])
start_pos = np.where(v[:,0] == sequence[0])[0]
print("result", start_pos)

print()
print("# Drop all indexes which can give index error")
print("initial", start_pos)
seq_len = sequence.shape[0]
max_possible_idx = v.shape[0]-sequence.shape[0]
start_pos = start_pos[start_pos <= max_possible_idx]
print("result", start_pos)

print()
print("# Generate index sequences to be matched")
idx_seq = np.arange(seq_len).reshape(seq_len,1)
m = np.tile(idx_seq, (1, start_pos.shape[0]))
idx_mat = m + start_pos
print("result \n", idx_mat) # read them column wise

print()
print("# Compare values from each index sequence with given sequence")
bools = np.apply_along_axis(lambda x: v[:,0][x] == sequence, 0, idx_mat)
print(bools)
print(bools.all(0))
print(start_pos[bools.all(0)])

Output:

# Find starting points of seq in array
v[:,0] =  [121 131 113 131 223 242 212 131 113 131]
v[:,0] == sequence[0] =  [False  True False  True False False False  True False  True]
result [1 3 7 9]

# Drop all indexes which can give index error
initial [1 3 7 9]
result [1 3 7]

# Generate index sequences to be matched
result 
 [[1 3 7]
 [2 4 8]
 [3 5 9]]

# Compare values from each index sequence with given sequence
[[ True  True  True]
 [ True False  True]
 [ True False  True]]
[ True False  True]
[1 7]

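This can be improved further by using more higher-order functions, but the general idea is:

Find all positions in v of the first element of sequence.
Generate an index matrix in which each column holds the consecutive indices to be matched.
Compare the slice of v selected by each index column against sequence.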
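
The steps above can be condensed into a few lines with broadcasting. This is a minimal sketch rather than the answerer's code: the function name `find_subsequence` is hypothetical, the `features` array is an assumption taken from the table in the question, and the function checks every possible start position instead of prefiltering on the first element, which trades a little extra work for simpler code. It relies only on long-standing numpy features, so it should also suit older installations.

```python
import numpy as np


def find_subsequence(values, sequence):
    """Return the start indices where `sequence` occurs in 1-D `values`."""
    values = np.asarray(values)
    sequence = np.asarray(sequence)
    n_starts = values.shape[0] - sequence.shape[0] + 1
    if sequence.shape[0] == 0 or n_starts <= 0:
        return np.array([], dtype=int)
    # Row k of idx holds the indices k, k+1, ..., k+len(sequence)-1.
    idx = np.arange(n_starts)[:, None] + np.arange(sequence.shape[0])
    # A start matches when every element of its window equals the sequence.
    return np.flatnonzero((values[idx] == sequence).all(axis=1))


v = np.array([[121, 1], [131, 1], [113, 1], [131, 1], [223, 1],
              [242, 1], [212, 1], [131, 1], [113, 1], [131, 1]])
features = np.array(list("abcdefghij"))  # assumed feature column

starts = find_subsequence(v[:, 0], [131, 113, 131])
print(starts)  # [1 7]
print(features[starts[:, None] + np.arange(3)])
```

Because the window length comes from the sequence itself, the same function works unchanged for longer sequences, which was the main limitation of the list comprehension in the question.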

Tags: Python, numpy