在不使用嵌套回圈的情況下查找串列中所有對的有效方法-有解無憂

假設我有一個存盤許多 2D 點的串列。在這個串列中，一些位置存盤了相同的點，將存盤相同點的位置的索引視為索引對。我想找到串列中的所有對并回傳所有 2 x 2 索引對。串列中可能有一些點重復了兩次以上，但只有第一個匹配項需要被視為一對。

例如，在下面的串列中，我總共有 9 個點，并且有 5 個位置包含重復點。索引 0、3 和 7 存盤相同的點 ( [1, 1])，索引 1 和 6 存盤相同的點 ( [2, 3])。

[[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]

所以，對于這個串列，我想將索引對回傳為（索引 0，索引 3）和（索引 1，索引 6）。我能想到的唯一解決方案是通過嵌套回圈，我將其編碼如下

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)

# I don't want to modified the original list, looping through a index list insted.
Index = np.arange(0, A.shape[0], 1, dtype=int) 
Pair = [] # for store the index pair
while Index.size != 0:

    current_index = Index[0]
    pi = A[current_index]
    Index = np.delete(Index, 0, 0)

    for j in range(Index.shape[0]):
        pj = A[Index[j]]
        distance = linalg.norm(pi - pj, ord=2, keepdims=True)
        
        if distance == 0:
            Pair.append([current_index, Index[j]])
            Index = np.delete(Index, j, 0)
            break

雖然這段代碼對我有用，但時間復雜度是O(n^2), where n == len(A)，我想知道是否有更有效的方法可以以更低的時間復雜度完成這項作業。感謝您的任何想法和幫助。

uj5u.com熱心網友回復：

您可以使用字典來跟蹤每個點的索引。

然后，您可以遍歷字典中的專案，列印出與多次出現的點相對應的索引。這個程序的運行時間是線性的，而不是二次的，在中的點數A：

points = {}

for index, point in enumerate(A):
    point_tuple = tuple(point)
    if point_tuple not in points:
        points[point_tuple] = []
    points[point_tuple].append(index)

for point, indices in points.items():
    if len(indices) > 1:
        print(indices)

這列印出來：

[0, 3, 7]
[1, 6]

如果您只想要出現點的前兩個索引，則可以使用print(indices[:2])而不是print(indices).

uj5u.com熱心網友回復：

這與其他答案類似，但由于在多對的情況下您只需要前兩個，您可以在一次迭代中完成。在 dict 中的適當鍵下添加索引并在（且僅當）有兩點時產生索引：

from collections import defaultdict

l = [[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]

def get_pairs(l):
    ind = defaultdict(list)

    for i, pair in enumerate(l):
        t = tuple(pair)
        ind[t].append(i)
        if len(ind[t]) == 2:
            yield list(ind[t])

list(get_pairs(l))
# [[0, 3], [1, 6]]

uj5u.com熱心網友回復：

一個沒有回圈的純 Numpy 解決方案（迄今為止唯一的一個）是使用np.unique兩次，其中包括洗掉在兩次搜索之間找到的第一個專案。該解決方案假設可以設定一個標記（例如，-1，整數的最小值，NaN），這通常不是問題（如果需要，您可以使用更大的型別）。

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)

# Copy the array not to mutate it
tmp = A.copy()

# Find the location of unique values
pair1, index1 = np.unique(tmp, return_index=True, axis=0)

# Discard the element found assuming -1 is never stored in A
INT_MIN = np.iinfo(A.dtype).min
tmp[index1] = INT_MIN

# Find the location of duplicated values
pair2, index2 = np.unique(tmp, return_index=True, axis=0)

# Extract the indices that share the same pair of values found
left = index1[np.isin(pair1, pair2).all(axis=1)]
right = index2[np.isin(pair2, pair1).all(axis=1)]

# Combine the each left index with each right index
result = np.hstack((left[:,None], right[:,None]))

# result = array([[0, 3],
#                 [1, 6]])

該解決方案應及時運行，O(n log n)因為在np.unique內部使用基本排序（更具體地說是快速排序）。

轉載請註明出處，本文鏈接：https://www.uj5u.com/net/447867.html

標籤：Python 列表表现搜索

上一篇：在python中使用多處理工具的方法比沒有它的方法效果更差

下一篇：回圈遍歷列以檢查SQLServer中的強制轉換是否成功