Inverting an order-preserving minimal perfect hash function in better than O(K*lg N) running time

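I am trying to find a more efficient solution to a combinatorics problem than the one I have already found.

Suppose I have a set of N objects (indexed 0..N-1) and want to consider every subset of size K (0<=K<=N). There are S=C(N,K) (i.e. "N choose K") such subsets. I want to map (or "encode") each such subset to a unique integer in the range 0..S-1.

Taking N=7 (i.e. indexes 0..6) and K=4 (S=35) as an example, the following mapping is the goal: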
0 1 2 3 --> 0
0 1 2 4 --> 1
...
2 4 5 6 --> 33
3 4 5 6 --> 34

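N and K were chosen small for the purpose of illustration. In my actual application, however, C(N,K) is far too large to obtain these mappings from a lookup table. They must be computed on the fly.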
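For small N and K, the target mapping can be generated by brute force, which is handy as a reference when testing a faster encoder. The following is only a sketch: the function name `brute_force_mapping` is made up for illustration, and it relies on `itertools.combinations` yielding tuples in lexicographic order when its input is sorted.

```python
from itertools import combinations

def brute_force_mapping(n, k):
    """Map every sorted k-subset of range(n) to its lexicographic rank."""
    # combinations() emits tuples in lexicographic order for sorted input,
    # so the enumeration index is exactly the desired encoding.
    return {subset: rank for rank, subset in enumerate(combinations(range(n), k))}

mapping = brute_force_mapping(7, 4)
print(len(mapping))             # 35
print(mapping[(0, 1, 2, 4)])    # 1
print(mapping[(2, 4, 5, 6)])    # 33
```

This is exactly the lookup table that becomes infeasible for large C(N,K), so it is only useful for validating small cases.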

In the code below, combinations_table is a precomputed two-dimensional array used for fast lookup of C(N,K) values.
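The question does not show how combinations_table is built. One plausible way, sketched here in Python under the assumption that the table is indexed as `table[n][k]` and holds 0 where k > n, is Pascal's rule. The helper name `build_combinations_table` is hypothetical.

```python
def build_combinations_table(max_n):
    """Return a table with table[n][k] == C(n, k) for 0 <= n, k <= max_n."""
    table = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    for n in range(max_n + 1):
        table[n][0] = 1  # C(n, 0) = 1
        for k in range(1, n + 1):
            # Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k)
            table[n][k] = table[n - 1][k - 1] + table[n - 1][k]
    return table

combinations_table = build_combinations_table(7)
print(combinations_table[7][4])  # 35
```

Entries with k > n are left at 0, which matches the convention that there are no such subsets.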

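All code shown conforms to the C++14 standard.

If the objects in a subset are sorted in increasing order of their indexes, the following code computes the encoding of that subset: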

template<typename T, typename T::value_type N1, typename T::value_type K1>
typename T::value_type combination_encoder_t<T, N1, K1>::encode(const T &indexes)
{
   auto offset{combinations_table[N1][K1] - combinations_table[N1 - indexes[0]][K1]};

   for (typename T::value_type index{1}; index < K1; ++index)
   {
      auto offset_due_to_current_index{
           combinations_table[N1 - (indexes[index-1] + 1)][K1 - index] -
           combinations_table[N1 - indexes[index]][K1 - index]
                                      };

      offset += offset_due_to_current_index;
   }

   return offset;
}

Here, the template parameter T will be a std::array<> or std::vector<> holding the collection of indexes for which we want to find the encoding.
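To cross-check the encoder against a brute-force enumeration, the same arithmetic can be ported to Python. This is a sketch, not the original implementation: it uses `math.comb` in place of combinations_table, takes N as a runtime argument rather than a template parameter, and adds an early return for the empty subset.

```python
from itertools import combinations
from math import comb

def encode(indexes, n):
    """Rank a sorted k-subset of range(n) in lexicographic order."""
    k = len(indexes)
    if k == 0:
        return 0  # the empty subset is the only subset of size 0
    # Subsets whose first element is smaller than indexes[0].
    offset = comb(n, k) - comb(n - indexes[0], k)
    for i in range(1, k):
        # Subsets that share the first i elements but whose next element
        # lies strictly between indexes[i-1] and indexes[i].
        offset += comb(n - (indexes[i - 1] + 1), k - i) - comb(n - indexes[i], k - i)
    return offset

for rank, subset in enumerate(combinations(range(7), 4)):
    assert encode(subset, 7) == rank
print(encode((2, 4, 5, 6), 7))  # 33
```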

This is essentially an "order-preserving minimal perfect hash function".

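Suppose you have a combination space C(n,k). You can split that space into two subspaces:

C(n-1,k-1): all combinations in which the first element of the original set (of length n) is present
C(n-1, k): all combinations in which the first element is not present

If you have an index X corresponding to a combination from C(n,k), you can determine whether the first element of the original set belongs to the subset (corresponding to X) by checking which of the two subspaces X falls into: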

X < C(n-1, k-1) : it belongs
X >= C(n-1, k-1): it does not belong
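As a quick sanity check of this split for n=7, k=4, the brute-force enumeration can be compared against the boundary C(6,3)=20. This is only an illustrative sketch and assumes the lexicographic ordering that `itertools.combinations` produces.

```python
from itertools import combinations
from math import comb

n, k = 7, 4
boundary = comb(n - 1, k - 1)  # number of subsets containing element 0
subsets = list(combinations(range(n), k))

# Every index below the boundary holds a subset containing element 0,
# and every index at or above it holds a subset without element 0.
contains_first = all(0 in s for s in subsets[:boundary])
lacks_first = all(0 not in s for s in subsets[boundary:])

print(boundary, comb(n - 1, k), len(subsets))  # 20 15 35
print(contains_first, lacks_first)             # True True
```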

You can then apply the same approach recursively to C(n-1, ...) and so on, until you have found the answer for all n elements of the original set.

Python code illustrating this approach:

import itertools, math

n=7
k=4
stuff = list(range(n))

# function that maps x into the corresponding combination
def rec(x, n, k, index):
  if n==0 and k == 0:
    return index

  # C(n,k) = C(n-1,k-1) + C(n-1, k)
  # C(n,0) = C(n,n) = 1
  c = math.comb(n-1, k-1) if k > 0 else 0
  if x < c:
    index.add(stuff[len(stuff)-n])
    return rec(x, n-1, k-1, index)
  else:
    return rec(x - c, n-1, k, index)

# Test:
for i,eta in enumerate(itertools.combinations(stuff, k)):
  comb = rec(i, n, k, set())
  print(f'{i} {eta} {comb}')

Output produced:

0 (0, 1, 2, 3) {0, 1, 2, 3}
1 (0, 1, 2, 4) {0, 1, 2, 4}
2 (0, 1, 2, 5) {0, 1, 2, 5}
3 (0, 1, 2, 6) {0, 1, 2, 6}
4 (0, 1, 3, 4) {0, 1, 3, 4}
5 (0, 1, 3, 5) {0, 1, 3, 5}
...
33 (2, 4, 5, 6) {2, 4, 5, 6}
34 (3, 4, 5, 6) {3, 4, 5, 6}

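This approach is O(n) (whereas yours appears to be O(k * log(n)) (?)), and if rewritten iteratively it should have a fairly small constant. I am not sure whether it will actually be an improvement (it needs testing).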

I also wonder how large your typical k and n values are. I assume they are small enough that C(n,k) still fits into 64 bits?

Of course, you can use precomputed tables instead of math.comb, replace the recursion with iteration (it is tail recursion, so you don't need a stack), and use an array instead of a set for the result.
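A rough sketch of what that iterative variant might look like is below. It has not been benchmarked; the names `build_table` and `decode` are invented for illustration, and the table layout (`table[n][k]`, zero where k > n) is an assumption.

```python
from itertools import combinations

def build_table(max_n):
    """table[n][k] == C(n, k), built with Pascal's rule (0 where k > n)."""
    table = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    for n in range(max_n + 1):
        table[n][0] = 1
        for k in range(1, n + 1):
            table[n][k] = table[n - 1][k - 1] + table[n - 1][k]
    return table

def decode(x, n, k, table):
    """Return the sorted k-subset of range(n) with lexicographic rank x."""
    result = []
    element = 0  # candidate element currently being decided
    while k > 0:
        # Number of remaining subsets that contain `element`.
        c = table[n - 1][k - 1]
        if x < c:
            result.append(element)  # element is in the subset
            k -= 1
        else:
            x -= c                  # skip all subsets containing element
        n -= 1
        element += 1
    return result

table = build_table(7)
for rank, subset in enumerate(combinations(range(7), 4)):
    assert decode(rank, 7, 4, table) == list(subset)
print(decode(33, 7, 4, table))  # [2, 4, 5, 6]
```

The loop stops as soon as k elements have been chosen, so it runs at most n iterations and uses only table lookups, comparisons and subtractions.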


Tags: c++ algorithm performance c++14 combinations
