如何列舉過濾重復的組合 -有解無憂

我有一個可能的選擇串列。[[1], [2, 4], [4], [5, 6, 2], [5, 3]]/code>.

我想列出所有的組合，從每個子串列中最多抽取一個元素，不重復元素。
所以[1, 2, 4, 5, 3]是一個有效的選項。但是[1, 4, 4, 5, 3]則不是。我允許不在任何子串列中進行選擇，所以[1,4, None,5,3]是有效的，就像[1, None, None, None]，以及[None, None, None, None]。
我不能簡單地列舉所有的組合，然后過濾掉我不想要的組合，因為對于一個大的可能的選擇串列，這將很快變得計算上不可行（在我的專案中，我正在尋找25^25的最大組合）。
編輯：我還會對結果應用一些額外的標準，例如過濾不超過None選項的閾值，或者按照None選項最少的組合順序對結果的組合串列進行排序。
編輯：附上真實案例的細節。我想把它應用于一個由25個子串列組成的串列，每個串列可以有1-25個元素。現實中，每個子串列最多有 15 個元素，平均有 2-4 個。
因此，list(itertools.product(*choices))然后進行過濾的簡單解決方案被淘汰。
我可能還希望在組合串列中添加其他的過濾條件，所以最好能在前面過濾這些條件。
我試著以遞回方式構建一棵樹，例如根節點擁有完整的選擇串列，子節點做出第一個選擇[1]，并擁有一個更新的選擇串列，其中'1'被從所有串列[1:]中洗掉。
雖然在實作遞回方面遇到了困難。
你能幫助我找到其他方法嗎？
uj5u.com熱心網友回復：
另一種以最小的記憶體使用量生成所有有效輸出的方法是迭代元素而不是迭代串列。使用深度優先搜索，這樣你只能從一開始就生成有效輸出。這意味著我們需要在DFS的每一層中跟蹤三件事：當前可能要添加的元素、已經使用的串列以及我們使用之前那些串列的順序。
為了幫助我們進行搜索，我們通過將每個元素映射到它可能出現的串列集來對選擇進行預處理，這創造了一種雙版本的問題。這也會按照 "最大非空選擇優先 "的順序生成串列。
由于你已經指定唯一元素的數量 N 等于串列的數量，這種方法的運行時間為 O(N * |output|)，并且我們到處使用生成器以節省記憶體。
import collections
from typing import Dict, Generator, List, Optional, Set

choices = [[1], [2, 4], [4] 。[5, 6, 2], [5, 3]
element_to_containing_lists: Dict[int, Set[int]] = collections.defaultdict（set）。

for i, choice_list in enumerate(options):
    for x  in choice_list:
        element_to_containing_lists[x].add(i)

all_unique_elements = sorted(Element_to_containing_lists)


def dfs（used_list_indices: Set[int] 。
        next_element_index: int。
        used_list_ordered_indices: List[Optional[int]]/span>) -> Generator[List[Optional[int]] ]。
    if next_element_index == len（all_unique_elements）。
        yield used_list_ordered_indices
    else:
        # 如果我們使用該元素，找到一個未使用的串列索引。
        for possible_list_to_use in element_to_containing_lists[
                                        all_unique_elements[next_element_index]] - used_list_indices:
            yield from dfs（used_list_indices | {possible_list_to_use},
                           next_element_index   1,
                           used_list_ordered_indices   [possible_list_to_use])

        # 如果我們不使用該元素。添加None作為哨兵值。
        yield from dfs（used_list_indices,
                       next_element_index   1,
                       used_list_ordered_indices   [None])


for element_to_used_list in dfs(set(), 0, [] ）。)
    list_to_chosen_element = ['N'] * len(choice)
    for x, y in zip（all_unique_elements, element_to_used_list）。
        if y is not None。
            list_to_chosen_element[y] = x
    print(*list_to_chosen_element, sep=' ')

輸出的前10行:
1  2  4  5  3
1 2 4 6 3
1 2 4 N 3
1 2 N 5 3 N
1 2 N 6 3
1 2 N N 3
1 2 4 5 N
1 2 4 6 5 N
1 2 4 N 5

通過對 "已用串列 "使用一個位元掩碼而不是一組索引，這可能被優化為在O（|output|）時間內運行。
uj5u.com熱心網友回復：
不要把結果變成一個串列。product是一個生成器。利用這一點對你有利。filter也是一個生成器。通過這種方式，你只能在記憶體中保留一個組合。有時，product的單個輸出會在你看不到的情況下被內部丟棄，但它不會占用任何額外的空間。
def criterion（x）。
    k = [i for i in x if i  is not None]
    return len(k) == len(set(k) )

choices = [[1], [2, 4], [4] 。[5, 6, 2], [5, 3] ]
for c in choices:
    c.append(None)
filter(criterion, product(*choices) )

uj5u.com熱心網友回復：
與之前的答案非常相似。 沒有那么快，但我認為它更容易理解。 一個簡單的遞回實作。
def pick_all（choices）。
    # 對一個空的選擇串列的結果只是一個空的串列。
    if not choices:
        yield [] 。
        return return
    # Pluck off the first choice[/span]。
    first_choice, *remainder = choices

    # 遞回地找到所有小問題的解決方案。
    for result in pick_all( remainder):
        # We can always add None to the front[/span].
        yield [None, *result]
        for item in first_choice:
            # 而我們可以在這個選擇中添加任何尚未存在的專案。
            if item not in result:
                yield [item, *result] 。

uj5u.com熱心網友回復：
[set(dataset) for datasetin itertools.product(*datasets)]

創建一個由sets組成的組合串列。集合就像串列，但會自動洗掉現有的值。這段代碼沒有確保集合的長度是5，注意集合也是無序的。

 

print([dataset for dataset in itertools. product(*datasets) if len(set(dataset)) == len(dataset)] )

組合串列，并通過確保一組資料與原始串列的長度相同來過濾掉任何有重復值的串列。
datasets = [[1], [2, 4], [4], [5, 6, 2], [5, 3] ]
datasets = [[*dataset, None] for dataset  in datasets]

new_dataset = []
for dataset in itertools.product(*datasets) 。
    without_nones = [n for n in 資料集 if n is not None ]
    if len(without_nones) == len(set(without_nones) )。
        new_dataset.append(dataset)

print(new_dataset)

創建一個組合串列，其長度始終為5，沒有重復的數字，并且可以有無限的Nones。
這可能應該是一個生成器，而且我懷疑有一個更簡單的方法來處理你需要這種格式的串列。






        
      轉載請註明出處，本文鏈接：https://www.uj5u.com/qukuanlian/319290.html
      標籤：
      上一篇：使用GUID的RSA加密和解密
下一篇：在Sql中使用遞回來查找事件的位置