在給定條件的串列中查找所有對的最佳方法是什么？-有解無憂

假設我有一個串列['ab','cd','ac','de']，我想找到兩個元素都沒有任何重復字母的所有可能對。

答案是['ab','cd'],['ab','de'],['ac','de']。

但我不知道解決這個問題的最佳方法/演算法。目前我正在做的是使用嵌套for回圈來檢查每個元素，然后將其洗掉以縮短串列，但是當我的串列有 1000 行長并且元素更復雜時，它開始花費大量時間。

arr4 = ['ab','cd','ac','de']
for i in arr4:
    for x in arr4:
        if not bool(set(i)&set(x)):
            print('[' i ',' x ']')
    print('----')
    arr4.remove(i)

uj5u.com熱心網友回復：

就基本演算法而言，您的解決方案還不錯，但您不應該像您那樣修改您正在迭代的串列。此外，還可以進行一些優化：

from random import sample
from timeit import timeit
from itertools import combinations


size = 3  # length of the strings
chars = [chr(n) for n in range(97, 123)]
strings = {''.join(sample(chars, size)) for _ in range(1000)}


# your solution as a one-liner
def pair_strs1(ps):
    return [(p, q) for p in ps for q in ps if not (set(p) & set(q))]


# this doesn't work, unless there's no initial pairs with double letters like 'aa'
def pair_strs2(ps):
    return [(p, q) for p in ps for q in ps if len(set(p q)) == 4]


# your solution, but compute the sets only once, then look them up before combining
def pair_strs3(ps):
    ps = {p: set(p) for p in ps}
    return [(p, q) for p in ps for q in ps if not (ps[p] & ps[q])]


# same, but avoiding mirrored pairs, only compute above the diagonal of the product
def pair_strs4(ps):
    ps = {p: set(p) for p in ps}
    keys = list(ps.keys())
    return [(p, q) for i, p in enumerate(ps) for q in keys[i 1:] if not (ps[p] & ps[q])]


# the same again, but now avoiding the lookup in the dictionaries
def pair_strs5(ps):
    ps = {p: set(p) for p in ps}
    items = list(ps.items())
    return [(p, q) for i, (p, p_set) in enumerate(ps.items()) for q, q_set in items[i 1:] if not (p_set & q_set)]


# different approach, creating the product (upper half above diagonal) first
def pair_strs6(ps):
    ps = {p: set(p) for p in ps}
    ps = combinations(ps.items(), 2)
    return [(p, q) for (p, p_set), (q, q_set) in ps if not (p_set & q_set)]


# the same as 5, but with integer bitmasks instead of sets, as per @kellybundy's suggestion
def pair_strs7(ps):
    ps = {p: sum(1 << (ord(c) - 97) for c in p) for p in ps}
    items = list(ps.items())
    return [(p, q) for i, (p, p_mask) in enumerate(ps.items()) for q, q_mask in items[i 1:] if not (p_mask & q_mask)]


# run some tests
n = 10
print(timeit(lambda: pair_strs1(strings), number=n))
print(timeit(lambda: pair_strs3(strings), number=n))
print(timeit(lambda: pair_strs4(strings), number=n))
print(timeit(lambda: pair_strs5(strings), number=n))
print(timeit(lambda: pair_strs6(strings), number=n))
print(timeit(lambda: pair_strs7(strings), number=n))


p1 = pair_strs1(strings)
p3 = pair_strs3(strings)
p4 = pair_strs4(strings)
p4  = [tuple(reversed(p)) for p in p4]
p5 = pair_strs5(strings)
p5  = [tuple(reversed(p)) for p in p5]
p6 = pair_strs6(strings)
p6  = [tuple(reversed(p)) for p in p6]
p7 = pair_strs7(strings)
p7  = [tuple(reversed(p)) for p in p7]

print(set(p1) == set(p3) == set(p4) == set(p5) == set(p6) == set(p7) and
      len(p1) == len(p3) == len(p4) == len(p5) == len(p6) == len(p7))

結果對我來說：

3.4115294000002905
1.7107413000012457
0.8492871999987983
0.7485178000024462
0.7776397999987239
0.37149049999788986
True

所以，最后的改變并沒有真正的幫助——事先計算產品并不能節省足夠的錢。pair_pairs5似乎更可取。

編輯：將字串的生成更改為包含大小（并將默認值設定為 3 而不是 2），相應地調整命名并避免字串中的重復字符（根據 OP 的描述）。

這也使用戶@KellyBundy 的解決方案最快，因為沒有重復項允許非常干凈的基于整數的解決方案#7。在這些條件下，這是明顯的贏家（大約 9 倍加速）。

uj5u.com熱心網友回復：

我認為您可以跟蹤每個陣列元素的字母集。然后，存盤反向對（例如，（x，i））不再訪問該條目，如下所示：

if __name__ == '__main__':
    arr4 = ['ab','cd','ac','de']
    string_dict = {}
    for a_str in arr4:
        if a_str not in string_dict:
            string_dict[a_str] = set()
        for a_letter in a_str:
            string_dict[a_str].add(a_letter)
    print(string_dict)
    visited = set()
    for i in arr4:
        for x in arr4:
            if not (string_dict[i] & string_dict[x]):
                cur_pair = (i, x)
                if cur_pair in visited:
                    continue
                print(cur_pair)
                reverse_pair = (x, i)
                visited.add(cur_pair)
                visited.add(reverse_pair)

結果：

{'ab': {'a', 'b'}, 'cd': {'c', 'd'}, 'ac': {'a', 'c'}, 'de': {'d', 'e'}}
('ab', 'cd')
('ab', 'de')
('ac', 'de')

轉載請註明出處，本文鏈接：https://www.uj5u.com/houduan/452500.html

標籤：Python 算法

上一篇：如何在影片曲線上使用二進制搜索來查找時間給定值？

下一篇：使用Streams和BigInteger計算斐波那契數