I have two lists whose elements have the format [name, group_id]:
lst1 = [
    ['apple', 1],
    ['banana', 1],
    ['orange', 1],
    ['123', 2],
    ['456', 2],
    ['abc', 3],
    ['ABC', 3],
    ['tony', 4],
    ['john', 4],
    ['jack', 4],
]
lst2 = [
    ['!@#', 1],
    ['apple', 2],
    ['banana', 2],
    ['strawberry', 2],
    ['lemon', 2],
    ['john', 3],
    ['tony', 3],
    ['adella', 3],
]
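I want to merge the 2 lists based on the intersection of names: a group from one list should be merged with the group from the other list that shares the most names with it (the final group_id does not matter). The result should look like this: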
lst = [
    ['apple', 1],
    ['banana', 1],
    ['orange', 1],
    ['strawberry', 1],
    ['lemon', 1],
    ['!@#', 2],
    ['john', 3],
    ['tony', 3],
    ['adella', 3],
    ['jack', 3],
    ['123', 4],
    ['456', 4],
    ['abc', 5],
    ['ABC', 5],
]
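How can I do this efficiently?
Answer from a uj5u.com user:
Here is a working solution. It is not optimal (O(n**2)), because it compares every group of the first list against every group of the second list. I hope someone comes up with a better algorithm, but in the meantime: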
from itertools import groupby

# group names that share a group_id and turn each group into a set
# (groupby only merges consecutive items, so each list must be ordered by group_id)
def to_set(l):
    return [set(e[0] for e in g)
            for k, g in groupby(l, key=lambda x: x[1])]

# merge s1 with the first element of set_list that overlaps it
def match_set(s1, set_list):
    for s2 in set_list:
        if len(s1.intersection(s2)) > 0:
            return s1.union(s2)
    return s1

sets1 = to_set(lst1)
sets2 = to_set(lst2)

# perform merge both ways (to have "outer join")
out = {tuple(sorted(match_set(s1, sets2))) for s1 in sets1}
out = out.union({tuple(sorted(match_set(s2, out))) for s2 in sets2})

# annotate with new group
out = [[v, i] for i, t in enumerate(out) for v in t]
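Output (the group numbering can differ between runs, because out is built from a set and sets have no fixed iteration order):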
[['apple', 0],
['banana', 0],
['lemon', 0],
['orange', 0],
['strawberry', 0],
['!@#', 1],
['123', 2],
['456', 2],
['ABC', 3],
['abc', 3],
['adella', 4],
['jack', 4],
['john', 4],
['tony', 4]]
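Note that match_set above merges with the first overlapping group, not necessarily the one with the largest overlap. As a possible improvement along the lines asked for, here is a minimal sketch that looks names up in a dictionary instead of comparing every pair of groups. The function names group_names and merge_by_best_overlap are made up for this illustration, and the sketch assumes that each name appears at most once per list and that ties between equally good groups may be broken arbitrarily. Under those assumptions it should run in roughly linear time in the total number of names, plus the sorting used to make the output stable.

```python
from collections import Counter, defaultdict

def group_names(pairs):
    """Collect the names of each group_id into a set, in first-seen group order."""
    groups = defaultdict(set)
    for name, gid in pairs:
        groups[gid].add(name)
    return list(groups.values())

def merge_by_best_overlap(lst1, lst2):
    groups1 = group_names(lst1)
    groups2 = group_names(lst2)
    # name -> index of the lst1 group that contains it
    owner = {name: i for i, g in enumerate(groups1) for name in g}
    merged = [set(g) for g in groups1]
    unmatched = []
    for g2 in groups2:
        # count how many names of g2 fall into each lst1 group
        counts = Counter(owner[n] for n in g2 if n in owner)
        if counts:
            best = max(counts, key=counts.get)
            # only add names that do not already belong to some lst1 group,
            # so every name ends up in exactly one output group
            merged[best] |= {n for n in g2 if n not in owner}
        else:
            unmatched.append(set(g2))
    groups = merged + unmatched
    return [[name, gid]
            for gid, g in enumerate(groups, start=1)
            for name in sorted(g)]

lst1 = [['apple', 1], ['banana', 1], ['orange', 1], ['123', 2], ['456', 2],
        ['abc', 3], ['ABC', 3], ['tony', 4], ['john', 4], ['jack', 4]]
lst2 = [['!@#', 1], ['apple', 2], ['banana', 2], ['strawberry', 2],
        ['lemon', 2], ['john', 3], ['tony', 3], ['adella', 3]]

result = merge_by_best_overlap(lst1, lst2)
print(result)
```

On the sample data this yields the same five groups as the answer above, numbered from 1, with the lst1 groups first and the unmatched lst2 group ('!@#') last. Unlike the groupby version, it does not require the input lists to be ordered by group_id.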
