Getting a unique list of dictionaries while accumulating a count attribute

duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]

How can I get a unique list of dictionaries, removing the duplicates while accumulating their "count" values?

[
    {'id': 1, 'name': 'john', 'count': 3},   # main use case: the counts of the duplicates are summed
    {'id': 2, 'name': 'peter', 'count': 1},
]

I tried the following to get the unique values, but I'm not sure how to accumulate the counts:

final = list({v['id']:v for v in duplicate_array}.values())
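To make the problem with that attempt concrete, here is a small sketch using the sample data from the question. In a dict comprehension, a later entry with the same key overwrites the earlier one, so the counts are never added together:

```python
duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]

# Each id keeps only the last dictionary seen for it; nothing is summed.
final = list({v['id']: v for v in duplicate_array}.values())
print(final)
# john ends up with count 2 (the last duplicate), not the desired total of 3
```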

A uj5u.com user replied:

Here is some code that doesn't use any Python libraries. It does make the code longer, though.

duplicate_array= [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]
final=[]

for i, x in enumerate(duplicate_array):
    count = 0
    
    for d in duplicate_array.copy():
        if d != 0 and d["id"] == x["id"] and d["name"] == x["name"]:
            count += d["count"]
            duplicate_array.remove(d)
            
    duplicate_array.insert(i, 0)
    x["count"] = count
    final.append(x)

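In the first part of the code, we define the original list and initialize our output list.

Then comes the for loop.

First we initialize count to 0. Then we iterate over the list again to find every dictionary with the same id and name as the current one. When one matches, we add its count value to count and remove it from the list. We also check that the entry is not zero, because we will be inserting zeros into the list later; this check prevents the program from crashing.

We insert a zero at the current position in the list to keep Python from skipping the next item. A for loop keeps an internal counter of which item it is at. When we remove the current item (which we do in the nested for loop), that counter no longer points at the right item, because every following item has shifted one place to the left. Inserting a zero into the original list shifts them back and makes the index correct again.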
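The skipping behaviour described above can be seen in isolation with a short sketch. The list of letters is made up purely for illustration:

```python
items = ["a", "b", "c", "d"]
visited = []

for item in items:
    visited.append(item)
    # Removing the current item shifts everything after it one place left,
    # so the loop's internal index now points past the next item.
    items.remove(item)

print(visited)  # only every other item was visited
print(items)    # the skipped items are still in the list
```

Without a placeholder such as the inserted zero, half of the items are never visited.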

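Finally, we set the count of the original dictionary to the value we just computed and append that unique dictionary to the final list.

After this code runs, duplicate_array will be filled with zeros. If you don't want that, copy the list first with duplicate_array.copy(). Note that a shallow copy still shares the dictionaries themselves, so their count values are updated in place.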
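One way to leave the caller's data untouched is to wrap the loop in a function that works on a deep copy. This is only a sketch: the function name accumulate_counts is hypothetical, and copy.deepcopy is used here instead of list.copy() so that the dictionaries themselves are copied as well.

```python
import copy

def accumulate_counts(records):
    """Same nested-loop approach, run on a deep copy of the input."""
    work = copy.deepcopy(records)  # the original list and dicts stay untouched
    final = []
    for i, x in enumerate(work):
        count = 0
        for d in work.copy():
            # Skip the zero placeholders inserted on earlier iterations.
            if d != 0 and d["id"] == x["id"] and d["name"] == x["name"]:
                count += d["count"]
                work.remove(d)
        work.insert(i, 0)  # keep the outer loop's index aligned
        x["count"] = count
        final.append(x)
    return final

duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]
result = accumulate_counts(duplicate_array)
print(result)
```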

A uj5u.com user replied:

If you are able to use Pandas, this might work:

import pandas as pd

results = pd.DataFrame(duplicate_array).groupby(["id", "name"]).agg("sum").reset_index().to_dict(orient="records")

The downside is that you are pulling in a fairly large library, but I think it reads well this way.
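If the size of the dependency is a concern, the same group-and-sum idea can be sketched with the standard library's collections.Counter. This is an alternative technique, not what the answer above uses, and it assumes that id and name together identify a record:

```python
from collections import Counter

duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]

# Sum the counts per (id, name) pair.
totals = Counter()
for item in duplicate_array:
    totals[(item['id'], item['name'])] += item['count']

# Turn the grouped totals back into a list of dictionaries.
results = [
    {'id': id_, 'name': name, 'count': count}
    for (id_, name), count in totals.items()
]
print(results)
```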

A uj5u.com user replied:

It's best to encapsulate this in its own function:

def dedupe(dt: list) -> list:
    dx = dict()
    for item in dt:
        key_id = (item.get('id'), item.get('name'))  # We assume that an id/name pair is a unique identity
        current = dx.get(key_id, {
            'id': item.get('id'),
            'name': item.get('name'),
        })  # get() lets us provide a default value if the key doesn't exist yet
        current['count'] = current.get('count', 0) + item.get('count', 0)  # add the new item's count to the running total
        dx[key_id] = current  # Update the result dictionary

    return list(dx.values())  # Convert back to a list

duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]
result = dedupe(duplicate_array)

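This takes advantage of a few common Python features:

Tuples are hashable, so they can be used as keys in a dictionary.
We can use get to supply a default value, which here acts as an "initialization" value. The first time we see a unique key (one that is not yet in our dictionary), we supply this value. Then we add the count from the duplicate array.
Because we accumulate the results in a dictionary, its unique keys deduplicate the array for us. All that remains at the end is to take the dictionary's values as our new array.

Note that key_id could simply be the dictionary's id rather than the combination of id and name. This should run in O(2n), which is effectively O(n): you make one pass over the initial list and then one pass over the results (which will be fewer if there are duplicates). If you are happy to work with a dictionary rather than a list, you can skip the second pass.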
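A minimal sketch of that last variation, assuming id alone identifies a record and that a plain mapping of id to total count is an acceptable result. The function name totals_by_id is hypothetical:

```python
def totals_by_id(records):
    """Single pass: map each id to the sum of its counts."""
    totals = {}
    for item in records:
        totals[item['id']] = totals.get(item['id'], 0) + item.get('count', 0)
    return totals

duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]
totals = totals_by_id(duplicate_array)
print(totals)
```

The names are dropped here; keeping them would need either the tuple key or a nested dictionary as the value.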

Another approach takes two steps: first we collect all the unique ids, then we accumulate all the counts for those ids:

st = {(item.get('id'), item.get('name')) for item in duplicate_array}
ls = [{'id': id, 'name': name, 'count': sum(item.get('count') for item in duplicate_array if item.get('id') == id)} for id, name in st]

This produces your result:

>>> st = {(item.get('id'), item.get('name')) for item in duplicate_array}
>>> ls = [{'id': id, 'name': name, 'count': sum(item.get('count') for item in duplicate_array if item.get('id') == id)} for id, name in st]
>>> ls
[{'id': 2, 'name': 'peter', 'count': 1}, {'id': 1, 'name': 'john', 'count': 3}]

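This is more compact, but a bit harder to unpack. The first pass (st = ...) creates a set of tuples, similar to the first option. The second pass creates a list of dictionaries, where building each dictionary iterates over the original array looking for the values that should be accumulated into its count.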
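Since a set has no guaranteed order, the order of the results from this approach can vary. The sketch below is a variation under two assumptions that are not in the answer above: the keys are sorted to get a stable order, and the sum matches on both id and name so that it agrees with the tuple key.

```python
duplicate_array = [
    {'id': 1, 'name': 'john', 'count': 1},
    {'id': 1, 'name': 'john', 'count': 2},
    {'id': 2, 'name': 'peter', 'count': 1},
]

# First pass: the unique (id, name) pairs, sorted for a stable order.
keys = sorted({(item['id'], item['name']) for item in duplicate_array})

# Second pass: for each pair, scan the whole list and sum the matching counts.
ls = [
    {
        'id': id_,
        'name': name,
        'count': sum(
            item['count']
            for item in duplicate_array
            if item['id'] == id_ and item['name'] == name
        ),
    }
    for id_, name in keys
]
print(ls)
```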

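I do think this will be slower on very large collections, because building each new dictionary in ls = ... makes a pass over the entire array. In the worst case you have no duplicates, which means O(n^2). But if compactness is what you are after, go for it.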
