Python: How to efficiently nest 4 lists of dicts into one?

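I have an MSSQL stored procedure that returns 4 selects to me: Entities, Certificates, Contacts and Logs. I need to combine these 4 selects in Python, putting all the Entities, Contacts and Logs under their Certificate. Each of these selects has an EntityId that I can use for the merge.

The input is lists of simple, basic dataclasses containing the information from SQL. We convert these dataclasses to dictionaries inside the merging function.

When I originally wrote the code, I had no idea the selects could be very large (100,000s of Certificates, plus all the other records). Unfortunately, this made the code below very inefficient, because the list comprehensions inside the loop do many unnecessary iterations. It can take up to 70 seconds. I am sure there is a way to make this much faster. How can I improve the performance to make it as efficient as possible?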

from dataclasses import asdict
from typing import List

def cert_and_details(entities: List[Entity], 
                    certificates: List[Certificate], 
                    req_logs: List[DocumentRequestHistory], 
                    recipients: List[Recipient]) -> List[dict]:

    entities = [asdict(ent) for ent in entities] 
    certificates = [asdict(cert) for cert in certificates]
    req_logs = [asdict(log) for log in req_logs]
    recipients = [asdict(rec) for rec in recipients]

    results = []
    for cert_dict in certificates:

        cert_entity_id = cert_dict["entityid"]

        logs_under_cert = [log for log in req_logs if log["entityid"] == cert_entity_id]
        cert_dict["logs"] = logs_under_cert

        entities_under_cert = [ent for ent in entities if ent["entityid"] == cert_entity_id]
        cert_dict["linkedentity"] = entities_under_cert

        recipients_under_cert = [rec for rec in recipients if rec["entityid"] == cert_entity_id]
        cert_dict["recipients"] = recipients_under_cert

        results.append(cert_dict)

    return results

A uj5u.com user replied:

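The main problem with the provided code is its computational complexity: it runs in O(C * (L + E + R)), where C is the number of certificates, L the number of logs, E the number of entities and R the number of recipients. This is fine if L + E + R is small, but if it is not, the code will be slow.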
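To get a feel for the scale, here is a rough back-of-the-envelope count. It compares the nested scans, roughly C * (L + E + R) comparisons, with an approach that visits every record once, roughly C + L + E + R steps. The list sizes are illustrative assumptions (100,000 records each), not figures measured on the asker's data.

```python
# Rough operation counts; the sizes below are illustrative assumptions.
C = L = E = R = 100_000

nested_scans = C * (L + E + R)  # every certificate rescans all three lists
single_pass = C + L + E + R     # each record is visited once

print(f"{nested_scans:,}")          # 30,000,000,000
print(f"{single_pass:,}")           # 400,000
print(nested_scans // single_pass)  # 75000
```

These are counts of loop steps, not timings, but they show why the nested version falls over once every list is large.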

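You can write an implementation that runs in O(C + L + E + R) time. The idea is to first build an index that groups the logs/entities/recipients by entity ID. Here is a short example: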

# Note: defaultdict should help to make this code smaller (and possibly faster)
logIndex = dict()
for log in req_logs:
    entityId = log["entityid"]
    if entityId in logIndex:
        logIndex[entityId].append(log)
    else:
        logIndex[entityId] = [log]

This code runs in (amortized) O(L). You can then retrieve all the items in req_logs with a given entity ID just by using logIndex[entityId].
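As a fuller sketch of the same idea, the index can be built with `defaultdict` for all three lists and then used in a single pass over the certificates. The helper names `group_by_entity` and `merge_under_certificates` and the sample records are made up for illustration, and the inputs are assumed to be dicts already (that is, after `asdict`).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def group_by_entity(rows: Iterable[dict], key: str = "entityid") -> Dict[Hashable, List[dict]]:
    """Build an index mapping each entity ID to the rows that carry it."""
    index: Dict[Hashable, List[dict]] = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def merge_under_certificates(certificates, entities, req_logs, recipients):
    """Attach logs, entities and recipients to each certificate dict."""
    log_index = group_by_entity(req_logs)
    entity_index = group_by_entity(entities)
    recipient_index = group_by_entity(recipients)

    results = []
    for cert in certificates:
        entity_id = cert["entityid"]
        merged = dict(cert)  # shallow copy so the input dict is left untouched
        # .get() avoids creating empty entries in the defaultdict on a miss
        merged["logs"] = log_index.get(entity_id, [])
        merged["linkedentity"] = entity_index.get(entity_id, [])
        merged["recipients"] = recipient_index.get(entity_id, [])
        results.append(merged)
    return results


# Hypothetical sample data
certs = [{"entityid": 1, "name": "cert-a"}, {"entityid": 2, "name": "cert-b"}]
logs = [{"entityid": 1, "msg": "requested"}, {"entityid": 1, "msg": "sent"}]
ents = [{"entityid": 2, "label": "ACME"}]
recs = [{"entityid": 1, "email": "a@example.com"}]

merged = merge_under_certificates(certs, ents, logs, recs)
print(merged[0]["logs"])
print(merged[1]["linkedentity"])
```

Each list is walked exactly once to build its index, and each certificate then costs three dictionary lookups instead of three full scans. Note that certificates sharing an entity ID will share the same list objects in this sketch.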

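There is another problem in the provided code: lists of dictionaries are inefficient. Dictionary indexing is relatively slow, and dictionaries are not memory-efficient either. A better way to store and compute the data may be to use dataframes (for example with Pandas, which also provides a relatively optimized groupby function).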

A uj5u.com user replied:

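The following may be another way to get a complexity on the order of (2*C + L + E + R).

Caveat: I have not tried running this. It is only mock code and does not try to be as efficient as possible. I also only mocked it up while thinking conceptually about how to make the complexity linear, so it may contain some fundamental "Ooops" that I missed.

But it is based on the concept of looping through each of C, L, E and R only once. This is done by first making certificates a dictionary instead of a list. The key is its entityid. The lists used to store each certificate's logs, entities and recipients are also created at that point.

Then you can loop through L, E and R just once each, and add their entries directly to the certificate dictionary by looking up the entityid.

The final step (hence the 2*C in the complexity) is to loop back through the certificate dictionary and turn it into a list to match the desired output type.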

from dataclasses import asdict
from typing import List

def cert_and_details(entities: List[Entity], 
                    certificates: List[Certificate], 
                    req_logs: List[DocumentRequestHistory], 
                    recipients: List[Recipient]) -> List[dict]:

    certs = {}

    for cert in certificates:
        cert_dict = asdict(cert)
        cert_id = cert_dict['entityid']

        certs[cert_id] = cert_dict
        # the child lists belong on this certificate's dict, not on the outer index
        cert_dict['logs'] = []
        cert_dict['recipients'] = []
        cert_dict['linkedentity'] = []

    for log in req_logs:
        log_dict = asdict(log)
        log_id = log_dict['entityid']
        certs[log_id]['logs'].append(log_dict)


    for ent in entities:
        ent_dict = asdict(ent)
        ent_id = ent_dict['entityid']
        certs[ent_id]['linkedentity'].append(ent_dict)

    for rec in recipients:
        rec_dict = asdict(rec)
        rec_id = rec_dict['entityid']
        certs[rec_id]['recipients'].append(rec_dict)

    # turn certs back into list, not dictionary
    certs = [cert for cert in certs.values()]

    return certs
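One thing to watch with this approach: keying the dictionary by `entityid` assumes that each entity has at most one certificate, and a log, entity or recipient whose `entityid` matches no certificate would raise a `KeyError`. Below is a minimal sketch of a guarded variant, shown for logs only. The `Certificate` and `Log` dataclasses and the `attach_logs` function are simplified, hypothetical stand-ins, not the asker's real classes.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class Certificate:  # hypothetical stand-in for the real dataclass
    entityid: int
    name: str


@dataclass
class Log:  # hypothetical stand-in for DocumentRequestHistory
    entityid: int
    message: str


def attach_logs(certificates: List[Certificate], req_logs: List[Log]) -> List[dict]:
    certs: Dict[int, dict] = {}
    for cert in certificates:
        cert_dict = asdict(cert)
        cert_dict["logs"] = []
        # Assumes one certificate per entityid; a later duplicate would overwrite.
        certs[cert_dict["entityid"]] = cert_dict

    for log in req_logs:
        log_dict = asdict(log)
        target = certs.get(log_dict["entityid"])
        if target is not None:  # skip logs that match no certificate
            target["logs"].append(log_dict)

    return list(certs.values())


result = attach_logs(
    [Certificate(1, "cert-a"), Certificate(2, "cert-b")],
    [Log(1, "requested"), Log(3, "orphan"), Log(1, "sent")],
)
print(result)
```

If several certificates can share an entity ID, grouping the child records into a separate index keyed by `entityid` and looking them up per certificate avoids the overwrite.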
