Parallelizing code across CPU cores that iterates over a nested dictionary with 700K total entries

I have the following code:

for key in test_large_images.keys():
    test_large_images[key]['avg_prob'] = 0
    sum = 0
    for value in test_large_images[key]['pred_probability']:
        print(test_large_images[key]['pred'])
        print(type(test_large_images[key]['pred'] ))
        if test_large_images[key]['pred'] == 1:
            sum += value
    test_large_images[key]['avg_prob'] = sum/len(test_large_images[key]['pred_probability'])

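It is a dictionary of 359 large images, each of which can contain 200 to 8,000 smaller images that I call patches. test_large_images is a dictionary of inference results for the smaller images; each patch also has a predicted probability, the large image name, the patch name, and so on. My goal is to find the average probability for each large image from the predicted probabilities of the patches inside it. When I ran this loop on a smaller dataset (45K patches), whose inference results I had saved in a pkl file, it ran very fast. However, this script has now been running for more than 130 minutes, as you can see in the remote Jupyter Notebook in VSCode Remote (using a local client on a Mac).

Is there a way to use the 24 CPU cores to speed up this nested dictionary computation?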

Answer from a uj5u.com user:

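Don't use sum as a variable name, because it is a built-in function.
test_large_images[key]['avg_prob'] = 0 is not needed; the value is overwritten at the end of each iteration.
PeterK is right: your condition does not need to be evaluated on every pass of the inner for loop.
Why are these values printed over and over? Is that just for testing?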

for key in test_large_images.keys():
    add = 0
    condition = test_large_images[key]['pred'] == 1 # This is what PeterK means by take it out (of the loop).
    for value in test_large_images[key]['pred_probability']:
        # print(test_large_images[key]['pred'])
        # print(type(test_large_images[key]['pred']))
        if condition:
            add += value
    test_large_images[key]['avg_prob'] = add/len(test_large_images[key]['pred_probability'])
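To illustrate the first point, here is a minimal sketch of what goes wrong once sum has been rebound to a number. The function names shadowed_total and safe_total are hypothetical and exist only for this illustration.

```python
def shadowed_total(values):
    sum = 0.0  # shadows the built-in sum() inside this function
    for v in values:
        sum += v
    # The name now refers to a float, so calling it fails.
    return sum(values)  # raises TypeError


def safe_total(values):
    total = 0.0  # a different name leaves the built-in untouched
    for v in values:
        total += v
    return total


probs = [0.25, 0.5, 0.75]
try:
    shadowed_total(probs)
except TypeError as exc:
    message = str(exc)

print(message)            # explains that a float is not callable
print(safe_total(probs))  # 1.5
```

Renaming the accumulator, as the loop above does with add, keeps sum() available for the shorter versions that follow.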

Your code can be simplified to:

for key in test_large_images.keys():
    condition = test_large_images[key]['pred'] == 1
    num = sum(x for x in test_large_images[key]['pred_probability'] if condition)
    denom = len(test_large_images[key]['pred_probability'])
    test_large_images[key]['avg_prob'] = num/denom

Based on feedback, with some additional optimization:

for key in test_large_images.keys():
    if test_large_images[key]['pred'] != 1:
        test_large_images[key]['avg_prob'] = 0
        continue
    values = test_large_images[key]['pred_probability']
    test_large_images[key]['avg_prob'] = sum(values)/len(values)
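The question asked about using all 24 cores. A minimal sketch of that, using the standard library's concurrent.futures, might look like the following. The helper names avg_prob_for_image and add_avg_probs_parallel are hypothetical, and the sketch assumes each record has a scalar 'pred' and a non-empty 'pred_probability' list, as in the loop above. Summing about 700K floats is normally fast in a single process, so the repeated print calls in the original loop are the more likely cause of the long runtime, and the cost of pickling the records for the workers may outweigh any gain here.

```python
from concurrent.futures import ProcessPoolExecutor


def avg_prob_for_image(item):
    """Return (key, avg_prob) for one large image.

    Defined at module top level so worker processes can import it.
    """
    key, record = item
    if record['pred'] != 1:
        return key, 0
    values = record['pred_probability']
    return key, sum(values) / len(values)


def add_avg_probs_parallel(images, workers=24, chunksize=16):
    """Compute avg_prob in worker processes and store it in the parent.

    Workers receive pickled copies of the records, so the results have
    to be written back into `images` by the calling process.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(avg_prob_for_image, images.items(),
                           chunksize=chunksize)
        for key, avg in results:
            images[key]['avg_prob'] = avg
    return images
```

In a script, the call add_avg_probs_parallel(test_large_images) would go under an if __name__ == "__main__": guard. In a notebook, the worker function may need to live in an importable module when the process start method is spawn.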

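Here are two different kinds of average (I am most interested in the average of the probabilities taken over only the entries predicted as 1). I call that one avg_prob_pos: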

for key in progress_bar(test_large_images.keys()):
    condition = test_large_images[key]['pred'] == 1
    num = sum(x for x in test_large_images[key]['pred_probability'] if condition)
    denom = len(test_large_images[key]['pred_probability'])
    count = sum(x for x in test_large_images[key]['pred'] if condition)
    if count != 0:
        test_large_images[key]['avg_prob_pos'] = num/count
    test_large_images[key]['avg_prob'] = num/denom
    
    percentage = test_large_images[key]['pred'].count(1)/len(test_large_images[key]['pred'])
    test_large_images[key]['percentage'] = percentage
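In the loop above, pred is compared to 1 as if it were a scalar and is also used as a list (.count(1)). If it is in fact a per-patch list aligned with pred_probability (an assumption, since the data layout is not shown), then pred == 1 is always False for a list and num stays 0. A minimal sketch under that assumption, with a hypothetical helper named summarize_image, pairs each probability with its own prediction:

```python
def summarize_image(record):
    """Add avg_prob, avg_prob_pos (when defined) and percentage to a record.

    Assumes record['pred'] and record['pred_probability'] are
    equal-length lists with one entry per patch.
    """
    preds = record['pred']
    probs = record['pred_probability']
    # Sum the probabilities of the patches predicted as class 1 only.
    num = sum(p for p, y in zip(probs, preds) if y == 1)
    count = preds.count(1)
    if count != 0:
        record['avg_prob_pos'] = num / count
    record['avg_prob'] = num / len(probs)
    record['percentage'] = count / len(preds)
    return record


example = {'pred': [1, 0, 1, 0],
           'pred_probability': [0.5, 0.25, 1.0, 0.75]}
summarize_image(example)
print(example['avg_prob_pos'], example['avg_prob'], example['percentage'])
# prints: 0.75 0.375 0.5
```

Applied to the whole dictionary, this would be called once per key, for example summarize_image(test_large_images[key]).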
