How can I use threads on a dictionary to improve time complexity?

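I am new to threads. I know that we can run a thread on a function, but I want to run one on a dictionary.

I have a dictionary in which random numbers are stored under different indexes. I want to find the sum of all of those numbers. What I want to do is basically use one thread for each row/index of that dictionary. That single thread will find the sum of all the numbers in that particular row, and then the sums from all the threads will be added together to get the final result.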

import random
import time

li = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
      "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y"]

arr = {}

for k in range(0, 25):
    arr[li[k]] = [random.randrange(1, 10, 1) for i in range(1000000)]

start = time.perf_counter()

total = 0  # named "total" so the built-in sum() is not shadowed
for k, v in arr.items():
    for value in v:
        total += value

end = time.perf_counter()

print(total)

print("Finished in: ", round(end-start, 2), " seconds")

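I did this the simple way first. It took about 86 seconds overall (because of filling the dictionary with numbers) and about 5 seconds to compute the sum.

I want to improve on those 5 seconds of summing by creating a thread for each index of the dictionary. Can anyone help me with this?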

Reply from a uj5u.com user:

"I know... we can call a thread on a function."

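No. You can't call a thread on anything. When you write this:

thread = threading.Thread(target=foobar, args=(x, y, z))

you are not calling a thread. You are calling the constructor of the Thread class. The constructor creates a new Thread object, and then the Thread is the thing that does the calling: once it has been started with thread.start(), the Thread calls foobar(x, y, z).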
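To make the sequence concrete, here is a minimal sketch of the full life cycle: construct the Thread, start it, and wait for it. The worker body, the thread name "worker-1", and the `calls` list are made up for illustration; only `foobar` comes from the answer.

```python
import threading

calls = []  # the worker records what it did here, so we can see that it ran


def foobar(x, y, z):
    # Hypothetical worker body; this code executes inside the new thread.
    calls.append((threading.current_thread().name, x + y + z))


# Constructing the object does not run anything yet.
thread = threading.Thread(target=foobar, args=(1, 2, 3), name="worker-1")

thread.start()  # now the new thread calls foobar(1, 2, 3)
thread.join()   # block until foobar has returned

print(calls)
```

Note that the function is passed with the `target=` keyword. The first positional parameter of `Thread` is `group`, which has to be `None`.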

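"What I want to do is basically use one thread for each row/index of that dictionary. That single thread will find the sum of all the numbers in that particular row, and..."

Threads run code, and you must supply the code that a thread will run in the form of a function. If you want a thread to "find the sum of all the numbers in a particular row..."*, then you must write a function that finds the sum of all the numbers, and then you must create a new Thread that will call your function.

* Some of the other answers and comments on your question explain how Python's Global Interpreter Lock (a.k.a. the GIL) prevents you from using threads to make your program run faster. So the rest of this answer is fantasy, in that it won't make your program any faster, but it does illustrate how to create threads.

You will probably want to pass the dictionary and the row number as arguments to the function. You may also want to pass it some mutable results structure (e.g., a list or a dict) into which the function can save its result.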
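Since the GIL rules out a speedup from threads for this kind of work, one thread-free option worth measuring first is the built-in sum(), which in CPython runs its loop in C and is typically much faster than a Python-level loop. The sketch below compares the two on a small stand-in for the question's dictionary. The sizes and the function names are assumptions made for illustration, and the timings will vary by machine.

```python
import random
import time

# Small stand-in for the question's dictionary (assumed sizes, kept small here).
arr = {key: [random.randrange(1, 10) for _ in range(100_000)] for key in "abcde"}


def total_with_loop(data):
    # The approach from the question: add every value in a Python loop.
    total = 0
    for values in data.values():
        for value in values:
            total += value
    return total


def total_with_builtin(data):
    # sum() iterates in C, so the per-element Python bytecode overhead goes away.
    return sum(sum(values) for values in data.values())


for func in (total_with_loop, total_with_builtin):
    start = time.perf_counter()
    result = func(arr)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: {result} in {elapsed:.4f} seconds")
```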

def FindRowSum(dictionary, row, results):
    total = 0  # named "total" so the built-in sum() is not shadowed
    for ...:
        total = total + ...
    results[row] = total

...

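allThreads = []
results = {}  # a dict, so that results[row] = ... works for any row key
for row in range(...):
    # The function must be passed as target=; the first positional parameter of Thread is group.
    thread = threading.Thread(target=FindRowSum, args=(myDictionary, row, results))
    thread.start()  # without start(), the function is never run
    allThreads.append(thread)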

Then, further down, if you want to wait for all of the threads to finish their work:

for thread in allThreads:
    thread.join()
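Putting the pieces of this answer together, a complete version might look like the sketch below. The names `find_row_sum` and `sum_with_threads` and the sample data are assumptions made for illustration, and each thread writes only to its own key of the shared results dict. As noted above, the GIL means this is not expected to run faster than the plain loop.

```python
import threading


def find_row_sum(dictionary, row, results):
    # Runs in its own thread: sum one row and store the result under its key.
    results[row] = sum(dictionary[row])


def sum_with_threads(dictionary):
    results = {}
    all_threads = []
    for row in dictionary:
        thread = threading.Thread(target=find_row_sum, args=(dictionary, row, results))
        thread.start()
        all_threads.append(thread)

    # Wait until every worker has stored its row total.
    for thread in all_threads:
        thread.join()

    # Final step in the main thread: add the per-row totals together.
    return sum(results.values())


data = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
print(sum_with_threads(data))  # prints 21
```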

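Reply from a uj5u.com user:

So, here is an example of how to use multiprocessing for a "map-reduce" style summing problem.

This largely assumes that each subproblem (represented by process_key) is independent of the rest.

The final reduction (adding all of the per-key results together) is done by the main program.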

import multiprocessing
import os
import string
import time
from typing import Tuple, List


def get_key_data(key: str) -> List[int]:
    # Get data for a given key from a database or wherever;
    # here we just get a big blob of random bytes.
    return list(os.urandom(1_000_000))


def process_key(key: str) -> Tuple[str, int]:
    # This function is run in a separate process,
    # so it can't access global data in the same way a function
    # in the same process could.  Program accordingly.
    key_data = get_key_data(key)
    result_for_key = sum(key_data)  # Could be heavier computation here...

    # Returning a tuple makes it easier to work with the keyed data in the main program.
    return (key, result_for_key)


def main():
    start = time.perf_counter()
    keys = list(string.ascii_lowercase)
    with multiprocessing.Pool() as p:
        results = {}
        # Since result order doesn't matter, we can use `imap_unordered` to optimize performance.
        # It would also be worth adding `chunksize=...` to spend less time in serializers.
        for key, result in p.imap_unordered(process_key, keys):  # unpacking result tuples here
            print(f"Got result {result} for key {key}")
            results[key] = result
    grand_total = sum(results.values())
    end = time.perf_counter()

    print(f"Grand total: {grand_total} in {end - start:.2f} seconds")


if __name__ == '__main__':
    main()

This prints out (something like)

Got result 127439637 for key y
Got result 127521766 for key z
Got result 127410016 for key a
Got result 127618358 for key b
Got result 127510624 for key c
Got result 127525228 for key d
Got result 127471359 for key e
Got result 127535553 for key f
Got result 127457231 for key m
Got result 127547738 for key n
Got result 127567059 for key o
Got result 127470823 for key g
Got result 127465435 for key h
Got result 127497010 for key i
Got result 127432593 for key j
Got result 127555330 for key k
Got result 127402226 for key l
Got result 127534939 for key p
Got result 127558057 for key q
Got result 127474231 for key r
Got result 127491137 for key v
Got result 127520358 for key w
Got result 127490582 for key x
Got result 127489005 for key s
Got result 127485159 for key t
Got result 127503702 for key u
Grand total: 3314975156 in 0.60 seconds
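The example above generates each key's data inside the worker process. If the data already lives in a dictionary in the main program, as in the question, the same map-reduce shape can be sketched as below. The function name `total_in_processes` and the sizes are assumptions made for illustration. Each row has to be serialized and sent to a worker, so this may well be slower than simply calling sum() in the main process; it is worth measuring before adopting it.

```python
import multiprocessing
import random


def total_in_processes(rows, processes=None, chunksize=1):
    # Map: each worker process sums whole rows with the built-in sum().
    # Reduce: the parent process adds up the per-row totals.
    with multiprocessing.Pool(processes) as pool:
        return sum(pool.imap_unordered(sum, rows, chunksize=chunksize))


if __name__ == "__main__":
    # Stand-in for the question's dictionary (assumed, smaller sizes).
    arr = {key: [random.randrange(1, 10) for _ in range(100_000)] for key in "abcdefgh"}
    total = total_in_processes(arr.values(), chunksize=2)
    # Check the parallel result against a plain single-process sum.
    print(total == sum(map(sum, arr.values())))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module.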

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/433433.html
