對于某些操作來說Parallel,CPU 的數量可以很好地擴展,但對于其他操作則不然。
考慮下面的代碼,function1得到了 10 倍的改進,同時function2得到了 3 倍的改進。這是由于記憶體分配,還是 GC?
void function1(int v) {
for (int i = 0; i < 100000000; i ) {
var q = Math.Sqrt(v);
}
}
void function2(int v) {
Dictionary<int, int> dict = new Dictionary<int, int>();
for (int i = 0; i < 10000000; i ) {
dict.Add(i, v);
}
}
var sw = new System.Diagnostics.Stopwatch();
var iterations = 100;
sw.Restart();
for (int v = 0; v < iterations; v ) function1(v);
sw.Stop();
Console.WriteLine("function1 no parallel: " sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms"));
sw.Restart();
Parallel.For(0, iterations, function1);
sw.Stop();
Console.WriteLine("function1 with parallel: " sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms"));
sw.Restart();
for (int v = 0; v < iterations; v ) function2(v);
sw.Stop();
Console.WriteLine("function2 no parallel: " sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms"));
sw.Restart();
Parallel.For(0, iterations, function2);
sw.Stop();
Console.WriteLine("function2 parallel: " sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms"));
我機器上的輸出:
function1 no parallel: 2 059,4 ms
function1 with parallel: 213,7 ms
function2 no parallel: 14 192,8 ms
function2 parallel: 4 491,1 ms
環境:
Win 11,.Net 6.0,Release build
i9 第 12 代,16 核,24 proc,32 GB DDR5
在測驗更多之后,似乎記憶體分配不能很好地與多個執行緒一起擴展。例如,如果我將函式 2 更改為:
void function2(int v) {
Dictionary<int, int> dict = new Dictionary<int, int>(10000000);
}
結果是:
function2 no parallell: 124,0 ms
function2 parallell: 402,4 ms
記憶體分配不能很好地擴展多執行緒的結論嗎?...
uj5u.com熱心網友回復:
第一個 func 在暫存器中作業。更多內核 = 更多暫存器。
第二個函式適用于記憶體。更多內核 = 只有更多 L1 快取但共享 RAM。1000 萬個元素資料集當然只來自 RAM,因為即使 L3 也不夠大。這假設語言的 jit 將分配優化為重用的緩沖區。如果沒有,那么也有分配開銷。因此,您應該在每次新迭代時重新使用字典,而不是重新創建。
您還使用增量整數索引保存資料。簡單的陣列可以在這里作業,當然可以在迭代之間重復使用。它應該比字典占用更少的記憶體。
uj5u.com熱心網友回復:
tl; dr:堆分配爭用。
您的第一個功能是
(Red: 25%, Orange: 56%, Green: 75%, Blue: 100%)
With task parallelism we achieved over 20x performance using 100% of CPU threads. (in this example, not always like that)
In read-only data paralelism with some computation we achieve near 6,5x faster of CPU usage 56% (with fewer computations the difference would be shorter)
But trying to implement a "real parallism" of data for writing our performance is more than twice slower and CPU can't use full potential using only 25% usage due sycronization contexts
Conclusions: Using Parallel.For does not guarantee that your code will run really in parallel neither faster. It requires a previous code/data preparation and deep analysis, benchmarks and tunings
Check also this Microsoft Documentation talking about villains in Parallel Code
https://docs.microsoft.com/pt-br/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/442897.html
上一篇:為什么ScheduledExecutorService沒有按預期作業?
下一篇:在Python中完成時終止執行緒
