I have been following, and modifying, a tutorial on OpenMP in C/C++ to demonstrate and understand how schedule() works with #pragma omp parallel for. This is my code:
#include <unistd.h>
#include <stdlib.h>
#include <omp.h>
#include <stdio.h>

#define THREADS 4
#define N 100

int main ( ) {
    int i;
    int perThread=0;
    printf("Running %d iterations on %d threads.\n", N, THREADS);
    #pragma omp parallel for num_threads(THREADS) private(perThread) //schedule(static)
    for (i = 0; i < N; i++) {
        perThread++;
        printf("Thread: %d\t loops: %d\n", omp_get_thread_num(), perThread);
        usleep(10000); // to slow the process down a bit
        //Uncomment below to simulate one thread taking longer on each loop
        // if(omp_get_thread_num()==1)
        //     sleep(1);
    }
    // all threads done
    printf("All done!\n");
    return 0;
}
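I saved it as "schedule_example.cpp" and compiled it with:
g++ schedule_example.cpp -fopenmp -o SheduleEx
I then compared it against a build with schedule(static) uncommented on the #pragma line, and again with various other options for schedule(), namely schedule(static,25), schedule(static,5), schedule(dynamic), schedule(dynamic,5) and schedule(runtime).
The scheduler works, and the code demonstrates the differences (especially when the two commented-out lines inside the loop, the if and the sleep(1), are uncommented).
The problem is that for some, but not all, options of schedule(), the starting value of perThread is changed for some, but not all, threads. This can be seen in the printout.
I have run the code on a few different machines and they all show similar results. On a Windows 10 laptop I used WSL, where g++ --version returns: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0.
Take the last 8 lines of output as an example. If schedule() is commented out, or schedule(static) is used, the output is correct: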
Thread: 2 loops: 24
Thread: 0 loops: 24
Thread: 1 loops: 24
Thread: 3 loops: 24
Thread: 2 loops: 25
Thread: 1 loops: 25
Thread: 0 loops: 25
Thread: 3 loops: 25
All done!
However, if I use anything else for schedule(), even schedule(static,25), which should give the same result, it compiles and runs, but the last few lines of output are:
Thread: 1 loops: 24
Thread: 3 loops: 24
Thread: 0 loops: 22010
Thread: 2 loops: 24
Thread: 1 loops: 25
Thread: 3 loops: 25
Thread: 0 loops: 22011
Thread: 2 loops: 25
All done!
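The problem is that the starting value of perThread has been set to 21986, but only for thread 0.
If I rerun it without recompiling I get a similar result: it is always thread 0 that is wrong, and always by about 22000, but not by exactly the same amount each time. Recompiling before rerunning gives the same result.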
I then ran the same code on Raspberry Pi and got similar but slightly different results.
g++ --version returns:
g++ (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110
The Raspberry Pi only prints out the correct value for loops on all threads if schedule(dynamic) or schedule(dynamic, X) is used - I tried 1, 5, and 25 as values for X.
If (static) or (static, X) is used, then all the threads except thread 0 have a starting value of around 67321. This number is always the same for threads 1, 2, and 3, and is often, but not always, the same between successive runs of the code.
(auto) behaves the same as (static).
However, (runtime) is the opposite of (static): only thread 0 is wrong, and it is also off by about 67481. When I ran it a few times in a row, it was wrong by the same amount each time.
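I ran the same code again on another PC running Arch Linux and got results similar to those from the Windows 10 laptop.
As a practical question, have I done something wrong in the way I wrote the code? Is there a way to make sure the threads' variables are not altered?
Sorry this post is so long, but I think the core of the problem is that schedule(), when combined with private(), sometimes affects the variables of some threads at the start of the parallel for loop.
Thanks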
Answer from a uj5u.com user:
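You should use firstprivate(perThread) instead of private(perThread). The private clause gives each thread its own copy of the variable but does not initialize it, so its starting value is indeterminate. That is why some threads begin counting from a garbage value.
In the OpenMP specification you can read:
The firstprivate clause declares one or more list items to be private to a task, and initializes each of them with the value that the corresponding original item has when the construct is encountered.
So you have to use this clause.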
