Running code in a thread is slower than running it in the main thread

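I'm testing running double-precision calculations in a thread and got this strange result: running the calculation directly in the main thread takes almost half the time of running it in a separate thread and calling join from the main thread. With only a single worker thread, there should not be much difference from simply calling the function. Am I doing something wrong?

The CPU is an Intel Xeon E-2136 with its frequency capped at 4.1 GHz, so it runs at the same boost frequency regardless of how many cores are active.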

#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <thread>
#include <future>
#include <malloc.h>
#include <time.h>

#define TEST_ITERATIONS 1000*1000*1000

void *testNN(void *dummy) {
  volatile double x;
  for (int i = 0; i < TEST_ITERATIONS; ++i) {
    x = rand();
    x *= rand();
  }
  return nullptr;
}

int main(){
    time_t start = time(nullptr);

    { // for future to join thread

      testNN(nullptr); // 12s

//      pthread_t thread_id;
//      pthread_create(&thread_id, NULL, testNN, nullptr);
//      pthread_join(thread_id, NULL); //27s

      std::future<void *> f[12];
//      f[0] = std::async(std::launch::async, testNN, nullptr);   // 27s
      // for multithreaded testing:
//    f[1] = std::async(std::launch::async, testNN, nullptr);
//    f[2] = std::async(std::launch::async, testNN, nullptr);
//    f[3] = std::async(std::launch::async, testNN, nullptr);
//    f[4] = std::async(std::launch::async, testNN, nullptr);
//    f[5] = std::async(std::launch::async, testNN, nullptr);
//    f[6] = std::async(std::launch::async, testNN, nullptr);
//    f[7] = std::async(std::launch::async, testNN, nullptr);
//    f[8] = std::async(std::launch::async, testNN, nullptr);
//    f[9] = std::async(std::launch::async, testNN, nullptr);
//    f[10] = std::async(std::launch::async, testNN, nullptr);
//    f[11] = std::async(std::launch::async, testNN, nullptr);

    }

    time_t runTime = time(nullptr);
    runTime -= start;

    printf("calc done in %lds (%ld calc/s)\n", runTime, TEST_ITERATIONS / runTime);

}

I compile with

# g++ -std=c++11 test.cpp -o test -lpthread

and these are the results for the direct function call, pthread, and std::async, in that order:

# time ./test
calc done in 12s (83333333 calc/s)

real    0m12.073s
user    0m12.070s
sys     0m0.003s

# time ./test
calc done in 27s (37037037 calc/s)

real    0m26.741s
user    0m26.738s
sys     0m0.004s

# time ./test
calc done in 27s (37037037 calc/s)

real    0m26.788s
user    0m26.785s
sys     0m0.003s

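PS: I'm still not sure whether I will use C++11. I used C++11 only to test whether there is a difference between plain pthread and std::async.

Update: here is a version that is easier to test, with less chance of mistakes: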

https://pastecode.io/s/ov4ifgy5

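Answer:

Thanks to @AndreasWenzel, I found that rand() is what causes the slowdown. In theory this should not be a problem when only one thread is running (or at least when no other thread is calling rand()). Replacing rand() with rand_r() solves the problem and even brings the time for the same amount of work down to 8 seconds. Here is the test function: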

void *testNN(void *dummy) {
  volatile double x;
  unsigned int seed = (unsigned int) time(nullptr);

  for (long i = 0; i < TEST_ITERATIONS; ++i) {
    x = rand_r(&seed);
    x *= rand_r(&seed);
  }
  return nullptr;
}

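I know seeding like this isn't ideal: starting 12 threads will most likely give all of them the same seed, but this is just a test. I will most likely use a more elaborate seeding function.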
