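Hi, when doing GPU programming, kernels have to be written to suit the hardware's characteristics. How do I work out the data partitioning or task parallelism that gets performance close to optimal?
For example: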
const int ARRAY_SIZE = 1000;
size_t globalWorkSize[1] = { ARRAY_SIZE };
size_t localWorkSize[1] = { 1 };
// Queue the kernel up for execution across the array
errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,
                                globalWorkSize, localWorkSize,
                                0, NULL, NULL);
if (errNum != CL_SUCCESS)
{
    std::cerr << "Error queuing kernel for execution." << std::endl;
    Cleanup(context, commandQueue, program, kernel, memObjects);
    return 1;
}
size_t globalWorkSize[1] = { ARRAY_SIZE };
size_t localWorkSize[1] = { 1 };
How should these two values be set to get the best performance?
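
Not a definitive answer, but one common starting point can be sketched. A local size of 1 usually leaves most of the GPU's SIMD lanes idle, so a typical first step is to pick a larger local size and round the global size up to a multiple of it, since OpenCL 1.x requires the global size to be evenly divisible by the local size. The helper names `RoundUpToMultiple`, `ChooseWorkSizes` and `WorkSizes` below are made up for illustration and are not part of the OpenCL API. The local size passed in is an assumption: on a real device it could be taken from `clGetKernelWorkGroupInfo` with `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`, and the best value still has to be found by measuring.

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenCL function): rounds `count` up to the
// next multiple of `multiple`, so the global size divides evenly by the
// local size.
std::size_t RoundUpToMultiple(std::size_t count, std::size_t multiple)
{
    return ((count + multiple - 1) / multiple) * multiple;
}

// Hypothetical container for the two 1-D sizes passed to
// clEnqueueNDRangeKernel.
struct WorkSizes
{
    std::size_t global;
    std::size_t local;
};

// Hypothetical helper: picks 1-D work sizes for `count` elements.
// `preferredLocal` is an assumed value supplied by the caller, for example
// one queried from the device; it is not computed here.
WorkSizes ChooseWorkSizes(std::size_t count, std::size_t preferredLocal)
{
    WorkSizes sizes;
    sizes.local = preferredLocal;
    sizes.global = RoundUpToMultiple(count, preferredLocal);
    return sizes;
}
```

With ARRAY_SIZE = 1000 and an assumed local size of 64, this gives a global size of 1024, so the kernel then needs a bounds check such as `if (get_global_id(0) < n)` to skip the 24 padding work-items. Another option is to pass NULL instead of localWorkSize and let the OpenCL implementation choose the work-group size itself.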
