I have a program that works with one-dimensional arrays.
Normally I need about 500 threads, so when I want to execute a task I just call cuLaunchKernel() as follows:
cuLaunchKernel(function, 1, 1, 1, numThreads, 1, 1, 0, null, args, null);
But suppose I need 1 million threads: is it still a good idea to launch them the same way?
Or could I expect better performance by using the “gridDimX” argument? If so, I don’t have a clear idea of how to do it.
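To make the gridDimX idea concrete, here is a minimal sketch of how the grid size might be computed. It assumes a block size of 256 threads (an arbitrary choice, not a tuned value), and the class and method names are hypothetical. As far as I know, the number of threads per block is capped by the device (1024 on most current GPUs), so a million threads could not fit in one block anyway. The kernel would also have to compute its index as blockIdx.x * blockDim.x + threadIdx.x and skip any index >= n, because the last block may be only partly used.

```java
// Hypothetical helper, not part of JCuda: computes how many blocks are
// needed so that gridSizeX * blockSizeX covers at least n threads.
final class LaunchConfig {

    // Ceiling division: the smallest number of blocks that covers n elements.
    static int gridSizeX(int n, int blockSizeX) {
        if (n <= 0 || blockSizeX <= 0) {
            throw new IllegalArgumentException("n and blockSizeX must be positive");
        }
        return (n + blockSizeX - 1) / blockSizeX;
    }

    public static void main(String[] args) {
        int n = 1_000_000;     // total number of threads wanted
        int blockSizeX = 256;  // assumed block size, chosen arbitrarily
        int gridSizeX = gridSizeX(n, blockSizeX);
        System.out.println(gridSizeX + " blocks of " + blockSizeX + " threads");

        // The launch would then presumably look like this (same call as in
        // my code above, with gridDimX and blockDimX filled in):
        // cuLaunchKernel(function, gridSizeX, 1, 1, blockSizeX, 1, 1, 0, null, kernelArgs, null);
    }
}
```

With n = 1,000,000 and a block size of 256 this gives 3907 blocks, where the last block has only 64 useful threads.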
I hope you understand.