I am working on the same algorithm described in my previous thread: http://forum.byte-welt.net/threads/5115-Using-JCudaDriver-with-JCufft
Now I am incorporating my workflow system into the algorithm. The workflow system supports multiple GPUs, asynchronous memory copies that overlap with CPU and GPU computation, and splitting computational components between the CPU and the GPU.
The issue I am having now is that I get a CUDA_ERROR_INVALID_HANDLE error when launching a custom kernel.
My question is: when creating contexts for a single GPU or for multiple GPUs, do I need a separate CUfunction for each context in order to execute the functions?
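
To make the question concrete, here is a minimal sketch of what I imagine "one CUfunction per context" would look like in my workflow system. As far as I understand, a module loaded with cuModuleLoad belongs to the context that was current at load time, so the idea is to look the function up once per context and reuse it. PerContextFunctionCache and its loader are hypothetical names of mine, not part of JCuda. The sketch is plain Java so it runs without a GPU: the key stands in for a device ordinal (or a CUcontext), the value stands in for a CUfunction, and in real code the loader would make the context current, call cuModuleLoad and then cuModuleGetFunction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of JCuda): keeps one function handle per context.
// C stands in for a device ordinal or CUcontext, F for a CUfunction.
final class PerContextFunctionCache<C, F> {
    private final Map<C, F> handles = new HashMap<>();
    private final Function<C, F> loader;

    // The loader is assumed to create the handle for the given context. With JCuda
    // it would make the context current, load the module, and get the function.
    PerContextFunctionCache(Function<C, F> loader) {
        this.loader = loader;
    }

    // Returns the handle for this context, creating it on first use only.
    synchronized F get(C context) {
        return handles.computeIfAbsent(context, loader);
    }

    // Number of contexts that currently have a handle.
    synchronized int size() {
        return handles.size();
    }

    public static void main(String[] args) {
        // Placeholder loader: a String stands in for the real CUfunction.
        PerContextFunctionCache<Integer, String> cache =
                new PerContextFunctionCache<>(device -> "kernel-for-device-" + device);

        // Each worker asks for the handle that matches its own device/context.
        System.out.println(cache.get(0));
        System.out.println(cache.get(1));
        System.out.println(cache.get(0)); // reused, not loaded again
        System.out.println("handles: " + cache.size());
    }
}
```

Each worker thread in the workflow system would then call get(...) with its own device or context before launching the kernel, instead of sharing a single CUfunction across all GPUs.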