Multiple GPUs using JCUDA and streams

Hello again,

I am working on the same algorithm described in my previous thread:

Now I am incorporating my workflow system into the algorithm. The workflow system is capable of supporting multiple GPUs, asynchronous memory copies that are overlapped with CPU and GPU computation, and splitting computational components on either CPU or GPU.

The issue I am having now is that I get a CUDA_ERROR_INVALID_HANDLE when launching a custom kernel.

My question is: when creating contexts for one or more GPUs, do I need a separate CUfunction for each context in order to execute the kernel?


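I think you will have to create multiple contexts, load the module for each context, and obtain the function from each module (that is, once per context). The documentation of cuModuleLoad explicitly states that it “loads the corresponding module into the current context”. A CUfunction handle belongs to the module it was obtained from, so launching it while a different context is current would explain the CUDA_ERROR_INVALID_HANDLE.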
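
As a rough illustration of that pattern, here is a minimal sketch that I have not run on a multi-GPU machine. It assumes a hypothetical file "kernel.ptx" containing a hypothetical kernel `extern "C" __global__ void myKernel(int n)`; both names are placeholders, not something from your code. It creates one context per device, loads the module into each context, and keeps one CUfunction per context:

```java
import static jcuda.driver.JCudaDriver.*;

import jcuda.Pointer;
import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.CUfunction;
import jcuda.driver.CUmodule;
import jcuda.driver.JCudaDriver;

public class PerContextFunctions {
    public static void main(String[] args) {
        // Throw exceptions instead of silently returning error codes
        JCudaDriver.setExceptionsEnabled(true);
        cuInit(0);

        int[] count = { 0 };
        cuDeviceGetCount(count);
        int deviceCount = count[0];

        CUcontext[] contexts = new CUcontext[deviceCount];
        CUfunction[] functions = new CUfunction[deviceCount];

        for (int i = 0; i < deviceCount; i++) {
            CUdevice device = new CUdevice();
            cuDeviceGet(device, i);

            // cuCtxCreate makes the new context current for the calling thread
            contexts[i] = new CUcontext();
            cuCtxCreate(contexts[i], 0, device);

            // The module is loaded into the context that is current right now
            // ("kernel.ptx" is a placeholder file name)
            CUmodule module = new CUmodule();
            cuModuleLoad(module, "kernel.ptx");

            // This handle is only valid together with contexts[i]
            // ("myKernel" is a placeholder kernel name)
            functions[i] = new CUfunction();
            cuModuleGetFunction(functions[i], module, "myKernel");
        }

        int n = 1024;
        for (int i = 0; i < deviceCount; i++) {
            // Make the matching context current before using its function
            cuCtxSetCurrent(contexts[i]);

            // Assumed kernel signature: myKernel(int n)
            Pointer kernelParams = Pointer.to(Pointer.to(new int[] { n }));
            cuLaunchKernel(functions[i],
                (n + 255) / 256, 1, 1,   // grid dimensions
                256, 1, 1,               // block dimensions
                0, null,                 // no shared memory, default stream
                kernelParams, null);
            cuCtxSynchronize();
        }

        for (int i = 0; i < deviceCount; i++) {
            cuCtxDestroy(contexts[i]);
        }
    }
}
```

The important part is the pairing: functions[i] is only ever launched while contexts[i] is current. If your workflow system runs one thread per GPU, each thread would call cuCtxSetCurrent with its own context before loading the module or launching anything.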

My possibilities to run tests with multiple GPUs are rather limited, but if you encounter problems, I can try to create an example (this could be useful as a sample, anyhow).