CUDA_ERROR_INVALID_HANDLE on cuLaunchKernel()

I looked through the existing threads but did not find a solution.
When I call cuLaunchKernel(), JCuda throws `jcuda.CudaException: CUDA_ERROR_INVALID_HANDLE`.

Here is the initialization (taken from the JCuda tutorial):

```
// Create the PTX file by calling the NVCC
String ptxFileName = preparePtxFile("./cudacore/JCudaMatrixHandler.cu");

// Initialize the driver and create a context for the first device.
cuInit(0);
CUdevice device = new CUdevice();
cuDeviceGet(device, 0);
CUcontext context = new CUcontext();
cuCtxCreate(context, 0, device);

// Load the ptx file.
CUmodule module = new CUmodule();
cuModuleLoad(module, ptxFileName);

// Obtain a function pointer to the "handler" function.
cuFunction = new CUfunction();
cuModuleGetFunction(cuFunction, module, "handler");
```

Further on, I have this code:
```
cuInit(0);
CUdevice device = new CUdevice();
cuDeviceGet(device, 0);
CUcontext context = new CUcontext();
cuCtxCreate(context, 0, device);

CUdeviceptr devicePixels = new CUdeviceptr();
cuMemAlloc(devicePixels, width * height * Sizeof.INT);
cuMemcpyHtoD(devicePixels, Pointer.to(pixels), width * height * Sizeof.INT);

CUdeviceptr devicePixelStub = new CUdeviceptr();
cuMemAlloc(devicePixelStub, width * height * Sizeof.INT);
cuMemcpyHtoD(devicePixelStub, Pointer.to(pixelsStub), width * height * Sizeof.INT);

Pointer kernelParameters = Pointer.to(
        Pointer.to(devicePixels),
        Pointer.to(devicePixelStub),
        Pointer.to(new int[]{width}),
        Pointer.to(new int[]{height}),
        Pointer.to(new int[]{echelon})
);
// JCuda throws the exception here. Changing "height" and "width" does not affect the problem.
cuLaunchKernel(cuFunction,
        height, 1, 1,   // grid dimensions
        width, 1, 1,    // block dimensions
        0, null,        // shared memory size, stream
        kernelParameters, null);
cuCtxSynchronize();
```

Here is my .cu file. It was bigger, but I removed everything except the CUDA entry point:
```
extern "C"
__global__ void handle(int *sourcePixels, int *newPixels, int width, int height, int echelon) {}
```

Can anybody help me with this?

Hello

At first I thought this might be related to the name difference (`handle` vs. `handler`), but you seem to have it right locally, because otherwise cuModuleGetFunction would already have thrown an exception.

It seems to be caused by the fact that you are actually doing the initialization twice. The part

```
cuInit(0);
CUdevice device = new CUdevice();
cuDeviceGet(device, 0);
CUcontext context = new CUcontext();
cuCtxCreate(context, 0, device);
```

appears twice in your code.

Note that the CUcontext is basically the container for everything in CUDA: functions, modules, device memory, etc. always belong to exactly ONE context. (You only have to juggle multiple contexts when you are doing sophisticated multithreaded or multi-GPU work.)

You obtain the CUfunction in the first context. Later, you create a new context (and cuCtxCreate makes it the current one), and then try to launch the function that was obtained previously. But this CUfunction is not valid in the new context.

This should be easy to solve: make sure that cuInit and cuCtxCreate are called only once, for example by removing the second block of initialization code.
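In a larger program the two fragments often live in different methods, which is how the duplicated setup creeps in. One way to guard against that is a small initialize-once holder. The following is a minimal sketch in plain Java; `InitOnce` is a hypothetical helper, not a JCuda class, and the JCuda calls are left out so that it runs without a GPU:

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of JCuda): runs a setup step at most once
 * and returns the cached result on every later call.
 */
final class InitOnce<T> {
    private final Supplier<T> setup;
    // volatile so that a value published by one thread is visible to others
    private volatile T value;

    InitOnce(Supplier<T> setup) {
        this.setup = Objects.requireNonNull(setup);
    }

    /** Returns the cached result, running the setup step only on the first call. */
    T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // The setup must not return null, otherwise it would run again.
                    v = Objects.requireNonNull(setup.get(), "setup returned null");
                    value = v;
                }
            }
        }
        return v;
    }
}
```

In the program from the question, the supplier would be the cuInit/cuDeviceGet/cuCtxCreate/cuModuleLoad/cuModuleGetFunction sequence, returning some object that holds the CUcontext and the CUfunction, and the code that allocates memory and calls cuLaunchKernel would only call `get()`. Since cuCtxCreate makes the context current only for the calling thread, this sketch assumes the launch happens on that same thread.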

bye
Marco