[JCuda] Utility class for launching kernels

Hello

I have created a small utility class that simplifies creating and launching CUDA kernels with the JCuda driver API. It can create CUBIN (CUDA binary) files at runtime, either from a String containing the source code or from an existing CUDA source file. Existing CUBIN files can also be loaded to access their kernel functions. The class also offers a more convenient way to call kernels, which resembles the launch syntax of the runtime API:
kernel<<<gridSize,blockSize,sharedMemSize,stream>>>(…);

The "KernelLauncher" class and a small sample that shows how it might be used are available at the JCuda samples website.

As an example, consider a function that performs a vector addition and is contained in a single CUDA source file. Using the utility class, the function can be called with

KernelLauncher kernelLauncher = 
    KernelLauncher.create("add.cu", "add");
kernelLauncher.setBlockSize(size, 1, 1);
kernelLauncher.call(dResult, dA, dB);

In contrast, the usual way to call the kernel function (assuming that the CUBIN file has already been created manually using NVCC) would be

// Initialize the driver and create a context for the first device.
cuInit(0);
CUcontext pctx = new CUcontext();
CUdevice dev = new CUdevice();
cuDeviceGet(dev, 0);
cuCtxCreate(pctx, 0, dev);

// Load the CUBIN file and obtain the function.
CUmodule module = new CUmodule();
cuModuleLoad(module, "add.cubin");
CUfunction function = new CUfunction();
cuModuleGetFunction(function, module, "add");

// Set up input data
...

// Set up the execution parameters for the kernel
Pointer pdA = Pointer.to(a);
Pointer pdB = Pointer.to(b);
Pointer pdResult = Pointer.to(result);
int offset = 0;
offset = align(offset, Sizeof.POINTER);
cuParamSetv(function, offset, pdA, Sizeof.POINTER);
offset += Sizeof.POINTER;
offset = align(offset, Sizeof.POINTER);
cuParamSetv(function, offset, pdB, Sizeof.POINTER);
offset += Sizeof.POINTER;
offset = align(offset, Sizeof.POINTER);
cuParamSetv(function, offset, pdResult, Sizeof.POINTER);
offset += Sizeof.POINTER;
cuParamSetSize(function, offset);

// Call the kernel function.
cuFuncSetBlockShape(function, size, 1, 1);
cuLaunchGrid(function, 1, 1);
cuCtxSynchronize();
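
The `align` helper called in the driver API code above is not shown in this post. A minimal sketch of what it might look like, assuming it rounds an offset up to the next multiple of a power-of-two alignment (the class name `ParamAlign` and the fixed pointer size of 8 bytes are assumptions for illustration; the real code uses `Sizeof.POINTER`):

```java
// Hypothetical helper class; the post calls align(...) without showing its definition.
final class ParamAlign {

    /**
     * Rounds 'offset' up to the next multiple of 'alignment'.
     * Assumes 'alignment' is a power of two.
     */
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        // Assumption: 64-bit device pointers. The post uses Sizeof.POINTER instead.
        int pointerSize = 8;

        // Walk through three pointer parameters the same way the
        // cuParamSetv sequence does, and print each parameter offset.
        int offset = 0;
        for (int i = 0; i < 3; i++) {
            offset = align(offset, pointerSize);
            System.out.println("parameter " + i + " at offset " + offset);
            offset += pointerSize;
        }
        System.out.println("total parameter size: " + offset);
    }
}
```

With three pointer parameters the alignment step does not change any offset. It only matters when parameters of different sizes are mixed, for example a 4-byte int followed by a pointer, where `align(4, 8)` moves the pointer to offset 8.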

Any feedback is welcome :)