New Helper Classes

These are helper classes to easily allocate and set or copy data to the device and back.
They are just really helpful, and they were the first thing I programmed before starting my bachelor's thesis.

They are far from perfect and finished, but I don't have time to work on them anymore, so I am releasing their source files.

http://www.release-search.com/other_stuff/programming/cuda/utilCuda_1.0/utilCuda_1.0.rar

The Javadoc is in the rar file, or currently at:

javadoc

Example:

It is pretty straightforward to use, but here is an example of normalizing the column vectors of a 2D matrix.

Kernel:

/**
 * ///////////////
 * // Arguments //
 * ///////////////
 *
 * @param inOutMat_g float** Matrix
 * @param inWidth_s int width of the Matrix 
 * @param inHeigth_s int height of the Matrix
 * @param inTileCount_s int column tile count, equals
 *	heigth / (blockDim.x * 2) since one thread loads 2 values
 * @param inOutputTileCount_s int output column tile count, equals
 *	heigth / blockDim.x
 */
__global__ void normColumn(float** inOutMat_g,
	const unsigned int inWidth_s,
	const unsigned int inHeigth_s,
	const unsigned int inTileCount_s,
	const unsigned int inOutputTileCount_s)
{

	const unsigned int blockId = blockIdx.y * gridDim.x + blockIdx.x //2D
		+ gridDim.x * gridDim.y * blockIdx.z; //3D

	if (blockId >= inWidth_s)
		return;
	
	... normalize the column
}

Java Kernel call:


//our input float matrix
float[][] mat = ...;

//create new host and device pointers and allocate and copy mat to the device
CudaFloat2D mat_hd = new CudaFloat2D(mat);

//call synchronous kernel on device
UtilCuda.kernelLauncherCreateSetupCall("kernels/util/GPU_norm_kernel.cu", 
	 "normColumn",
	CudaDevice.getNewGridDim3(width), //grid dimension --> calls device specific UtilCuda.getNewGridDim(...)
	UtilCuda.getNewDim3(CudaDevice.BLOCK_SIZE_POW2), //block dimension
	new Object[] { mat_hd,  //argument list
		width, 
		heigth, 
		Util.getCeil(heigth, CudaDevice.BLOCK_SIZE_POW2_DOUBLE),
		Util.getCeil(heigth, CudaDevice.BLOCK_SIZE_POW2)});

//copy results back and free device pointer
float[][] matRes = mat_hd.getResults(true);

Thanks for these! I assume they might be interesting for people who intend to write their own utility methods for JCuda. I think most people do this, because writing something like

int numElements = ..;
float hostInput[] = createInput(numElements);
CUdeviceptr deviceInput = new CUdeviceptr();
cuMemAlloc(deviceInput, numElements * Sizeof.FLOAT);
cuMemcpyHtoD(deviceInput, Pointer.to(hostInput),
    numElements * Sizeof.FLOAT);

is tedious (and it’s orders of magnitude worse when dealing with 2D arrays!) and of course, this would not be necessary in Java. The same thing should be doable with something like

DeviceBuffer input = new DeviceBuffer(createInput(...));
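
Just to sketch the idea: a minimal version of such a wrapper for the float case could look roughly like this. The class is only a sketch, not an existing API; the getResults(boolean) method simply mimics the one from utilCuda above, error checking is omitted, and a driver API context is assumed to be set up already:

import static jcuda.driver.JCudaDriver.*;
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;

// Hypothetical wrapper - assumes cuInit(), device and context setup already happened
class DeviceBuffer
{
    private final CUdeviceptr pointer = new CUdeviceptr();
    private final int numElements;

    // 1D case: allocate device memory and copy the host data in one step
    DeviceBuffer(float[] hostData)
    {
        numElements = hostData.length;
        cuMemAlloc(pointer, numElements * Sizeof.FLOAT);
        cuMemcpyHtoD(pointer, Pointer.to(hostData), numElements * Sizeof.FLOAT);
    }

    // 2D case: flatten into one contiguous block first -
    // exactly the boilerplate that is so tedious to write by hand every time
    DeviceBuffer(float[][] hostData)
    {
        int rows = hostData.length;
        int cols = hostData[0].length;
        numElements = rows * cols;
        float[] flat = new float[numElements];
        for (int r = 0; r < rows; r++)
        {
            System.arraycopy(hostData[r], 0, flat, r * cols, cols);
        }
        cuMemAlloc(pointer, numElements * Sizeof.FLOAT);
        cuMemcpyHtoD(pointer, Pointer.to(flat), numElements * Sizeof.FLOAT);
    }

    // The raw device pointer, to be passed to a kernel
    CUdeviceptr getPointer()
    {
        return pointer;
    }

    // Copy the (flattened) data back to the host, optionally freeing the device memory
    float[] getResults(boolean free)
    {
        float[] result = new float[numElements];
        cuMemcpyDtoH(Pointer.to(result), pointer, numElements * Sizeof.FLOAT);
        if (free)
        {
            cuMemFree(pointer);
        }
        return result;
    }
}

With something like that, the manual snippet above shrinks to the one-liner, plus a single getResults(true) at the end to fetch the data and free the device memory.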

I already mentioned elsewhere: Originally, I intended to write an object-oriented abstraction layer for JCuda - "nobody" wants to use a low-level API like CUDA :wink: Some parts are obvious, like the basic memory handling, and maybe parts of the device management and kernel execution. (In fact, the "KernelLauncher" class is basically what remained from my first attempts to write a "Kernel" class :wink: ) But other parts may be challenging, and properly designing such an API requires a considerable amount of effort, as well as knowledge about the application cases that I did not have back when I started with that, and which I may still lack now. However, the idea is of course not dead. Maybe one day I'll find the time…

Java is not the best language for this, because you cannot use generics with primitive data types. If you could, it would be very simple, but as it is, I would need separate 1D, 2D and 3D classes for each primitive data type.
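
For example, a generic wrapper only accepts boxed reference types, so instead of one class you end up with a whole family of wrappers (CudaFloat2D is the real class used above; the other class names are made up just to illustrate the point):

// Generics only accept reference types:
// DeviceBuffer<float> data;   // does not compile
// DeviceBuffer<Float> data;   // compiles, but would box every single element

// So there has to be one wrapper class per primitive type and dimension:
CudaFloat2D floatMatrix = new CudaFloat2D(new float[512][512]);
CudaFloat1D floatVector = new CudaFloat1D(new float[512]);     // hypothetical name
CudaInt3D   intVolume   = new CudaInt3D(new int[64][64][64]);  // hypothetical name
// ...and so on for double, int, long, short and byte, each in 1D, 2D and 3D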