These are helper classes that make it easy to allocate device memory, set or copy data to the device, and copy it back.
They are really helpful, and they were the first thing I programmed before starting my bachelor's thesis.
They are far from perfect or finished, but I don't have time to work on them anymore, so I am releasing the source files.
http://www.release-search.com/other_stuff/programming/cuda/utilCuda_1.0/utilCuda_1.0.rar
JavaDoc is in the rar file or currently at:
Example:
It is pretty straightforward to use, but here is an example that normalizes the column vectors of a 2D matrix.
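To make clear what the example is supposed to compute, here is a minimal CPU sketch of column normalization in plain Java. It is not part of utilCuda. The class name ColumnNormReference, the mat[row][col] layout, and the choice of the Euclidean (L2) norm are all assumptions made for illustration; the GPU kernel may use a different layout or norm.

```java
// CPU reference for column normalization (illustrative only, not part of utilCuda).
class ColumnNormReference {

    // Assumes a rectangular matrix stored as mat[row][col] and the Euclidean (L2) norm.
    static float[][] normalizeColumns(float[][] mat) {
        int height = mat.length;
        int width = height == 0 ? 0 : mat[0].length;
        float[][] out = new float[height][width];
        for (int col = 0; col < width; col++) {
            // Accumulate the squared length of this column in double precision.
            double sumSq = 0.0;
            for (int row = 0; row < height; row++) {
                sumSq += (double) mat[row][col] * mat[row][col];
            }
            double norm = Math.sqrt(sumSq);
            for (int row = 0; row < height; row++) {
                // An all-zero column stays zero; this avoids dividing by zero.
                out[row][col] = norm == 0.0 ? 0.0f : (float) (mat[row][col] / norm);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] mat = { {3f, 0f}, {4f, 2f} };
        float[][] res = normalizeColumns(mat);
        System.out.println(java.util.Arrays.deepToString(res));
    }
}
```

A reference like this can be handy for comparing against the values copied back from the device, within a small floating-point tolerance.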
Kernel:
/**
 * ///////////////
 * // Arguments //
 * ///////////////
 *
 * @param inOutMat_g float** matrix
 * @param inWidth_s unsigned int width of the matrix
 * @param inHeigth_s unsigned int height of the matrix
 * @param inTileCount_s unsigned int column tile count, equal to
 * height / (blockDim.x * 2), since one thread loads 2 values
 * @param inOutputTileCount_s unsigned int output column tile count, equal to
 * height / blockDim.x
 */
__global__ void normColumn(float** inOutMat_g,
                           const unsigned int inWidth_s,
                           const unsigned int inHeigth_s,
                           const unsigned int inTileCount_s,
                           const unsigned int inOutputTileCount_s)
{
    // Linear block index across the whole grid
    const unsigned int blockId = blockIdx.y * gridDim.x + blockIdx.x //2D
                               + gridDim.x * gridDim.y * blockIdx.z; //3D
    // Surplus blocks have nothing to do
    if (blockId >= inWidth_s)
        return;
    // ... normalize rows
}
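The blockId formula flattens the three-dimensional block index into a single number, so every block in the grid gets a unique ID and any block whose ID is at or beyond the width returns early. Below is a small host-side Java sketch that replays the same arithmetic to show that the IDs are unique and gap-free. The class BlockIdDemo, the grid size of 4 x 3 x 2, and the width of 20 are made-up values for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Host-side replay of the kernel's block index arithmetic (illustrative only).
class BlockIdDemo {

    // Same formula as in the kernel: position inside the 2D slice plus the offset of the z-slice.
    static int linearBlockId(int bx, int by, int bz, int gridX, int gridY) {
        return by * gridX + bx + gridX * gridY * bz;
    }

    // Enumerates every block of a gridX x gridY x gridZ grid and collects the IDs.
    static Set<Integer> allBlockIds(int gridX, int gridY, int gridZ) {
        Set<Integer> ids = new HashSet<>();
        for (int bz = 0; bz < gridZ; bz++) {
            for (int by = 0; by < gridY; by++) {
                for (int bx = 0; bx < gridX; bx++) {
                    ids.add(linearBlockId(bx, by, bz, gridX, gridY));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        int gridX = 4, gridY = 3, gridZ = 2; // hypothetical grid dimensions
        int width = 20;                      // hypothetical matrix width
        Set<Integer> ids = allBlockIds(gridX, gridY, gridZ);
        // Blocks with an ID >= width would return immediately in the kernel.
        long active = ids.stream().filter(id -> id < width).count();
        System.out.println("blocks: " + ids.size() + ", active: " + active);
    }
}
```

Launching a few more blocks than needed and discarding the surplus with the early return keeps the grid shape simple when the width does not factor nicely into grid dimensions.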
Java Kernel call:
// Our input float matrix
float[][] mat = ...;
// Create new host and device pointers, then allocate and copy mat to the device
CudaFloat2D mat_hd = new CudaFloat2D(mat);
// Call the kernel synchronously on the device
UtilCuda.kernelLauncherCreateSetupCall("kernels/util/GPU_norm_kernel.cu",
        "normColumn",
        CudaDevice.getNewGridDim3(width), // grid dimension --> calls device specific UtilCuda.getNewGridDim(...)
        UtilCuda.getNewDim3(CudaDevice.BLOCK_SIZE_POW2), // block dimension
        new Object[] { mat_hd, // argument list
                width,
                heigth,
                Util.getCeil(heigth, CudaDevice.BLOCK_SIZE_POW2_DOUBLE),
                Util.getCeil(heigth, CudaDevice.BLOCK_SIZE_POW2)});
// Copy the results back and free the device pointer
float[][] matRes = mat_hd.getResults(true);
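The last two kernel arguments are the tile counts described in the kernel comment. Here is a rough sketch of how they work out, assuming that Util.getCeil(a, b) is a ceiling integer division and that BLOCK_SIZE_POW2_DOUBLE is twice BLOCK_SIZE_POW2; neither is confirmed here. The class TileCountDemo, the helper ceilDiv, the block size of 256, and the height of 1000 are hypothetical.

```java
// Illustrative tile count arithmetic; ceilDiv stands in for Util.getCeil (assumed behavior).
class TileCountDemo {

    // Ceiling integer division for positive operands: smallest n with n * b >= a.
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        int blockSize = 256; // hypothetical value for CudaDevice.BLOCK_SIZE_POW2
        int height = 1000;   // hypothetical matrix height

        // Each thread loads 2 values, so one tile covers 2 * blockSize elements of a column.
        int tileCount = ceilDiv(height, 2 * blockSize);
        // The output is written one value per thread, so one tile covers blockSize elements.
        int outputTileCount = ceilDiv(height, blockSize);

        System.out.println("tileCount = " + tileCount
                + ", outputTileCount = " + outputTileCount);
    }
}
```

Rounding up matters whenever the height is not a multiple of the tile size: plain integer division would drop the last partial tile and leave the tail of each column unprocessed.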