I am using JCUDA to accelerate image processing.
The parallel algorithm is written as a CUDA kernel in C. It works for images with dimensions 816*612 and 1632*1224. However, when I switch to an image with dimensions 408*306, the kernel fails, and JCUDA reports the following error:
Exception in thread "main" jcuda.CudaException: CUDA_ERROR_UNKNOWN
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:282)
    at jcuda.driver.JCudaDriver.cuCtxSynchronize(JCudaDriver.java:1795)
The code that launches the kernel is below:
Pointer kernelParameters = Pointer.to(
    Pointer.to(deviceInputsdata),
    Pointer.to(deviceInputslist),
    Pointer.to(deviceInputneighborshift),
    Pointer.to(deviceInputbuckets),
    Pointer.to(deviceInputweightMap),
    Pointer.to(deviceInputnBuck),
    Pointer.to(new int[]{numElements}),
    Pointer.to(new int[]{bucketer.NeighborNum}),
    Pointer.to(new float[]{sigmaS}),
    Pointer.to(new float[]{sigmaR}),
    Pointer.to(new float[]{smin}),
    Pointer.to(deviceOuputmsRawData),
    Pointer.to(deviceInputmodeTable),
    Pointer.to(new int[]{Width})
);
int blockSizeX = 32;
int gridSizeX = (int)Math.ceil((double)numElements / blockSizeX);
cuLaunchKernel(function,
    gridSizeX, 1, 1,        // Grid dimension
    blockSizeX, 1, 1,       // Block dimension
    0, null,                // Shared memory size and stream
    kernelParameters, null  // Kernel- and extra parameters
);
cuCtxSynchronize();
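One thing that may be worth checking is the launch arithmetic above. The sketch below assumes that numElements equals width * height (this is not stated here) and uses a hypothetical helper class, LaunchMath, that is not part of JCUDA. It computes how many launched threads end up with an index at or beyond numElements for each image size. The kernel source is not shown, so whether it guards against such indices is unknown and this is only a candidate explanation.

```java
// Hypothetical helper, not part of JCUDA: plain integer launch arithmetic.
class LaunchMath {
    // Number of blocks needed to cover numElements (integer ceiling division,
    // equivalent to the Math.ceil expression used in the launch code).
    static int gridSize(int numElements, int blockSize) {
        return (numElements + blockSize - 1) / blockSize;
    }

    // Threads launched in the last block whose index is >= numElements.
    static int surplusThreads(int numElements, int blockSize) {
        return gridSize(numElements, blockSize) * blockSize - numElements;
    }

    public static void main(String[] args) {
        // Assumption: numElements is width * height for each image.
        int[][] sizes = { {408, 306}, {816, 612}, {1632, 1224} };
        int blockSize = 32;
        for (int[] s : sizes) {
            int numElements = s[0] * s[1];
            System.out.println(s[0] + "*" + s[1]
                + ": blocks=" + gridSize(numElements, blockSize)
                + ", surplus threads=" + surplusThreads(numElements, blockSize));
        }
    }
}
```

With a block size of 32, the two larger images divide evenly, while 408*306 leaves 16 surplus threads in the last block. If the kernel reads or writes using the thread index without first comparing it to numElements, that out-of-range access would show up only for the smaller image.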
I have tried different block sizes, such as 32, 64 and 128. None of them works.
I don’t understand why this kernel works for the larger images but fails for the smaller one.
Please help me! Thanks