I’m having a problem with JCufft when performing a 3D complex-to-real transform on a large amount of data.
The code works fine for data of size 256 x 256 x 256, but fails with a “CUFFT_EXEC_FAILED” error for data of size 512 x 512 x 512.
Here is the code:
Pointer imgp = new Pointer();
Pointer outp = new Pointer();
Pointer img2 = new Pointer();
int sizeFloat = x * y * z;
int sizeComplex = x * y * ((z/2) + 1);
float[] img = imf.brakeDownToFloat(ims).getArray(); // transform a 3D image into a suitable 1D array
float[] imgr = new float[x*y*z];
int r = cudaMalloc(imgp,sizeFloat * Sizeof.FLOAT);
IJ.log("alloc img: " + JCuda.cudaGetErrorString(r)); // cudaMalloc returns a CUDA runtime error code, not a cufftResult
r = cudaMalloc(outp, 2 * sizeComplex * Sizeof.FLOAT);
r = cudaMalloc(img2, sizeFloat * Sizeof.FLOAT);
r = cudaMemcpy(imgp, Pointer.to(img),(sizeFloat * Sizeof.FLOAT) , cudaMemcpyHostToDevice);
cufftHandle plan = new cufftHandle();
r = JCufft.cufftPlan3d(plan,x, y, z, cufftType.CUFFT_R2C);
r = JCufft.cufftExecR2C(plan, imgp, outp);
r = JCufft.cufftPlan3d(plan, x, y, z, cufftType.CUFFT_C2R);
r = JCufft.cufftExecC2R(plan, outp, img2);
IJ.log("backward : " + cufftResult.stringFor(r)); // ERROR CUFFT_EXEC_FAILED (only on large data)
r = cudaMemcpy(Pointer.to(imgr),img2,(sizeFloat * Sizeof.FLOAT),cudaMemcpyDeviceToHost);
IJ.log("copy back: " + JCuda.cudaGetErrorString(r));
...
Thank you for your help. PS: I use a Tesla GPU and CUDA 4.0.
Hello
CUFFT has a size limit for the transforms. The CUFFT documentation says…
This version of the CUFFT library supports the following features:
…
- Transform sizes up to 64 million elements in single precision and up to 128 million elements in double precision in any dimension, limited by the available GPU memory
And 512 x 512 x 512 exceeds this limit. Depending on whether “elements” in this case means “float elements” or really the size of the transform, a size of 512 x 512 x 256 or 512 x 256 x 256 should work.
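To make the arithmetic behind this concrete, here is a small plain-Java sketch (no JCuda needed) that counts the elements of a few transform sizes and compares them with the quoted single-precision limit. The class and method names are made up for illustration, and reading “64 million” as 64 * 1024 * 1024 is only an assumption: the quoted documentation does not say whether it means that or 64,000,000, and under the stricter reading 512 x 512 x 256 would already be over the limit.

```java
// Sketch: count the elements of a 3D transform and compare the count with
// the single-precision limit quoted from the CUFFT documentation.
final class TransformSizeCheck {

    // ASSUMPTION: "64 million" is interpreted as 64 * 1024 * 1024 elements.
    // The documentation quote does not state which interpretation is meant.
    static final long ASSUMED_LIMIT = 64L * 1024 * 1024;

    // Use long arithmetic so that large sizes cannot overflow an int.
    static long elementCount(int x, int y, int z) {
        return (long) x * y * z;
    }

    static boolean withinAssumedLimit(int x, int y, int z) {
        return elementCount(x, y, z) <= ASSUMED_LIMIT;
    }

    public static void main(String[] args) {
        int[][] sizes = {
            {256, 256, 256}, {512, 256, 256}, {512, 512, 256}, {512, 512, 512}
        };
        for (int[] s : sizes) {
            System.out.println(s[0] + " x " + s[1] + " x " + s[2]
                + " -> " + elementCount(s[0], s[1], s[2]) + " elements"
                + ", within assumed limit: "
                + withinAssumedLimit(s[0], s[1], s[2]));
        }
    }
}
```

With this reading, 512 x 512 x 256 sits exactly on the assumed limit, while 512 x 512 x 512 is twice as large.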
bye
Marco
Thank you, Marco13, for your reply,
but I do not quite understand why the forward FFT (R2C) succeeds for the same size (512 x 512 x 512) while the inverse (C2R) does not.
Admittedly, I’m also not sure. I assume that the “number of elements” supported according to the documentation really refers to the specified size of the transform (and not to the number of ‘float elements’). So the size you are working with is close to the officially supported limits, and maybe beyond them, and the fact that it seems to work in one direction might be an “implementation detail”.
In any case, I assume that this is not an issue of JCufft, and that the behavior is determined solely by CUFFT itself. If there are any doubts, I can try to confirm this by running the same program in plain CUFFT. This could also help to make sure that it is not solely a memory issue: you are creating and executing a plan, and then re-using the same handle to create and execute a new one without destroying the first plan. You could try destroying it in between:
r = JCufft.cufftPlan3d(plan,x, y, z, cufftType.CUFFT_R2C);
r = JCufft.cufftExecR2C(plan, imgp, outp);
r = JCufft.cufftDestroy(plan); // Don't forget to destroy the plan!
r = JCufft.cufftPlan3d(plan, x, y, z, cufftType.CUFFT_C2R);
IJ.log("Result of creating the plan : " + cufftResult.stringFor(r));
r = JCufft.cufftExecC2R(plan, outp, img2);
IJ.log("backward : " + cufftResult.stringFor(r));
Destroying the plan might allow CUFFT to free some memory that it allocated internally when the plan was created. But if this does not help either, I’m afraid that I cannot give a more profound answer at the moment. Maybe the people at the NVIDIA forum have more information about this. I checked the documentation and the release notes again, but did not find any further hints…
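To get a feeling for how much memory is involved, here is a rough plain-Java sketch that adds up the three device buffers from the posted code (imgp, outp and img2). The class and method names are made up for illustration. It deliberately ignores whatever CUFFT allocates internally for a plan, because that size is not known from the code or the documentation quoted here.

```java
// Sketch: estimate the device memory taken by the three buffers that the
// posted code allocates with cudaMalloc. Plan workspace is NOT included.
final class DeviceBufferEstimate {

    static final int FLOAT_BYTES = 4; // same value as Sizeof.FLOAT in JCuda

    // Real-valued volume: x * y * z floats (imgp and img2 in the posted code).
    static long realBytes(int x, int y, int z) {
        return (long) x * y * z * FLOAT_BYTES;
    }

    // R2C output: x * y * (z/2 + 1) complex values, two floats each (outp).
    static long complexBytes(int x, int y, int z) {
        return 2L * x * y * (z / 2 + 1) * FLOAT_BYTES;
    }

    // Two real buffers plus one complex buffer.
    static long totalBytes(int x, int y, int z) {
        return 2 * realBytes(x, y, z) + complexBytes(x, y, z);
    }

    public static void main(String[] args) {
        for (int n : new int[] {256, 512}) {
            long mib = totalBytes(n, n, n) / (1024 * 1024);
            System.out.println(n + "^3: about " + mib
                + " MiB for the buffers alone");
        }
    }
}
```

For 512 x 512 x 512 the buffers alone come to roughly 1.5 GiB, so whether two plans fit on top of that depends on the memory of the particular Tesla card.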