Hello
The question is specific to JCuda only insofar as it involves a Java 2D array: data[][]. The main problem with these arrays is that they are not necessarily stored as a contiguous block in memory.
Despite the strong relationship between pointers and arrays in C, the same problem may occur there as well: When creating a “2D array” in C like this
float array[3][3];
it can be considered as being roughly equivalent to a float[9] array (the C standard requires arrays, including arrays of arrays, to be allocated contiguously). But it is also possible to create a “2D array” as an array of pointers
float **array = (float**)malloc(3*sizeof(float*));
for (int i=0; i<3; i++)
{
array[i] = (float*)malloc(3*sizeof(float));
}
This may also be seen as a “2D array” and accessed like the first one…
array[i][j] = 123.456f;
This more closely resembles the semantics of a “2D array” in Java. But such an array cannot be copied from the host to the device using the usual CUDA functions (not even with the Memcpy2D functions), because the array does not store 9 float values, but 3 pointers to separately allocated rows of floats.
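To make the Java side of this concrete, here is a small sketch in plain Java (no JCuda involved). The class name ArrayOfArraysSketch and the helper isRectangular are made up for this illustration. It only shows that the rows of a float[][] are independent objects, which is why there is no single block of 9 floats to hand to CUDA.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of JCuda
class ArrayOfArraysSketch
{
    // Returns true only if every row exists and has the same length as row 0
    static boolean isRectangular(float data[][])
    {
        for (int i = 0; i < data.length; i++)
        {
            if (data[i] == null || data[i].length != data[0].length)
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String args[])
    {
        // Only the outer array holding 3 row references is allocated here
        float data[][] = new float[3][];
        data[0] = new float[3];
        data[1] = new float[5];   // rows may differ in length
        data[2] = data[0];        // two rows may even refer to the same object

        data[2][1] = 123.456f;    // this write is also visible through data[0]
        System.out.println(Arrays.toString(data[0]));
        System.out.println("Rectangular: " + isRectangular(data));
    }
}
```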
The CUDA functions require the data to be stored as a contiguous block. So the only ways to copy a “2D array” from Java to CUDA are to store it as a 1D array or, alternatively, to copy each row separately, as in your first code block.
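If the data already exists as a float[][], one way to get the 1D form is to copy the rows into a single row-major array. The following is only a sketch in plain Java; the class FlattenSketch and its methods flatten and unflatten are names invented here, and it assumes that all rows have the same length.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of JCuda
class FlattenSketch
{
    // Copies a rectangular float[height][width] into one row-major float[width*height]
    static float[] flatten(float data[][])
    {
        int height = data.length;
        int width = (height == 0) ? 0 : data[0].length;
        float flat[] = new float[width * height];
        for (int y = 0; y < height; y++)
        {
            if (data[y].length != width)
            {
                throw new IllegalArgumentException(
                    "Row " + y + " has length " + data[y].length + ", expected " + width);
            }
            System.arraycopy(data[y], 0, flat, y * width, width);
        }
        return flat;
    }

    // Copies a row-major 1D array back into a float[height][width]
    static float[][] unflatten(float flat[], int width, int height)
    {
        if (flat.length != width * height)
        {
            throw new IllegalArgumentException(
                "Expected " + (width * height) + " elements, but got " + flat.length);
        }
        float data[][] = new float[height][width];
        for (int y = 0; y < height; y++)
        {
            System.arraycopy(flat, y * width, data[y], 0, width);
        }
        return data;
    }

    public static void main(String args[])
    {
        float data[][] = { {0, 1, 2}, {3, 4, 5}, {6, 7, 8} };
        float flat[] = flatten(data);
        System.out.println(Arrays.toString(flat));
        System.out.println(Arrays.deepToString(unflatten(flat, 3, 3)));
    }
}
```

With this layout, the element at row y and column x is found at flat[y * width + x], and the row pitch in bytes is simply width * 4 for floats.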
When stored as a 1D array, the memcpy2D functions and structures may be used as in this example:
import jcuda.*;
import jcuda.driver.*;

public class JCudaDriverArrayTest
{
    public static void main(String args[])
    {
        // Initialize the driver and create a context for the first device.
        JCudaDriver.cuInit(0);
        CUcontext pctx = new CUcontext();
        CUdevice dev = new CUdevice();
        JCudaDriver.cuDeviceGet(dev, 0);
        JCudaDriver.cuCtxCreate(pctx, 0, dev);

        // Prepare the input and output arrays on the host
        int width = 3;
        int height = 3;
        float input[] = new float[width*height];
        for (int i=0; i<width*height; i++)
        {
            input[i] = i;
        }
        float output[] = new float[width*height];

        // Create the 2D array on the device
        CUarray array = new CUarray();
        CUDA_ARRAY_DESCRIPTOR ad = new CUDA_ARRAY_DESCRIPTOR();
        ad.Format = CUarray_format.CU_AD_FORMAT_FLOAT;
        ad.Width = width;
        ad.Height = height;
        ad.NumChannels = 1;
        JCudaDriver.cuArrayCreate(array, ad);

        // Copy the host input to the 2D array
        CUDA_MEMCPY2D copyHD = new CUDA_MEMCPY2D();
        copyHD.srcMemoryType = CUmemorytype.CU_MEMORYTYPE_HOST;
        copyHD.srcHost = Pointer.to(input);
        copyHD.srcPitch = width * Sizeof.FLOAT;
        copyHD.dstMemoryType = CUmemorytype.CU_MEMORYTYPE_ARRAY;
        copyHD.dstArray = array;
        copyHD.WidthInBytes = width * Sizeof.FLOAT;
        copyHD.Height = height;
        JCudaDriver.cuMemcpy2D(copyHD);

        // Do kernel invocations using the array here
        // ...

        // Copy the 2D array to the host output
        CUDA_MEMCPY2D copyDH = new CUDA_MEMCPY2D();
        copyDH.srcMemoryType = CUmemorytype.CU_MEMORYTYPE_ARRAY;
        copyDH.srcArray = array;
        copyDH.dstMemoryType = CUmemorytype.CU_MEMORYTYPE_HOST;
        copyDH.dstHost = Pointer.to(output);
        copyDH.dstPitch = width * Sizeof.FLOAT;
        copyDH.WidthInBytes = width * Sizeof.FLOAT;
        copyDH.Height = height;
        JCudaDriver.cuMemcpy2D(copyDH);

        // Verify that the data made the round trip unchanged
        boolean passed = true;
        for (int i=0; i<width*height; i++)
        {
            System.out.println(output[i]+" ");
            if (input[i] != output[i])
            {
                passed = false;
                break;
            }
        }
        System.out.println("Test "+(passed?"PASSED":"FAILED"));

        // Clean up.
        JCudaDriver.cuArrayDestroy(array);
    }
}
(Note that the “JCudaTextureSample” from the JCuda samples page also involves some 2D and 3D memory copies.)
BTW: When you intend to copy device memory into an array, you’ll have to look closely at the specification and usage examples of cudaMallocPitch() and cudaMalloc3D() to ensure that the alignment requirements for the memory are met.
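To illustrate what a pitch means for the memory layout, here is a host-only sketch in plain Java. It does not call CUDA at all: the class PitchSketch, its methods, and the alignment of 64 bytes are assumptions made up for this example. On a real device, the pitch is whatever the allocation function reports, and it should never be computed by hand like this.

```java
// Hypothetical illustration class, not part of JCuda
class PitchSketch
{
    static final int SIZEOF_FLOAT = 4;

    // Rounds the row size up to a multiple of 'alignment' bytes. This is only
    // a stand-in for the pitch that a real pitched allocation would report.
    static int alignedPitch(int widthInBytes, int alignment)
    {
        return ((widthInBytes + alignment - 1) / alignment) * alignment;
    }

    // Copies a tightly packed row-major array into a buffer where each row
    // starts at a multiple of pitchInBytes. The padding elements stay 0.
    static float[] toPitched(float flat[], int width, int height, int pitchInBytes)
    {
        if (pitchInBytes % SIZEOF_FLOAT != 0 || pitchInBytes < width * SIZEOF_FLOAT)
        {
            throw new IllegalArgumentException("Invalid pitch: " + pitchInBytes);
        }
        int pitchInFloats = pitchInBytes / SIZEOF_FLOAT;
        float pitched[] = new float[pitchInFloats * height];
        for (int y = 0; y < height; y++)
        {
            System.arraycopy(flat, y * width, pitched, y * pitchInFloats, width);
        }
        return pitched;
    }

    public static void main(String args[])
    {
        int width = 3;
        int height = 3;
        float flat[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

        // Assumed alignment of 64 bytes, chosen arbitrarily for this sketch
        int pitchInBytes = alignedPitch(width * SIZEOF_FLOAT, 64);
        float pitched[] = toPitched(flat, width, height, pitchInBytes);

        // Element (x, y) lives at y * pitchInFloats + x in the pitched buffer
        int pitchInFloats = pitchInBytes / SIZEOF_FLOAT;
        int x = 1;
        int y = 2;
        System.out.println("Pitch in bytes: " + pitchInBytes);
        System.out.println("Element (1,2): " + pitched[y * pitchInFloats + x]);
    }
}
```

In a 2D copy, the pitch of the pitched side would then go into the srcPitch or dstPitch field, while WidthInBytes remains width * Sizeof.FLOAT.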
bye
Marco