Using cuComplex in a kernel

Anderson · December 18, 2012 at 10:27

Hello

I don't know how to send a cuComplex[] array to a kernel.
I tried, but could not get it to work.

Any help is welcome.

Marco13 · December 18, 2012 at 14:33

Hello

I’m not entirely sure what you mean: a Java cuComplex[] array cannot be sent to a kernel, because it is not in device memory. I assume that you already allocated the device memory and actually wanted to copy the contents of a cuComplex array to device memory. But this is not directly possible either: as with all Java objects, there are no guarantees about the memory layout of the objects in such an array, so there is no contiguous memory block that could be copied.

Specifically: in C/C++, a cuComplex object consists of only 2 float values, and storing many cuComplex objects in an array is equivalent to storing pairs of float values in a single, contiguous block. In Java, each cuComplex object is a full-fledged Java object, the array only holds references to these objects, and each object may reside anywhere in memory.

Depending on the exact use case, there may be different solutions for this. The most CUDA-friendly one is to not use a cuComplex[] array, but instead use a float[] array with a pair of float values for each complex number. This is also done, for example, in the JCufft methods that operate on complex arrays. The cuComplex class is mainly used to represent single complex values in method arguments, like certain factors in CUBLAS functions.
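A minimal sketch of the float[] approach, in plain Java without any JCuda calls: complex number k is stored with its real part at index 2*k and its imaginary part at index 2*k+1, which is the same layout as a C array of cuComplex. The class and helper names (InterleavedComplex, set, real, imag) are hypothetical and only serve as an illustration.

```java
import java.util.Arrays;

/** Sketch: n complex numbers stored interleaved in a float[2*n] array. */
public final class InterleavedComplex {

    // Stores the complex number (re, im) at logical (complex) index k.
    static void set(float[] data, int k, float re, float im) {
        data[2 * k] = re;     // real part
        data[2 * k + 1] = im; // imaginary part
    }

    // Real part of complex element k.
    static float real(float[] data, int k) {
        return data[2 * k];
    }

    // Imaginary part of complex element k.
    static float imag(float[] data, int k) {
        return data[2 * k + 1];
    }

    public static void main(String[] args) {
        int n = 3;
        float[] data = new float[n * 2]; // 2 floats per complex number
        set(data, 0, 1, 2);
        set(data, 1, 3, 4);
        set(data, 2, 5, 6);
        System.out.println(Arrays.toString(data));
        // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    }
}
```

Such a float[] is a single primitive array, so it can be passed to Pointer.to(...) and copied to the device in one call.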

Manual conversions between a cuComplex[n] array and a float[n*2] array should be avoided if possible, because they may be rather time-consuming. Did you intend to perform computations on the complex numbers on the host side?
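To illustrate where the cost comes from, here is a sketch of such a manual conversion in both directions. It does not use JCuda: ComplexValue is a hypothetical stand-in class with two float fields x and y, assumed here to mirror the two float values of a cuComplex. Converting to floats touches every object and allocates a new float[2*n]; converting back allocates n new objects.

```java
import java.util.Arrays;

/** Sketch of the element-by-element copy that an object array -> float[] conversion needs. */
public final class ComplexArrayConversion {

    // Hypothetical stand-in for a complex object with two float fields.
    static final class ComplexValue {
        final float x; // real part
        final float y; // imaginary part

        ComplexValue(float x, float y) {
            this.x = x;
            this.y = y;
        }
    }

    // Object array -> interleaved floats: one pass over all n objects
    // plus a new float[2*n] allocation.
    static float[] toInterleaved(ComplexValue[] values) {
        float[] out = new float[values.length * 2];
        for (int k = 0; k < values.length; k++) {
            out[2 * k] = values[k].x;
            out[2 * k + 1] = values[k].y;
        }
        return out;
    }

    // Interleaved floats -> object array: allocates n new objects.
    static ComplexValue[] fromInterleaved(float[] data) {
        ComplexValue[] out = new ComplexValue[data.length / 2];
        for (int k = 0; k < out.length; k++) {
            out[k] = new ComplexValue(data[2 * k], data[2 * k + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        ComplexValue[] values = {
            new ComplexValue(1, 2), new ComplexValue(3, 4), new ComplexValue(5, 6)
        };
        float[] data = toInterleaved(values);
        System.out.println(Arrays.toString(data));
        // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    }
}
```

For a few values this is negligible, but if it has to be done for large arrays before and after every kernel call, the copying and allocations can add up.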

bye
Marco

Anderson · December 18, 2012 at 15:08

Yes, I have to perform computations on the complex numbers in a kernel.

Marco13 · December 19, 2012 at 02:42

I meant whether you have to perform computations on the host side, i.e. in Java.

If you wanted to do something like

import jcuda.cuComplex;
import static jcuda.cuComplex.*;

// Create array on Java side
cuComplex[] array = new cuComplex[3];
array[0] = cuCmplx(1, 2);
array[1] = cuCmplx(3, 4);
array[2] = cuCmplx(5, 6);

// Do some computations on host side
array[0] = cuCadd(array[1], array[2]);
...

// Afterwards copy the array contents to CUDA and call the kernel
...

then I see why you would want to use a cuComplex array, although even in this case, one should try to find a solution that avoids costly conversions.
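One way to avoid the conversions even when some host-side computations are needed is to do the arithmetic directly on the interleaved float[] array. The following is only a sketch: the class and method names are hypothetical, and indices are counted in complex elements. The add call corresponds to the array[0] = cuCadd(array[1], array[2]) line of the earlier snippet.

```java
/** Sketch: host-side complex arithmetic performed directly on an interleaved float[],
 *  so that no complex objects (and no later conversion) are needed. */
public final class InterleavedArithmetic {

    // data[dst] = data[a] + data[b]. Operands are read into locals first,
    // so dst may be equal to a or b.
    static void add(float[] data, int dst, int a, int b) {
        float re = data[2 * a] + data[2 * b];
        float im = data[2 * a + 1] + data[2 * b + 1];
        data[2 * dst] = re;
        data[2 * dst + 1] = im;
    }

    // data[dst] = data[a] * data[b], using (ar + i*ai) * (br + i*bi)
    //           = (ar*br - ai*bi) + i*(ar*bi + ai*br)
    static void mul(float[] data, int dst, int a, int b) {
        float ar = data[2 * a], ai = data[2 * a + 1];
        float br = data[2 * b], bi = data[2 * b + 1];
        data[2 * dst] = ar * br - ai * bi;
        data[2 * dst + 1] = ar * bi + ai * br;
    }

    public static void main(String[] args) {
        float[] data = { 1, 2, 3, 4, 5, 6 }; // (1+2i), (3+4i), (5+6i)
        add(data, 0, 1, 2);                  // element 0 = (3+4i) + (5+6i) = 8+10i
        System.out.println(data[0] + " " + data[1]); // prints 8.0 10.0
    }
}
```

Afterwards, the same float[] can be copied to the device as it is.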

Otherwise, if you only want to define the complex numbers on the Java side and pass them to the kernel, you can use a float[] array:

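import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import static jcuda.driver.JCudaDriver.*;

// Create array on Java side
float[] array = new float[3 * 2];
array[0] = 1; array[1] = 2; // These are complex numbers: real and imaginary parts
array[2] = 3; array[3] = 4;
array[4] = 5; array[5] = 6;

// Allocate device memory, copy the array contents to CUDA and call the kernel
CUdeviceptr deviceArray = new CUdeviceptr();
cuMemAlloc(deviceArray, 3 * 2 * Sizeof.FLOAT);
cuMemcpyHtoD(deviceArray, Pointer.to(array), 3 * 2 * Sizeof.FLOAT);
...
cuLaunchKernel(function...);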

On the CUDA side, you can declare the kernel parameters as ‘cuComplex*’ (the type is defined in the cuComplex.h header). The kernel only receives a pointer, and for the kernel it does not matter whether this is a pointer to ‘n’ cuComplex values or a pointer to ‘n*2’ float values.

Anderson · December 19, 2012 at 13:45

I don't need to perform computations on the host side.

So I can just send a float[n*2] array and receive it as a cuComplex array with n elements in the kernel, right?

Marco13 · December 19, 2012 at 13:50

Yes. The device memory is allocated as a Pointer (or CUdeviceptr) anyhow, so the interpretation of this (“void”) pointer is up to the kernel. In this case, the kernel argument will simply be declared as a cuComplex*.

Anderson · December 19, 2012 at 14:16

Ok. Thanks, it helped me a lot.