Hello,
Well, it can make sense, of course, but only in a certain context. When you have a large program that calls Math.sin in several places, you cannot simply replace all occurrences of Math.sin with another call like “Cuda.sin” and expect your application to run faster.
The key idea of CUDA is that of data-parallel processing. That is, you apply the same computation several hundred, thousand, or even a million times, each time operating on a different part of the data. Each of these operations is then executed in its own thread.
As a trivial example where it may be faster to use CUDA: If you have, say, one million float values and want to compute the sine of each value, then you can do something like this in Java
int n = 1000000;
float input[] = inputArrayOfFloats(n);
float output[] = new float[n];
for (int i=0; i<n; i++)
{
    output[i] = (float)Math.sin(input[i]);
}
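As an aside, what makes this loop data-parallel is that each iteration is independent of all the others. The same idea can already be sketched on the CPU with plain Java, for example using java.util.Arrays.parallelSetAll (this sketch, including the class name and the artificial input values, is just for illustration and not part of the original example):

```java
import java.util.Arrays;

public class DataParallelSine
{
    // Computes sin(i * 0.001) for i = 0..n-1, with the iterations
    // distributed over the common fork/join pool.
    public static double[] computeSines(int n)
    {
        double input[] = new double[n];
        for (int i = 0; i < n; i++)
        {
            input[i] = i * 0.001;
        }
        double output[] = new double[n];
        // Each element is computed independently of all others -
        // the same property that lets CUDA assign one GPU thread
        // per element.
        Arrays.parallelSetAll(output, i -> Math.sin(input[i]));
        return output;
    }

    public static void main(String[] args)
    {
        double output[] = computeSines(1000000);
        System.out.println(output[1000]); // sin(1.0)
    }
}
```

Of course, this only spreads the work over a handful of CPU cores, while a GPU can run thousands of such threads at once.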
In CUDA (or JCuda) you would write a kernel, which executes this operation in parallel:
extern "C"
__global__ void computeSine(float *input, float *output)
{
    // One thread per array element; this assumes that the total
    // number of launched threads equals the array length
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    output[index] = sin(input[index]);
}
This kernel could then be compiled into a CUBIN (CUDA binary) file, and loaded and executed using the CUDA Driver API.
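With JCuda's driver API bindings, loading and launching the kernel could look roughly like the following. This is only a sketch: the file name "computeSine.cubin" and the block size are assumptions, error checking is omitted, and the block size is chosen so that it evenly divides n, since the kernel above has no bounds check.

```java
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.*;
import static jcuda.driver.JCudaDriver.*;

public class JCudaSineLauncher
{
    public static void main(String[] args)
    {
        int n = 1000000;
        float input[] = new float[n]; // ... fill with your data ...
        float output[] = new float[n];

        // Initialize the driver API and create a context on device 0
        JCudaDriver.setExceptionsEnabled(true);
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        // Load the compiled CUBIN and obtain the kernel function
        CUmodule module = new CUmodule();
        cuModuleLoad(module, "computeSine.cubin"); // assumed file name
        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "computeSine");

        // Copy the input to the device, allocate the output
        CUdeviceptr dInput = new CUdeviceptr();
        cuMemAlloc(dInput, n * Sizeof.FLOAT);
        cuMemcpyHtoD(dInput, Pointer.to(input), n * Sizeof.FLOAT);
        CUdeviceptr dOutput = new CUdeviceptr();
        cuMemAlloc(dOutput, n * Sizeof.FLOAT);

        // Launch: blockSize divides n exactly, so gridSize * blockSize == n
        int blockSize = 320;
        int gridSize = n / blockSize; // 3125
        Pointer kernelParams = Pointer.to(
            Pointer.to(dInput),
            Pointer.to(dOutput));
        cuLaunchKernel(function,
            gridSize, 1, 1,    // grid dimensions
            blockSize, 1, 1,   // block dimensions
            0, null,           // shared memory, stream
            kernelParams, null);
        cuCtxSynchronize();

        // Copy the result back and clean up
        cuMemcpyDtoH(Pointer.to(output), dOutput, n * Sizeof.FLOAT);
        cuMemFree(dInput);
        cuMemFree(dOutput);
    }
}
```

Note that the copies between host and device memory are not free; for a single sine per element, they may well dominate the total time.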
Note that in this case, the operation (only one ‘sin’ computation) is still very simple, so the speedup may not yet be worth the effort. But if you have to perform more complex computations, preferably lots of arithmetic or trigonometric operations on a relatively small chunk of data, then CUDA will more likely bring noticeable performance benefits.
EDIT: BTW, thanks for pointing me at GPULib, I didn’t know this one before.
bye
Marco