JOCL performance

On my notebook I have a GeForce GT 720M card and an Intel i7-4600U CPU. I ran JOCLSample.java with the array size increased to 99999999 and found that the GPU is beaten by the CPU: the verification code running on the CPU takes less time than the GPU computation. Is there anything I need to pay attention to?

  1. You’re comparing a low-end mobile GPU to a high-end mobile CPU.
  2. The test seems bandwidth limited. Not only does your GPU have less VRAM bandwidth than your CPU has to system RAM, but you’re also limited by PCI-Express and the overhead of transferring data back and forth.

GPGPUs show their strength when there are lots of non-divergent computations and when they aren’t limited by global VRAM bandwidth. I think one example of such a task would be brute-forcing hashed passwords; it would show the computation power advantage very clearly.
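
To put a rough number on the transfer overhead, here is a small back-of-the-envelope sketch in plain Java (no JOCL needed). It assumes the sample uses float arrays, with two input arrays copied to the device and one result array copied back. The 6 GB/s rate is purely an assumed figure for illustration, not a measurement of your PCI-Express link, and the class and method names are made up for this example.

```java
public class TransferEstimate {

    // Seconds needed to move `arrays` buffers of `elements` values each
    // at a given sustained rate. The rate is a caller-supplied assumption.
    static double transferSeconds(long elements, int bytesPerElement,
                                  int arrays, double bytesPerSecond) {
        double totalBytes = (double) elements * bytesPerElement * arrays;
        return totalBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long n = 99_999_999L;          // array size from the question
        int floatBytes = Float.BYTES;  // 4 bytes per float
        int arrays = 3;                // a and b go to the device, c comes back
        double assumedRate = 6.0e9;    // ASSUMPTION: 6 GB/s, not a measurement

        double seconds = transferSeconds(n, floatBytes, arrays, assumedRate);
        System.out.printf("Bytes moved: %.0f%n", (double) n * floatBytes * arrays);
        System.out.printf("Estimated transfer time: %.3f s%n", seconds);
        System.out.printf("Bytes moved per multiplication: %d%n", floatBytes * arrays);
    }
}
```

Under these assumptions about 1.2 GB crosses the bus for roughly 100 million multiplications, i.e. 12 bytes moved per single multiply. With so little work per byte, the copy is likely to dominate whatever the GPU saves on the arithmetic.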

Hello

There is a rather elaborate answer on stackoverflow.com covering many of the things you have to pay attention to. The question referred to CUDA, but the answer is mostly generic and applies to GPU computing in general.

The most important point is what Piotr already mentioned: The most expensive part of this sample is copying the memory to the device and then copying the result back. It’s only a very simple example for illustration purposes. But if you want to, you can change the line
c[gid] = a[gid] * b[gid];
to something like
c[gid] = sin(cos(sin(cos(sin(cos(a[gid])))))) * sin(cos(sin(cos(sin(cos(b[gid]))))));
just to artificially generate some (useless) computation work. With enough computation per element, the GPU should eventually come out ahead.
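
If you change the kernel like this, the CPU code you compare against has to compute the same expression, otherwise the timing comparison and the result check are no longer meaningful. A minimal sketch of such a CPU reference in plain Java could look like the following. The class name, the method names and the tolerance are hypothetical and not taken from JOCLSample.java, and the CPU side computes in double precision here, so the results should be compared with a tolerance rather than for exact equality.

```java
public class CpuReference {

    // sin(cos(sin(cos(sin(cos(x)))))), the nesting from the modified kernel line.
    static double nested(double x) {
        return Math.sin(Math.cos(Math.sin(Math.cos(Math.sin(Math.cos(x))))));
    }

    // CPU counterpart of the modified kernel: c[i] = nested(a[i]) * nested(b[i]).
    static float[] computeExpected(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = (float) (nested(a[i]) * nested(b[i]));
        }
        return c;
    }

    // Relative comparison; the tolerance is a hypothetical choice.
    static boolean closeEnough(float expected, float actual, float tolerance) {
        return Math.abs(expected - actual)
                <= tolerance * Math.max(1.0f, Math.abs(expected));
    }

    public static void main(String[] args) {
        int n = 1_000_000; // smaller than in the thread, to keep the run short
        float[] a = new float[n];
        float[] b = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i; // arbitrary test data
            b[i] = i;
        }

        long start = System.nanoTime();
        float[] expected = computeExpected(a, b);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("CPU reference for " + n + " elements took "
                + elapsedMs + " ms");

        // In the real sample, expected[i] would be compared against the values
        // read back from the device, for example with closeEnough(...).
        System.out.println("Self-check: "
                + closeEnough(expected[1], expected[1], 1e-5f));
    }
}
```

Timing this loop next to the GPU run (including the copies) gives a fairer picture than timing only the kernel.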

bye
Marco

Thanks a lot. That really clears it up for me. I tried the nested sin and cos version, and the GPU dominates: it is 200 times faster than the CPU.