Hello Piotr,
There are two different usage patterns where one has the choice between arrays or (direct) ByteBuffers:
- The first one is for the basic API usage, where the arrays only involve „few objects“. For example, when obtaining the available platforms with
clGetPlatformIDs(n, platformsArray, null);
- The second one is the transfer of the actual data that is processed by a kernel. For example, in
clEnqueueReadBuffer(..., pointerToArray...);
These uses are rather different.
Regarding 1.:
When I started JOCL, I already knew JOGL (I think the Jogamp-JOCL did not exist back then, but am not sure), and other JNI-based libraries. And I always found it a bit inconvenient having to create direct ByteBuffers. This can be particularly annoying for „small“ arrays, considering the choice between simply passing a plain float[] array to a method
and
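float[] data = new float[] { 1, 2, 3 };
ByteBuffer bb = ByteBuffer.allocateDirect(data.length * 4);
bb.order(ByteOrder.nativeOrder());
FloatBuffer fb = bb.asFloatBuffer();
fb.put(data);   // copy the array contents into the buffer
fb.rewind();    // reset the position before passing the buffer on
someMethod(fb);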
Of course, the latter is calling for some convenience/utility methods. But this may have the drawback that a potentially very large number of small, temporary direct ByteBuffers has to be created, causing the garbage collector to go mad. (You mentioned that Jogamp-JOCL does some caching/pooling, which is the obvious „solution“ to this problem, but I can’t tell off the top of my head how exactly they solved this.) In any case, I thought that plain Java arrays are a bit more convenient. (Still, they are a bit of a hassle compared to the simplicity of the original C API: The places where JOCL uses these arrays, like the platforms example above, usually correspond to pointers in C. So in C you’d not necessarily pass an array to such a function, but simply the address of a single variable.)
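Such a convenience method could look roughly like the following. This is only a sketch using plain java.nio; the names DirectBuffers and toDirectFloatBuffer are made up for this example and are not part of JOCL or Jogamp-JOCL. Note that every call allocates a new direct buffer, which is exactly the drawback mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper class, only for illustration (not part of any library)
final class DirectBuffers {
    private DirectBuffers() {}

    // Copies the given values into a newly allocated direct FloatBuffer
    // that uses the native byte order of the platform
    static FloatBuffer toDirectFloatBuffer(float... data) {
        ByteBuffer bb = ByteBuffer.allocateDirect(data.length * Float.BYTES);
        bb.order(ByteOrder.nativeOrder());
        FloatBuffer fb = bb.asFloatBuffer();
        fb.put(data);
        fb.rewind(); // reset the position so the buffer can be read from the start
        return fb;
    }
}
```

With this, the call from above would shrink to someMethod(DirectBuffers.toDirectFloatBuffer(1, 2, 3));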
Concerning the performance: I actually can’t remember having made detailed, dedicated performance tests for these „small-array-cases“. I’m rather sure that the performance difference will not be significant, but even if using arrays causes an overhead here, I think that this can be justified, particularly for JOCL: The most time-consuming part of an OpenCL program will usually not consist of millions of calls to functions like clGetPlatformIDs. Instead, most of the time will be spent copying memory and running kernels.
One could probably imagine usage patterns where any potential overhead that is imposed by any function call may become more important. Even for these cases, the difference between using arrays and using ByteBuffers will probably still be negligible compared to the overhead that is imposed by the JNI call itself. But if I become aware of such a pattern, and find out that it is beneficial to offer one particular method (or several methods) in an overloaded form that alternatively accepts ByteBuffers, I’ll certainly consider adding these methods.
Until then, I’d rather create a dedicated test to find out how much difference there actually is between small arrays and ByteBuffers. (Something similar has been on my „to do“ list for JCuda for years now, but it did not really have high priority.)
Regarding 2.:
For the actual data transfer, direct ByteBuffers can already be used. So you can already write
clEnqueueReadBuffer(..., pointerToByteBuffer...);
using either a direct ByteBuffer or a heap-based ByteBuffer. So for this case of the actual „data blocks“, one has the option to use either arrays or direct ByteBuffers.
However, it is correct that using arrays can cause some headaches in combination with garbage collection, as you mentioned:
„IIRC using plain Java arrays during JNI calls prevents garbage collection and replacing Java arrays with direct ByteBuffers solves that problem.“
Direct ByteBuffers are allocated outside the heap, and not directly touched by the garbage collector, which makes it easier to use their actual addresses on the native side. In contrast to that, a Java array has to be „pinned“ on the native side, to prevent the garbage collector from moving or reclaiming it while the native code is accessing it. Whether or not „pinning“ is supported depends on the JVM, and there is no way to find this out reliably. This also means that some function calls that involve Java arrays have to be blocking, because it is not really feasible to pin an array across multiple JNI calls.
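On the Java side, the two kinds of buffers can at least be told apart with ByteBuffer.isDirect(). The following is a minimal sketch with plain java.nio; the class name BufferKinds and the method describe are made up for this example, and it says nothing about how JOCL treats the buffers internally.

```java
import java.nio.ByteBuffer;

// Hypothetical example class, only for illustration
final class BufferKinds {
    private BufferKinds() {}

    // Returns "direct" for buffers that live outside the Java heap,
    // and "heap" for buffers that are backed by an ordinary Java array
    static String describe(ByteBuffer buffer) {
        return buffer.isDirect() ? "direct" : "heap";
    }
}
```

A buffer from ByteBuffer.allocateDirect(16) is reported as „direct“, whereas ByteBuffer.allocate(16) and ByteBuffer.wrap(new byte[16]) are reported as „heap“. The latter are just views on a Java array, so the same pinning questions apply to them.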
But I think that particularly for these „data blocks“, there is one rather compelling reason why I tried hard to support plain Java arrays as well: Namely, the interoperability with existing programs. A usual Java program that does some number crunching will hardly ever use (direct) ByteBuffers/FloatBuffers. All the methods will return plain float[] arrays, or accept them as their arguments. I think that it can be advantageous to have the possibility to pass these arrays directly to OpenCL, without first having to copy them into a direct ByteBuffer, and afterwards copying the results from a ByteBuffer back into an array. (Still, there are some unknowns, namely the actual handling of the arrays in JNI concerning pinning, but it’s at least a best-effort approach to avoid unnecessary copies of larger memory blocks.)
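To make the two additional copies explicit, this is roughly what a program that works with float[] arrays would have to do if only direct buffers were accepted. It is a sketch with plain java.nio; the class RoundTripExample and the method roundTrip are made up for this example, and the place where the OpenCL calls would go is only marked with a comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical example class, only for illustration
final class RoundTripExample {
    private RoundTripExample() {}

    static float[] roundTrip(float[] input) {
        FloatBuffer fb = ByteBuffer.allocateDirect(input.length * Float.BYTES)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();

        fb.put(input);  // copy 1: from the Java array into the direct buffer
        fb.rewind();

        // (The OpenCL calls that read and write the buffer would go here)

        float[] result = new float[input.length];
        fb.get(result); // copy 2: from the direct buffer back into a Java array
        return result;
    }
}
```

For large „data blocks“, both copies touch every element once, in addition to the actual transfer to and from the device.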
Thanks for the link to the paper, that looks interesting, and I’ll definitely have a look at this!
bye
Marco