Strengths and limitations of JOCL

Dear All,

I am trying to find documentation about the features that JOCL supports and about its current limitations. I haven’t come across such documentation yet. It would be quite handy if someone could point me to anything available on the internet, or just give an idea of the strengths and limitations of JOCL.

Regards
Niaz

Hello

The question is very broad. The strengths and limitations compared to what?

  • Other GPU libraries for Java?
  • Other OpenCL bindings for Java?
  • Plain OpenCL in C?

Maybe this would help to give a more focussed answer…

bye
Marco

So as discussed via e-mail, the question aimed at a comparison of JOCL with Aparapi, Jogamp’s JOCL and JavaCL, and at whether there are any limitations (e.g. Aparapi does not support exceptions and 2D arrays). It is basically a follow-up to the discussion about Optimizing an SPMV kernel using Aparapi on the Aparapi mailing list.

It’s hard to compare certain libraries really objectively (especially when you created one of these libraries ;)), but I’ll try.

First, I’d like to refer to my Stack Overflow answer about „Using Java with Nvidia GPU’s (cuda)“, because it gives some general advice and hints that are not restricted to CUDA but apply to GPU computing in general, and in particular, it summarizes some of the approaches and libraries that exist for GPU computing with Java.

In fact, these libraries are listed there in some order. I’m not sure which order this is :wink: but it has to do with the level of abstraction.

Seriously: There is a really fundamental difference between the categories:

General comparison of the libraries from the question

The category „(Byte)code translation and OpenCL code generation“ mentions Aparapi.

Aparapi takes pure Java code and translates it into OpenCL with some sophisticated internal trickery. This is probably by far the most convenient way for a Java programmer to use the GPU: just write Java code, and let Aparapi do the rest. It can really be that simple, although there are some constraints. (You asked about exceptions and 2D arrays - more on that below.)
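
To give a rough impression of what this looks like, here is a minimal sketch of a vector addition with Aparapi (assuming the older com.amd.aparapi package names; newer versions use com.aparapi instead, and the class name is just made up for this sketch):

import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

public class AparapiVectorAdd {
    public static void main(String[] args) {
        final int n = 1024;
        final float[] a = new float[n];
        final float[] b = new float[n];
        final float[] result = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = i;
        }

        // Plain Java code: Aparapi translates the run() method into an
        // OpenCL kernel (and falls back to Java threads if that fails)
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                result[i] = a[i] + b[i];
            }
        };
        kernel.execute(Range.create(n));
        kernel.dispose();
    }
}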

The category „Java OpenCL/CUDA binding libraries“ mentions JOCL, Jogamp-JOCL and JavaCL.

In fact, under the hood, there is not such a great difference between most of these libraries (as well as the OpenCL bindings from lwjgl.org), regarding the functionality or the ease of use. I already wrote a few words about the Difference between the JOCLs, but this may not bring sooo many insights here. These libraries basically aim at offering the functionality of OpenCL, according to the OpenCL specification by Khronos, in Java.

Most of these libraries tried to add a certain level of object-oriented abstraction. The OpenCL API lends itself „naturally“ to being represented in an object-oriented way, but there are some caveats. For example, you may want to have a look at how Jogamp’s JOCL and JavaCL model concepts like command queues and memory objects as classes.

These are somewhat similar. In contrast to that, JOCL from jocl.org offers NO abstraction at all. It is a plain 1:1 mapping of OpenCL. (E.g. the cl_command_queue class from JOCL does not have any real methods - it is just an opaque pointer, in the C-sense).

(A side note: This 1:1 mapping was chosen with the intention to use it as the basis for a high-level, object-oriented abstraction layer. But quickly, all the other OpenCL bindings popped up, so I did not pursue this idea further. As it stands now, JOCL offers the functionality of OpenCL as-it-is. This means that it may be hard to use for a Java programmer, but can „directly“ be used by someone who already used OpenCL in C. It is rather verbose and maybe „cumbersome“, but it aims at offering everything that OpenCL offers. With great power comes great responsibility)
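
To illustrate what this looks like in practice, here is a sketch of a complete vector addition with JOCL, roughly along the lines of the basic sample from jocl.org (error handling and platform/device selection are simplified, and the class name is made up for this sketch):

import static org.jocl.CL.*;
import org.jocl.*;

public class JOCLVectorAddSketch {
    private static final String programSource =
        "__kernel void sampleKernel(__global const float *a,\n" +
        "                           __global const float *b,\n" +
        "                           __global float *c)\n" +
        "{\n" +
        "    int gid = get_global_id(0);\n" +
        "    c[gid] = a[gid] + b[gid];\n" +
        "}\n";

    public static void main(String[] args) {
        int n = 1024;
        float[] srcA = new float[n];
        float[] srcB = new float[n];
        float[] dst = new float[n];
        for (int i = 0; i < n; i++) {
            srcA[i] = i;
            srcB[i] = i;
        }

        // Let JOCL throw CLExceptions instead of returning error codes
        CL.setExceptionsEnabled(true);

        // Obtain a platform, a device, a context and a command queue
        cl_platform_id[] platforms = new cl_platform_id[1];
        clGetPlatformIDs(1, platforms, null);
        cl_device_id[] devices = new cl_device_id[1];
        clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_ALL, 1, devices, null);
        cl_context context = clCreateContext(null, 1, devices, null, null, null);
        cl_command_queue commandQueue = clCreateCommandQueue(context, devices[0], 0, null);

        // Allocate the memory objects (plain, untyped buffers)
        cl_mem memA = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            Sizeof.cl_float * n, Pointer.to(srcA), null);
        cl_mem memB = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            Sizeof.cl_float * n, Pointer.to(srcB), null);
        cl_mem memC = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
            Sizeof.cl_float * n, null, null);

        // Build the program and create the kernel
        cl_program program = clCreateProgramWithSource(context, 1,
            new String[]{ programSource }, null, null);
        clBuildProgram(program, 0, null, null, null, null);
        cl_kernel kernel = clCreateKernel(program, "sampleKernel", null);

        // Set the arguments, run the kernel, and read the result back
        clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memA));
        clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(memB));
        clSetKernelArg(kernel, 2, Sizeof.cl_mem, Pointer.to(memC));
        clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
            new long[]{ n }, null, 0, null, null);
        clEnqueueReadBuffer(commandQueue, memC, CL_TRUE, 0,
            Sizeof.cl_float * n, Pointer.to(dst), 0, null, null);

        // Release all resources, just like in C
        clReleaseMemObject(memA);
        clReleaseMemObject(memB);
        clReleaseMemObject(memC);
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseCommandQueue(commandQueue);
        clReleaseContext(context);
    }
}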

(Another side note: I cannot give any hints about the pros/cons of the remaining libraries. There certainly are differences, maybe regarding stability, maintenance, support and ease of use. But I have not yet extensively used these libraries, and thus simply cannot judge them).

General limitations of OpenCL

You mentioned that Aparapi has some limitations. But in most cases, these are not limitations of Aparapi, but of OpenCL in general!

Exceptions:

Aparapi tries to automatically translate Java code to OpenCL, which frees the developer from the burden of learning OpenCL. This can have many advantages, but it cannot overcome the limitations of OpenCL itself. Some Java language constructs simply cannot be translated to OpenCL. For example: when OpenCL accesses an array out of bounds, it simply cannot "throw an IndexOutOfBoundsException". Instead, it will (in the best case) produce an error code, or (in the worst case) crash the whole program painfully.

Note that Aparapi at least offers the option to switch to a „pure Java mode“ for debugging, and then (AFAIK), you will receive an IndexOutOfBoundsException, and can at least derive where the error comes from.
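
As a sketch (again assuming the older com.amd.aparapi API): forcing the kernel into the JTP (Java Thread Pool) mode means that the run() method is executed as plain Java code, so an out-of-bounds access like the one below should show up as an ordinary Java exception rather than as an obscure GPU-side failure:

import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

public class AparapiDebugSketch {
    public static void main(String[] args) {
        final float[] data = new float[8];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                data[i + 1] = i; // Deliberately out of bounds for the last work item
            }
        };
        // Run in "pure Java mode" (no translation to OpenCL) for debugging
        kernel.setExecutionMode(Kernel.EXECUTION_MODE.JTP);
        kernel.execute(Range.create(data.length));
        kernel.dispose();
    }
}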

You will have this problem with any library that uses OpenCL - and even if you program OpenCL directly in C/C++. Debugging OpenCL is really painful, and you’ll quickly become very diligent about checking your code twice before running it. (In C/C++, there are some debugging tools, e.g. gDEBugger. But these do not nicely integrate with any Java binding).

2D arrays:

It may sound surprising, but OpenCL does not have a concept of „multidimensional arrays“. It only knows cl_mem objects, which are basically large, raw, untyped chunks of memory that you can use however you want. In general, when you want to model something like a 2D array, you have to map the 2D coordinates to a 1D array index. In many cases, this is straightforward. Instead of using a 2D array like

float array[][] = new float[sizeX][sizeY];

you create a 1D array

float array[] = new float[sizeX*sizeY];

When you pass the corresponding cl_mem object to a 2D kernel, you can access the individual „2D“ array elements by computing the 1D index:


__kernel void someKernel(
    __global float *data, 
    const int sizeX,
    const int sizeY)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int index = x + y * sizeX;
    float element = data[index]; // Access element [x][y] ...
    // ... process the element here ...
}
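
On the Java side, the same index computation is used to treat the 1D array as if it was a 2D array (a plain Java sketch, without any OpenCL calls):

public class FlattenedArrayExample {
    public static void main(String[] args) {
        int sizeX = 4;
        int sizeY = 3;

        // The "2D" data, stored in one contiguous 1D array
        float[] array = new float[sizeX * sizeY];
        for (int y = 0; y < sizeY; y++) {
            for (int x = 0; x < sizeX; x++) {
                // Conceptually array[x][y], using the same index
                // computation as the kernel above
                array[x + y * sizeX] = x * 10 + y;
            }
        }

        // Read back the element at (x=2, y=1)
        System.out.println(array[2 + 1 * sizeX]); // Prints 21.0
    }
}

A cl_mem object for such data would then simply be created with a size of sizeX * sizeY * Sizeof.cl_float, passing Pointer.to(array) as the host pointer.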

Sparse Matrix-Vector Operations

Implementing sparse matrix-vector operations on the GPU is rather tricky, and there is still quite some research going on to squeeze out as much performance as possible. When you „naively“ implement this on your own, it is unlikely that you will achieve a good speedup compared to a well-optimized, pure Java library.
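
Just to make clear what kind of operation this is about: the following is a sketch of a completely naive sparse matrix-vector multiplication in plain Java, using the common CSR (Compressed Sparse Row) storage format with some made-up example data - roughly the kind of baseline that a GPU implementation would have to compete with:

import java.util.Arrays;

public class NaiveCsrSpmv {
    public static void main(String[] args) {
        // The 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR form:
        float[] values     = { 1, 2, 3, 4, 5 };   // Non-zero entries
        int[]   columns    = { 0, 2, 1, 0, 2 };   // Column index of each entry
        int[]   rowOffsets = { 0, 2, 3, 5 };      // Start of each row in 'values'

        float[] x = { 1, 1, 1 };
        float[] y = new float[3];

        // y = A * x, one row at a time
        for (int row = 0; row < y.length; row++) {
            float sum = 0;
            for (int i = rowOffsets[row]; i < rowOffsets[row + 1]; i++) {
                sum += values[i] * x[columns[i]];
            }
            y[row] = sum;
        }
        System.out.println(Arrays.toString(y)); // [3.0, 3.0, 9.0]
    }
}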

However, people have invested some work in that already. The clMathLibraries (originally initiated by AMD) contain clSPARSE, which offers some sparse matrix-vector functionality.

I’m currently in the process of preparing Java bindings for clBLAS, to be used together with JOCL. Of course, this would only be the first step, and bindings for the other libraries (namely, clSPARSE) would follow soon. But at the moment, I’m doing some internal restructuring (among dozens of other things on my TODO list), so I cannot give a definite date for when these will be available.