JCuda: How to support structs?


#1

Hello everybody,

This thread is intended for a discussion about how structs may be represented in JCuda.

I already worked on this topic a while ago, and want to post a few first thoughts. There are several ways in which structs could be handled. By and large, I think these approaches can roughly be separated into two categories, each with its advantages and disadvantages:

[B]1. Using the fields of a Java class as the fields of a native struct. [/B]

This approach was used in the experimental(!) struct support for JOCL. (It is, however, fairly independent of JOCL itself, and could probably be used for JCuda, with some minor adjustments). A much more sophisticated implementation of this approach is also used in the JNA Structures.

For this approach, a dedicated “Struct” class is created. The public fields of this class correspond to the elements of the native struct. These fields are read and written using reflection. There are still many design choices about how the fields may be accessed, and how these structs may be transferred between the host and the device. For the JOCL example, I tried to decouple the handling of structs from the rest of the API. So there is basically the abstract “Struct” class, and two methods which allow reading/writing an array of “Struct” Java objects (with all the fields they contain) from/to a ByteBuffer. These ByteBuffers may be copied arbitrarily between the host and the device, and interpreted as an array of native structs inside the kernel.

Advantages:
[ul]
[li]The structs may be used in Java nearly in the same way as they may be used in CUDA kernels. A struct in C is basically equivalent to a Java Class without methods and with only public fields, so large parts of the code may be syntactically consistent in both languages.
[/li][li]This approach is probably the most efficient one for computations on the Java side, since all the computations are done only on public fields, without method calls. Concerning the data transfer, it might, however, be slower than other approaches.
[/li][li]A clearly defined state is written to the device. This might sound trivial, but considering any approach which uses a tighter coupling between structs and the actual host memory block, one has to carefully think about how the structs may be maintained. For example, having the same struct inside of two arrays, or changing the order of structs inside an array could mess up a memory mapping.
[/li][/ul]

Disadvantages:
[ul]
[li]The Java struct objects have to be explicitly copied to a ByteBuffer, before this ByteBuffer can be copied to the GPU. This might seem counterintuitive, and introduces an additional copying operation.
[/li][/ul]
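
A minimal sketch of this first approach could look as follows. It is not the JOCL implementation: the class `StructReflectionSketch`, the `Particle` struct and the `write`/`read` methods are hypothetical names chosen for illustration. The sketch assumes that the struct contains only 4-byte `float` and `int` fields, so that no alignment padding is required, and it takes the field order as an explicit list, because reflection does not guarantee declaration order.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class StructReflectionSketch {

    // Hypothetical counterpart of: struct Particle { float x; float y; int id; };
    static class Particle {
        public float x;
        public float y;
        public int id;
    }

    // Explicit field order (assumption: it matches the native struct layout)
    static final String[] PARTICLE_FIELDS = { "x", "y", "id" };

    // Only 4-byte float and int fields are supported in this sketch
    static final int FIELD_SIZE = 4;

    // Copies the public fields of all structs into a direct buffer
    static ByteBuffer write(Object[] structs, String[] fieldOrder) {
        ByteBuffer buffer = ByteBuffer
            .allocateDirect(structs.length * fieldOrder.length * FIELD_SIZE)
            .order(ByteOrder.nativeOrder());
        try {
            for (Object struct : structs) {
                for (String name : fieldOrder) {
                    Field field = struct.getClass().getField(name);
                    if (field.getType() == float.class) {
                        buffer.putFloat(field.getFloat(struct));
                    } else if (field.getType() == int.class) {
                        buffer.putInt(field.getInt(struct));
                    } else {
                        throw new IllegalArgumentException(
                            "Unsupported type: " + field.getType());
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        buffer.rewind();
        return buffer;
    }

    // Copies the buffer contents back into the fields of existing structs
    static void read(ByteBuffer buffer, Object[] structs, String[] fieldOrder) {
        ByteBuffer source = buffer.duplicate().order(buffer.order());
        source.rewind();
        try {
            for (Object struct : structs) {
                for (String name : fieldOrder) {
                    Field field = struct.getClass().getField(name);
                    if (field.getType() == float.class) {
                        field.setFloat(struct, source.getFloat());
                    } else if (field.getType() == int.class) {
                        field.setInt(struct, source.getInt());
                    } else {
                        throw new IllegalArgumentException(
                            "Unsupported type: " + field.getType());
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The buffer returned by `write` is the block that would be copied to the device, and `read` performs the reverse step after the data has been copied back. A real implementation would also have to handle other field types, nested structs and the alignment rules of the native compiler.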

The second type of approach might be…

[B]2. Creating an interface that describes the struct[/B]

One could also consider a more Java-like approach for structs. This could mean that a struct is represented as a “Java Bean interface”. For each element of the native struct, there are set(…)/get() methods in the corresponding Java interface. These methods could either be implemented manually, or maybe even automatically by creating a dynamic proxy class. As far as I know, a similar approach is used when code is generated using Gluegen, which creates an internal representation of structs and writes them out as Java classes which resemble the interfaces described above.

Possible advantages:
[ul]
[li] This approach might avoid some overhead and inconveniences compared to the first one, since it would not be necessary to copy the structs into host memory manually. The method calls can directly operate on the corresponding, pre-allocated memory.
[/li][li] A more object-oriented style could give greater flexibility concerning the many possible implementations of such a “struct interface”
[/li][/ul]

(Possible) disadvantages:
[ul]
[li]The syntax might be more inconvenient on the Java side, since the whole data manipulation has to be done through methods. (Although a Java programmer should be used to doing that…)
[/li][li]Doing the whole data manipulation through interface methods may be very slow compared to the first approach.
[/li][li]A larger and more sophisticated infrastructure would be required to maintain these structs properly. For example, it might be necessary to create a dedicated class that takes the role of a “struct array”, in order to avoid inconsistencies when the order of struct instances inside an array is changed.
[/li][li]Defining a struct on the Java side (which corresponds to a native struct) might require more effort from the user of the library.
[/li][/ul]
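
As a rough illustration of this second approach, the following sketch implements such a bean-style interface with a dynamic proxy whose getters and setters operate directly on a pre-allocated ByteBuffer. All names (`StructProxySketch`, `Particle`, `view`, `offsetOf`) are hypothetical, the struct is assumed to consist of two `float` fields, and the field offsets are hard-coded instead of being derived from the interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.nio.ByteBuffer;

class StructProxySketch {

    // Hypothetical interface for: struct Particle { float x; float y; };
    interface Particle {
        float getX();
        void setX(float x);
        float getY();
        void setY(float y);
    }

    // Size of one struct in bytes (two 4-byte floats, no padding)
    static final int STRUCT_SIZE = 8;

    // Byte offset of a field inside the struct (hard-coded for this sketch)
    static int offsetOf(String field) {
        switch (field) {
            case "X": return 0;
            case "Y": return 4;
            default: throw new UnsupportedOperationException(field);
        }
    }

    // Returns a view on the struct at the given index of the memory block.
    // No data is copied: every call reads or writes the buffer directly.
    static Particle view(ByteBuffer memory, int index) {
        int base = index * STRUCT_SIZE;
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            // "getX" and "setX" both refer to the field "X"
            int offset = base + offsetOf(name.substring(3));
            if (name.startsWith("get")) {
                return memory.getFloat(offset);
            }
            memory.putFloat(offset, (Float) args[0]);
            return null;
        };
        return (Particle) Proxy.newProxyInstance(
            Particle.class.getClassLoader(),
            new Class<?>[] { Particle.class },
            handler);
    }
}
```

Here every access goes through a reflective method invocation, which illustrates the performance concern mentioned in the list above. Methods inherited from `Object`, such as `toString`, are not handled in this sketch and will throw an exception.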

Obviously, I have seen more advantages in the first approach, and that’s why I used it for the experimental JOCL struct support. But I’d be glad to hear about aspects or Pros/Cons that I did not yet consider in this summary, or even about approaches that are completely different to those that I’ve been thinking about so far.

bye
Marco


#2

I have to deal with this type of problem myself, and I really liked this topic and the possible discussions about it.
I found the first solution more elegant and interesting. In the second one, the structure to be created is very complex and may possibly hurt the optimization and the data exchange with the GPUs.

cheers;


#3

Hello Marco. I was looking for ways to transfer structs to the GPU using JCuda and I stumbled upon this thread.
Could you please provide an example for the first method?
Thank you very much in advance.


#4

Hello

You might have noticed that this thread is already rather old. It was intended as a starting point for a discussion, in order to find out the most appropriate approach. However, there obviously was not much interest in this topic (until now ;)).

In general, one should keep in mind that for GPUs it is advantageous to store the data as a Structure Of Arrays (in contrast to an Array Of Structures), in order to exploit coalesced memory access. Admittedly, I don’t think that I have enough practical experience with “complex” GPU-based applications to judge whether or when the transfer of structs to CUDA really makes sense.
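
To make the difference between the two layouts concrete, here is a small sketch that converts an interleaved "Array Of Structures" block into a "Structure Of Arrays". The class and method names are hypothetical, and all fields are assumed to be floats. In the interleaved layout, thread i reading the first field of element i accesses index i * fieldsPerStruct, so neighboring threads access memory with a stride. In the converted layout, thread i accesses index i of one array, so neighboring threads access neighboring memory locations.

```java
class LayoutSketch {

    // Converts an interleaved layout (x0 y0 z0 x1 y1 z1 ...)
    // into one array per field (x0 x1 ..., y0 y1 ..., z0 z1 ...)
    static float[][] toStructureOfArrays(float[] aos, int fieldsPerStruct) {
        int count = aos.length / fieldsPerStruct;
        float[][] soa = new float[fieldsPerStruct][count];
        for (int i = 0; i < count; i++) {
            for (int f = 0; f < fieldsPerStruct; f++) {
                soa[f][i] = aos[i * fieldsPerStruct + f];
            }
        }
        return soa;
    }
}
```

Each of the resulting arrays could then be copied to the device as a separate, plain float array.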

However, the first approach is already implemented (in a VERY prototypical way…) for JOCL, including an example, and is available at http://www.jocl.org/utilities/utilities.html . Do you have any more specific questions about that?

bye
Marco


#5

Hello Marco,
Thanks for your prompt answer :)
I think for now, the “Structure of Arrays” will do just fine for my purpose. I thought there might be a better way, but for now I’ve got my answer.
Thanks for your help again.


#6

Depending on your application case, you might want to choose a “mixed view”. This is basically what is described on http://code.google.com/p/aparapi/wiki/AparapiPatterns under “How can I use Aparapi and still maintain an object-oriented view of my data?”.

VERY short and simplified: You may create a “structure of arrays”, and then provide an object oriented view on these flat data arrays. If you define your objects via interfaces, you can easily switch between the “usual” implementation and the “CUDA-view” implementation: The Java code does not even have to know that the actual data is stored in a large, flat structure of arrays.
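
A minimal sketch of such a "mixed view" might look like this. It is not taken from the Aparapi page: the names `SoAViewSketch`, `Point`, `SimplePoint`, `PointArray` and `translate` are made up for illustration. The flat arrays `xs` and `ys` are what would be copied to the GPU, while the Java code only works with the `Point` interface.

```java
class SoAViewSketch {

    interface Point {
        float getX();
        void setX(float x);
        float getY();
        void setY(float y);
    }

    // The "usual" implementation: each object owns its data
    static class SimplePoint implements Point {
        private float x;
        private float y;
        public float getX() { return x; }
        public void setX(float x) { this.x = x; }
        public float getY() { return y; }
        public void setY(float y) { this.y = y; }
    }

    // The structure of arrays, handing out object-oriented views
    static class PointArray {
        final float[] xs;
        final float[] ys;

        PointArray(int size) {
            xs = new float[size];
            ys = new float[size];
        }

        // Returns a view that reads and writes the flat arrays directly
        Point get(final int index) {
            return new Point() {
                public float getX() { return xs[index]; }
                public void setX(float x) { xs[index] = x; }
                public float getY() { return ys[index]; }
                public void setY(float y) { ys[index] = y; }
            };
        }
    }

    // Client code that does not know how the data is stored
    static void translate(Point p, float dx, float dy) {
        p.setX(p.getX() + dx);
        p.setY(p.getY() + dy);
    }
}
```

The `translate` method works the same way for a `SimplePoint` and for a view obtained from `PointArray.get`, which is the point of defining the objects via an interface.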

Of course, there may be some caveats. And in general, this raises the question about what you are going to do with these objects on the Java side. But if you want to perform several (non-data-parallel) tasks on the Java side, this may be one way to hide the nitty-gritty CUDA-C-stuff that is required for the GPU computation from the rest of your program.