Yes, such a “variable number of anything” is usually a bit tricky. The easiest approach, if (!) the number does not vary too much (and is not too large, of course), is to allocate a fixed-size array corresponding to the maximum number of elements.
In general, such a graph can describe a very irregular structure, and GPUs prefer “simple, regular” ones. This also applies to what you are going to do with these references/links/arcs: traversing them may cause scattered memory accesses, which is bad for the caches. The best case is reading a large block of memory from beginning to end (memory accesses should be coalesced).
The question of how these links are represented at all might also be relevant. Instead of using pointers to structs, one should consider using plain indices. For example, the nodes could be stored as
#define MAX_NUM_ARCS 8 // assumed upper bound on the number of arcs per node

struct Node {
    int attribute;
    int numberOfArcs;             // how many entries of arcIndices are valid
    int classValue;
    int arcIndices[MAX_NUM_ARCS]; // indices into a global array of arcs
};
so that, instead of following a pointer, you can access the i-th arc of a node like in
arcs[node.arcIndices[i]].evalType = 42;
But another general recommendation when programming for the GPU is to use a Structure-Of-Arrays (SoA) instead of an Array-Of-Structures (AoS). So in fact, instead of having a
Node array[] = new Node[n];
the representation that is best suited for the GPU could be something like
int n = 100;
int attributes[] = new int[n];
int classValues[] = new int[n];
int numbersOfArcs[] = new int[n];
// The arc indices of node i start at arcIndices[i * MAX_NUM_ARCS]
int arcIndices[] = new int[n * MAX_NUM_ARCS];
(Yes, this is horribly inconvenient, but it allows coalesced memory accesses, and additionally avoids any hassle that may be caused by structure alignment issues…)
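To make the intended access pattern more concrete, here is a minimal CUDA kernel sketch for this SoA layout. It is only an illustration: the kernel name and the arcEvalTypes array (the SoA counterpart of the evalType field above) are made up for this example.

__global__ void processNodes(
    const int *numbersOfArcs,
    const int *arcIndices,
    int *arcEvalTypes,
    int maxNumArcs,
    int n)
{
    // One thread per node: neighboring threads read neighboring
    // elements of numbersOfArcs, so these reads can be coalesced
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // The arc indices of node i occupy a fixed-size slot in arcIndices
    for (int a = 0; a < numbersOfArcs[i]; a++)
    {
        int arcIndex = arcIndices[i * maxNumArcs + a];
        arcEvalTypes[arcIndex] = 42; // scattered write, hard to avoid for irregular graphs
    }
}

Note that with this layout, the arcIndices reads of neighboring threads are maxNumArcs elements apart. Storing the arc indices interleaved instead (arcIndices[a * n + i]) would make these reads coalesced as well.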
Disclaimer:
I’m not a CUDA expert. These hints are the result of only a very basic understanding of how GPUs work, together with a glimpse at things like the Best Practices Guide :: CUDA Toolkit Documentation, and information obtained from other resources. You’ll also have to do some research of your own here. For example, I think that the constraints for coalesced memory accesses on newer GPUs are not as strict as they were when I started reading more about CUDA.
You mentioned
I don’t know if I will need to copy the structure as I have parallel process or if the access to the structure could be shared without blocking
Which structure and which parallelism does this refer to? Multiple Java threads? The structure itself has to be copied to GPU memory anyhow, so the only concurrent accesses there would be between the GPU threads (which might also be an issue, of course).
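For completeness, a sketch of what copying one of the SoA arrays to the GPU could look like with the CUDA runtime API (the helper name copyToDevice is made up here, and error checking is omitted for brevity):

#include <cuda_runtime.h>

// Allocates device memory for 'count' ints and copies the host data into it.
// (A real version should check the return codes of cudaMalloc/cudaMemcpy.)
int* copyToDevice(const int *hostData, int count)
{
    int *deviceData = NULL;
    cudaMalloc((void**)&deviceData, count * sizeof(int));
    cudaMemcpy(deviceData, hostData, count * sizeof(int),
        cudaMemcpyHostToDevice);
    return deviceData;
}

From Java, a binding like JCuda offers the corresponding cuMemAlloc/cuMemcpyHtoD functions. Either way, the copy is a bulk transfer, which is another point where the flat int[] arrays of the SoA layout pay off.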