Greetings, this is my first post on this forum. After reading a few other threads, I’m sure that this is the best place to find help and maybe share some ideas.

I have written a simulation, in Java, of an amorphous computer (for those who are unfamiliar: in this case it is very similar to a cellular automaton or Game of Life simulation, except that the environment is continuous rather than a grid).

The simulation runs as a sequence of ‘time steps’. During each time step, a single ‘cell’ is selected at random and a whole bunch of calculations are made which decide whether and how the state of that cell will change. During this calculation, the cell’s neighbours’ states may also change. Since the environment is continuous, neighbourhoods are determined by distance calculations (i.e., any other cell within a set radius of the cell in question is in that cell’s neighbourhood). A cell’s position is given simply by a pair of coordinates (x, y). Before the simulation begins, a number of time steps is chosen (say, 10,000,000), and the code for one time step is simply repeated that number of times. Nothing hugely complicated takes place.
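
In outline, the loop has roughly the shape sketched below. This is a simplified stand-in rather than my real code: the names `TimeStepSketch`, `Cell`, `neighbours` and `step` are placeholders made up for illustration, and the update rule is a dummy (it just stores the neighbour count), since the real calculations are much longer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Simplified stand-in for the simulation loop; all names here are placeholders.
final class TimeStepSketch {

    // A cell is a position in continuous 2D space plus some state.
    static final class Cell {
        double x, y;
        int state;
        Cell(double x, double y) { this.x = x; this.y = y; }
    }

    // Neighbourhood rule: every other cell within `radius` of `centre`.
    static List<Cell> neighbours(Cell centre, List<Cell> cells, double radius) {
        List<Cell> result = new ArrayList<>();
        for (Cell other : cells) {
            if (other == centre) continue; // a cell is not its own neighbour
            double dx = centre.x - other.x;
            double dy = centre.y - other.y;
            if (Math.sqrt(dx * dx + dy * dy) <= radius) result.add(other);
        }
        return result;
    }

    // One time step: pick a random cell, look up its neighbourhood, update it.
    static void step(List<Cell> cells, double radius, Random rng) {
        Cell selected = cells.get(rng.nextInt(cells.size()));
        List<Cell> local = neighbours(selected, cells, radius);
        // Dummy update rule; the real simulation does far more work here.
        selected.state = local.size();
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            cells.add(new Cell(rng.nextDouble(), rng.nextDouble()));
        }
        long steps = 10_000; // real runs use something like 10,000,000
        for (long t = 0; t < steps; t++) {
            step(cells, 0.05, rng);
        }
        System.out.println("finished " + steps + " time steps");
    }
}
```

The point of the sketch is only that every time step contains a loop over all cells (the neighbourhood search), and that loop is repeated millions of times.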

In my current simulation, the entire sequence of one time step is executed in one thread, and there is no parallelism. However, I believe that the simulation would benefit greatly from the implementation of JCuda (although that is not why I want to implement JCuda - I want to do it just to see if I can, out of academic interest).

I have managed to get a few small programs running with JCuda, such as the examples from jcuda.org, and I have been reading CUDA by Example (but have only understood some of it, as I have no experience with C). However, I still need some help. I am pretty sure that as soon as I get one or two simple JCuda implementations running (ones which I have written and understood, rather than downloaded as examples), I will become much more self-sufficient with it.

At first, I’m not looking to use JCuda en masse throughout my simulation. What I’d like to do is this:

1: Identify a place within one ‘time step’ where the most benefit would come from a simple calculation being made in parallel (there are many places where JCuda could be used, but none seems a more obvious choice than any other - perhaps they are all good candidates, but for now I’m just looking for one simple implementation…). There are plenty of places within one time step where the same calculation is made for every cell, one cell after another, and where I believe it could be executed in parallel far more efficiently. I’ll give a few ideas about this, and then I’m hoping people will feel free to give their own thoughts on it.

2: Once a suitable calculation has been selected to be implemented in parallel rather than in order, I’d very much appreciate it if someone could help me understand what I need to do to be able to get it to work. The Java code is written, and it works. Now I would just like to make another version, the same, but with a simple inclusion of JCuda. I’m not looking for someone to do the work for me - but I’m really stuck since I have no experience with C and I don’t really understand what actually needs to be done (in code) in order to call upon the GPU to perform a simple calculation (but I’m learning fast).

So, for a better description of exactly where I think JCuda could fit in…

1. There are places in the simulation where the distance between a selected cell and every other cell is calculated, one cell at a time. I believe that, instead, the distances between the selected cell and all other cells could be calculated in parallel (the number of cells is configurable, but I typically use 10,000 - so it tends to be slow when executed in order).

The code for the calculation is simple (probably not optimal - but there are reasons for the way it is):

```
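{
    // NOTE: if elementcount starts at 0, `<=` gives numberofelements + 1 passes.
    while (elementcount <= numberofelements) {
        Element f = elements.get(index);
        elements.remove(f); // take f out so it is not compared with itself
        ArrayList<Element> local = f.getLocal();
        // f's range and position do not change inside the loop, so read them once
        double r = f.getRange();
        double x1 = f.getXposition();
        double y1 = f.getYposition();
        for (Element a : elements) {
            double dx = x1 - a.getXposition();
            double dy = y1 - a.getYposition();
            // Euclidean distance between f and a
            double dist = Math.sqrt(dx * dx + dy * dy);
            if (dist <= r) {
                local.add(a); // a lies within f's neighbourhood radius
            }
        }
        elements.add(f); // put f back (it is appended at the end of the list)
        elementcount++;
    }
    elementcount = 0;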
}
```

2. At each time step, the selected element (which is chosen at random) is moved a random short distance in a random direction, which means repeatedly generating random numbers. Perhaps the random number generation could be done in parallel (I haven't spent much time thinking about this option, but I'm sure it could be done). This applies particularly at the start of the simulation, when the cells are generated and each cell is given random location coordinates: rather than doing this in order, one cell at a time, it could perhaps be done for all cells at once (just a thought...). I know I just said that only the selected cell moves at each time step, but if there were a way to update the positions of all cells, in parallel, at each time step, that would be awesome (although that sounds quite ambitious for a beginner implementation; it's something I'd definitely like to try in the future once I've learned a little more).

As you can see, in '1.', the code cycles through an ArrayList of cells, calculating the distance to each one as it goes, and adds those within range of the cell in question to another ArrayList (this second ArrayList is just the list of cells in the neighbourhood of the cell in question). Rather than cycling through an ArrayList of all cells, in order, I'd like to make all of the distance calculations in parallel.

Thanks for taking the time to read my post. I would appreciate any feedback that you might have that would help me get started with JCuda.

:)
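
To make idea 1 more concrete, here is a minimal sketch of the shape I think the parallel version would take. It is plain Java, not JCuda: a parallel stream stands in for the GPU threads, and the cell positions are held in flat `double[]` arrays, since as far as I understand a kernel works on primitive arrays rather than on a list of objects. The names `ParallelNeighbours` and `withinRange` are made up for this example, and it compares squared distances instead of calling `Math.sqrt` as my current code does (the two tests agree for a non-negative radius).

```java
import java.util.stream.IntStream;

// Sketch only: a Java parallel stream standing in for GPU threads.
final class ParallelNeighbours {

    // xs[i], ys[i] hold the position of cell i in flat arrays.
    // Returns flags where inRange[i] is true when cell i lies within
    // `radius` of the cell at index `selected`.
    static boolean[] withinRange(double[] xs, double[] ys, int selected, double radius) {
        boolean[] inRange = new boolean[xs.length];
        double x1 = xs[selected];
        double y1 = ys[selected];
        double r2 = radius * radius; // compare squared distances, so no sqrt
        // Each index i is handled independently of every other index, which is
        // what makes the loop parallelisable. On a GPU, i would presumably be
        // derived from the thread index rather than from a stream.
        IntStream.range(0, xs.length).parallel().forEach(i -> {
            if (i == selected) {
                return; // a cell is not its own neighbour
            }
            double dx = x1 - xs[i];
            double dy = y1 - ys[i];
            inRange[i] = dx * dx + dy * dy <= r2;
        });
        return inRange;
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 3.0, 10.0};
        double[] ys = {0.0, 4.0, 0.0};
        boolean[] hits = withinRange(xs, ys, 0, 5.0);
        for (int i = 0; i < hits.length; i++) {
            System.out.println("cell " + i + " in range: " + hits[i]);
        }
    }
}
```

Building the neighbourhood ArrayList from the flags would still happen on the Java side afterwards. What I'd like help with is how to turn the body of that `forEach` into something JCuda can run.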