Recursive function with JCUDA

Hello, I’m new to CUDA and JCUDA and maybe my question could be out of place, I’m sorry if that is the case.

I’m trying to optimize a process using GPUs, the process is to calculate all possible combinations of players to form partitions. To understand my problem, I use the following example. For exampe we have a set of players N = {1,2,3,4,5}, this players are regrouped like this {1,2},{3,4},{5}. It means palyer 1 will play with player 2 as a single player, and so one. Each group of players has a set of strategies or choices. Each player chooses the group to which he wants to belong, for example:
the group {1,2} has these possibilites {{1,2};{1,2,3,4}}; i.e the players {1,2} either they choose to stay together or to join the group{3,4}.
the same explanation for the rest of players:{3,4}=>{{3,4};{3,4,5};{1,2,3,4}}
{5}=>{{5};{3,4,5}}.
Now, the group of players choosing the same strategy will be form a new group (coalition). For example, group{1,2} chose the strategy {1,2,3,4} ;ie. the players {1,2} want to form a new group with players{3,4}. Players {3,4} choose {3,4,5}, player{5} choose {3,4,5}. The players have chosen the same strategy will group together to form in the final a partition of plyaers like this: so the partition of players formed here is :{1,2},{3,4,5}. W e do the same with all possible combinations of players’ strategies to obtain all possible partitions of players.

Now my question is if it is possible to use parallilism with JCUDA to compute all possible partitions of players, especially when we have many players, for example |N|=1000 , then we have 1000^1000 possible partitions!!!
I have programed this process as a recursive function to get all possible partitions of players like this:


import java.util.ArrayList;
import java.util.Hashtable;

public class Partitions {
	public Hashtable<ArrayList<Integer>, ArrayList<ArrayList<Integer>>> HashTab = new Hashtable<ArrayList<Integer>,ArrayList<ArrayList<Integer>>>(); // The key is the chosen strategy(coalition) and the value is the group of players have chosen the same coalition(strategy).
	
	// create partitions combination
    public void CalculStrategiesCombination ( int index, ArrayList<ObjectsCoalitions> PlayerCoalitions, int K,int L) {
		index = index +1;
		   if(index < PlayerCoalitions.size()){ 
			   for(int j =0; j< PlayerCoalitions.get(index).Coaltions.size(); j++){
				   if(!this.HashTab.containsKey(PlayerCoalitions.get(index).Coaltions.get(j))){
					   ArrayList<ArrayList<Integer>> e = new  ArrayList<ArrayList<Integer>>();
					   e.add(PlayerCoalitions.get(index).Objects);
					   this.HashTab.put(PlayerCoalitions.get(index).Coaltions.get(j), e);
				   }
				   else{
					   this.HashTab.get(PlayerCoalitions.get(index).Coaltions.get(j)).add(PlayerCoalitions.get(index).Objects); 
				   }
				   if(this.HashTab.size()<K){
				      CalculStrategiesCombination (index, PlayerCoalitions,K,L);
				   }
				   if(this.HashTab.get(PlayerCoalitions.get(index).Coaltions.get(j)).size()==1){
				      this.HashTab.remove(PlayerCoalitions.get(index).Coaltions.get(j));
				   }
				   else{
					      this.HashTab.get(PlayerCoalitions.get(index).Coaltions.get(j)).remove(this.HashTab.get(PlayerCoalitions.get(index).Coaltions.get(j)).size()-1);
				   }
			  }
			}
			else {	  
				  // Treatement.......
				
			 }
	}

}

public class ObjectsCoalitions {

	public ArrayList<Integer>Objects = new ArrayList<Integer>(); // for example Objects = {1,2}
	public ArrayList<ArrayList<Integer>> Coaltions = new  ArrayList<ArrayList<Integer>> (); //  coalitions (strategis)={{1,2};{1,2,3,4}}
}

Please I want to know if it is possible to resolve this problem with jcuda, and what I must do and by what I must start, as i said I’m new with jcuda and cuda. Any suggestions please ?!
Thanks in advance. Best regards.

Hello,

The question itself is not out of place.

But I’m not sure whether CUDA may help you here. I wrote a few words about where it actually makes sense to use the GPU in this answer to the question about using Java with Nvidia GPUs. And I think this will already answer some of your questions in general.

The key point is that the GPU is not a magical device that simply runs 1000 threads.

(And even if it did: 1000^1000 is


1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

… scroll to the right for a while :wink: - this is completely out of scope of any computation)

And there are additional difficulties: Only the newest GPUs support recursion at all, but you have to know exactly what you are doing there in order to get it right. Convenient structures like an ArrayList, and even less a Hashtable simply do not exist on the GPU, and it’s hard (nearly impossible) to implement them for the GPU.

More technically: I can not see any data-parallel or compute-bound problem there until now. The reason why something like this is time-consuming is not that there is so much to compute, but because the asymptotic complexity is exponential - and there is nothing in the world to solve this.

Even if you can reduce the complexity to something „reasonable“, then I think that a plain Java solution would be more appropriate, but I probably have to re-read it in order to understand it properly.

Maybe you can describe your actual goal? So to say: If you had an infinite amount of memory, and an infinite amount of time, and had computed these 10^3000 combinations… what would you do with them?

bye
Marco

Thank you for your response. My goal is to determine which optimal partition is among all possibles partitions. I know that is impossible to generate 1000^1000 partitions, but if we limit the number of players to 200 players and the number of strategies for example to 10 per player, could jcuda help me to accelerate the generation of all possible partitions? I have searched in the net I didn’t find intersting solutions, so I want to know if I could resolve my problem using jcuda, or I must search another soltuion ? what do you mean by “then I think that a plain Java solution would be more appropriate”. I have found that java 8 support parallisim, do you think that it is efficient to resolve my problem ? Please, I need your response, because I 'm new with jcuda, and perhaps I waste time to learn jcuda and finaly it isn’t the solution, what do you propose to me ?
thank you in adavance.
Best regards

I’m not sure whether I can answer your question.

The problem seems to be rather complicated. I would have to invest a lot more time to really understand what you are about to do, with these groups, strategies, coalitions, etc. It might be possible to implement it efficiently. It might also be possible to implement it efficiently in CUDA. But I simply don’t know.

All I can say that unless you have a very clear idea of how to map your problem to the GPU, you will not have much fun with implementing it. There are resources about implementing permutations in CUDA, e.g. Permutations with CUDA and OpenCL - CodeProject , but I’m not sure whether this will help you much. In any case, it should show that something that can be implemented with a few lines of code and a recursive function in Java can become quite complicated on the GPU.

Again: CUDA will NOT autoagically make your code run faster. You may gain a certain speedup, for very specific kinds of problems.

From what I have seen until now, it would probably be far easier to write a clean and efficient implementation in plain Java, without CUDA.

But again, regardless of which approach you choose: If the problem has an exponential complexity, then no approach will help you. When you find the solution for 100 players and 10 strategies in 5 minutes, and then want to compute the result for 101 players, it might take 5 hours, and for 102 players it might take 5 years. And even IF you found an efficient implementation with CUDA, the latter might not take 5 years but “only” 5 months, unless you increase the number to 103 players.