Launching kernels in JCuda


This seems like a pretty active forum, so I decided to register here and post my beginner problems :smiley:

Okay, first of all, I am not new to CUDA. I have been using it effectively with Visual Studio 2008, so I know the basics of GPU programming.

Currently, I am trying to learn how to use JCuda with Eclipse. I have made sure that it runs, and I am not facing any problems with including its libraries.

What I really want to know is how to execute kernels in JCuda. I cannot understand the .cubin or .cu procedure.

Can anyone please help by providing a simple step-by-step procedure or a beginner tutorial? It would also be helpful if someone could suggest a book. For learning CUDA in C++ I used CUDA by Example by the NVIDIA guys. I would love it if such a book were available for JCuda as well.

Hoping for your kind responses.


New members are always welcome, especially if there is the chance that they may contribute to more CUDA-specific questions :slight_smile:

The question about the general workflow in JCuda was recently asked in some threads, and I responded similarly here, here and a little more here. Some basic questions may be answered in these posts, but the essence of these posts still has to be collected: The task of creating a "Getting started" step-by-step tutorial is already mentioned there, and it’s of course already on my todo list. The problem is that there are many, many tasks, far too many to work through in my spare time…


Thanks for the quick reply.

From what I understand from the posts you mentioned, it seems JCuda is not that simple. For example, in C the workflow is as follows:

  1. Get some data on the host (CPU).
  2. Allocate memory on the device (GPU).
  3. Copy the data to the device.
  4. Launch kernels on the device using <<<numBlocks, numThreads>>>.
  5. Copy the results back to the host.

So far I can do steps 1, 2 and 3 in Java. You mentioned something about .cu files. Does that mean that I can take the kernels I wrote for C-based CUDA and use them from Java?

I am including a little CUDA C program that generates prime numbers. Can you please tell me how to run this kernel from Java via the .cu procedure, or any other way?

Personally, I think CUDA in C was much more straightforward. It went along like a simple C program. JCuda is a bit of a tangle, but using it has its own advantages.

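```
#include <stdlib.h>
#include <cmath>
#include <cstdio>
#include <iostream>

using namespace std;

int numBlocks = 16;
int numThreads = 128;
int p = numBlocks * numThreads; // total number of GPU threads
//const int p = 100;
const int n = 10000000;

// Block decomposition: thread id (of p) owns indices block_low..block_high of n elements.
// id*n is computed in 64 bits because it overflows a 32-bit int for these sizes.
__device__ static int block_low(int id, int p, int n)
{return (int)(((long long)id*n)/p);}

__device__ static int block_high(int id, int p, int n)
{return (block_low(id+1,p,n)-1);}

__device__ static int block_size(int id, int p, int n)
{return (block_low(id+1,p,n) - block_low(id,p,n));}

__global__ static void Sieve(int* sieve,int sieve_size,int p)
{
	int tid = threadIdx.x + blockIdx.x * blockDim.x;
	int prime;
	// Partition all sieve_size entries so that the last index is covered as well
	int low_value = block_low(tid,p,sieve_size);
	int high_value = block_high(tid,p,sieve_size);
	int first;

	prime = 2;
	do
	{
		// Skip numbers already known to be composite. Another thread may still be
		// writing this flag; reading a stale 0 only causes redundant marking.
		if (sieve[prime] == 0)
		{
			// sieve is indexed by the number itself, so start at prime*prime
			if (prime*prime > low_value)
				first = prime*prime;
			else
			{
				if (low_value%prime == 0) first = low_value;
				else first = low_value + (prime - (low_value % prime));
			}//End Else
			for (int i=first;i<=high_value;i+=prime) sieve[i] = 1;
		}
		prime = prime + 1;
	}while (prime*prime <= n); //End Do While

} //End Function

int main()
{
	int *host_sieve;
	int *device_sieve;
	int bl0_size = (n-1)/p;

	// Thread 0 must own every candidate prime up to sqrt(n)
	if (2+bl0_size < (int) sqrt((double) n))
	{
		cout<<"Too Many Blocks";
		return 0;
	}//End If
	host_sieve = new int[n];
	for (int i = 0; i<n; i++) host_sieve[i] = 0;

	cudaMalloc((void**) &device_sieve, sizeof(int) * n);
	// Upload the zeroed sieve, run the kernel, download the result
	cudaMemcpy(device_sieve, host_sieve, sizeof(int) * n, cudaMemcpyHostToDevice);
	Sieve<<<numBlocks, numThreads>>>(device_sieve, n, p);
	cudaMemcpy(host_sieve, device_sieve, sizeof(int) * n, cudaMemcpyDeviceToHost);
	cudaFree(device_sieve);

	for (int i = 2; i<n; i++)
	{
		if (host_sieve[i] == 0)
		{cout<<i<<"	";}

		if (i%20 == 0) getchar();
	}

	delete[] host_sieve;
	return 0;
}//End Main
```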
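
The block decomposition done by block_low, block_high and block_size can be checked on the host before anything runs on the GPU. Below is a minimal sketch in plain Java (no JCuda needed), using the sizes from the program above (16 * 128 threads, n = 10000000). The class name `BlockDecomposition` and the camel-case method names are only illustrative. One thing the sketch makes visible: for these sizes the product id * n no longer fits into a 32-bit int, so the multiplication is done in `long` here.

```java
public class BlockDecomposition {

    // First index owned by worker `id` when `n` elements are split among `p` workers.
    // The multiplication is done in 64 bits: id * n exceeds the int range for large ids.
    public static int blockLow(int id, int p, int n) {
        return (int) (((long) id * n) / p);
    }

    // Last index owned by worker `id`.
    public static int blockHigh(int id, int p, int n) {
        return blockLow(id + 1, p, n) - 1;
    }

    // Number of elements owned by worker `id`.
    public static int blockSize(int id, int p, int n) {
        return blockLow(id + 1, p, n) - blockLow(id, p, n);
    }

    public static void main(String[] args) {
        int p = 16 * 128;   // numBlocks * numThreads from the program above
        int n = 10000000;

        long covered = 0;
        for (int id = 0; id < p; id++) {
            covered += blockSize(id, p, n);
        }

        System.out.println("first thread owns " + blockLow(0, p, n) + ".." + blockHigh(0, p, n));
        System.out.println("last thread owns " + blockLow(p - 1, p, n) + ".." + blockHigh(p - 1, p, n));
        System.out.println("elements covered: " + covered + " of " + n);

        // The same product in 32-bit arithmetic wraps around to a negative value.
        System.out.println("2047 * n as int: " + (2047 * n));
    }
}
```

If the blocks do not add up to n, or the last block does not end at n - 1, the kernel will leave part of the sieve untouched.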

Hey again,

I think we can scratch that last post. I have understood the workflow in Java as compared to C for CUDA. I got help from a post about vector addition (I can’t seem to find the link to it right now :frowning: )

After setting everything up as described in that post, I’m getting the following error.

Exception in thread "main" jcuda.CudaException: Could not prepare CUBIN for source file ''
at jcuda.utils.KernelLauncher.create(
at VectorAddition.main(

Specifications are:-
Name: GeForce 8600M GS
Version: 1.1
OS: Windows 7: 32 bit

This is the call I’m using to launch the kernel:
KernelLauncher kernelLauncher = KernelLauncher.create("", "VecAdd", false,"-arch sm_11");


OK, the error message does not help so much (it seems that the KernelLauncher swallows a more detailed exception there, I will check this).

What happens if you try to compile the CU file manually, with
nvcc -cubin -arch sm_11 -o VecAdd.cubin
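
If the command line is inconvenient, the same check can be made from Java, which also shows the compiler output that the exception hides. This is only a sketch using the standard library, not what the KernelLauncher does internally. The class name `NvccCheck` and the helper `buildCommand` are made up for illustration, and the input and output file names are taken from the program arguments because the real source file name is not shown in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NvccCheck {

    // Builds the command: nvcc -cubin <options...> <cuFile> -o <cubinFile>
    public static List<String> buildCommand(String cuFile, String cubinFile, String... options) {
        List<String> cmd = new ArrayList<>();
        cmd.add("nvcc");
        cmd.add("-cubin");
        cmd.addAll(Arrays.asList(options));
        cmd.add(cuFile);
        cmd.add("-o");
        cmd.add(cubinFile);
        return cmd;
    }

    // Runs the command, echoes everything the compiler prints, and returns its exit code.
    public static int run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(cmd);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start(); // throws IOException if nvcc cannot be started
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java NvccCheck <input.cu> <output.cubin>");
            return;
        }
        List<String> cmd = buildCommand(args[0], args[1], "-arch", "sm_11");
        System.out.println("Running: " + String.join(" ", cmd));
        System.out.println("Exit code: " + run(cmd));
    }
}
```

An IOException from `start()` means that nvcc could not be started at all, which points to an installation or PATH problem. A non-zero exit code together with compiler messages points to a problem in the .cu file or with the host compiler.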


Well, the trouble is that when I run the above command at the command prompt, it says 'nvcc' is not recognized as an internal or external command, or something like that.

I should probably mention that I was using MS Visual Studio 2008 Express Edition with CUDA Toolkit 3.2. I could run my CUDA programs fine the C++ way. But VS 2008 expired a few days ago, and it doesn’t open now. Does that have something to do with this problem?


OK, since the KernelLauncher internally does nothing more than call NVCC to compile the source code, it’s at least clear why it does not work.

I didn’t know that VS can „expire“ :confused:

The first thing you should check is whether the "\bin" subdirectory of the CUDA Toolkit contains 'nvcc.exe' (you may try to call it directly from this directory at the command line). If it’s there, check the PATH environment variable to see whether it contains the path to the Toolkit\bin directory.
But in any case, NVCC relies on a host C/C++ compiler (on Windows, cl.exe from VS) for preprocessing and for the host code, so a working C compiler will be required (not a working IDE, only the compiler, so it might work even when the IDE has "expired", whatever that means).
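
The PATH check can also be done from the JVM itself. This is useful because Eclipse may have been started with a different PATH than the command prompt. A minimal sketch with the standard library only; the class name `PathLookup` and the method `find` are made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PathLookup {

    // Returns every entry of pathValue (a PATH-style string) that contains a file named exeName.
    public static List<File> find(String exeName, String pathValue) {
        List<File> hits = new ArrayList<>();
        if (pathValue == null) {
            return hits;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, exeName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        String[] names = windows ? new String[] {"nvcc.exe", "cl.exe"} : new String[] {"nvcc"};
        for (String name : names) {
            List<File> hits = find(name, System.getenv("PATH"));
            System.out.println(name + ": " + (hits.isEmpty() ? "not found on PATH" : hits.toString()));
        }
    }
}
```

When this is run from inside Eclipse, it reports the PATH that a Java program started from Eclipse actually sees.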


Hi again…

It seems I have solved the problem. I can now convert from .cu to .cubin. I got help from different threads, on the NVIDIA forums and on this forum as well. I can run the vector addition example. I will now move on to the problems of my own that I want to implement in CUDA. I will keep you guys updated about my progress with the JCuda quest.

I probably should have looked into more detail before posting here :o)

Thanks a lot though.


To be more specific about how I solved the issue, since it might help someone else:

  1. Ditched the laptop and reinstalled MS Visual Studio 2008 and CUDA Toolkit 3.2 on my desktop.
  2. Copied the .cu file to C:\Program Files\Microsoft Visual Studio 9.0\VC\bin, where cl.exe is located.
  3. Manually ran the command nvcc -cubin -arch sm_11 -o VecAdd.cubin to check whether it works.
  4. It converted the .cu into a .cubin, but I couldn’t find the file afterwards :smiley:
  5. So I searched for VecAdd.cubin and found it in C:\Users\nabile\AppData\Local\VirtualStore\Program Files\Microsoft Visual Studio 9.0\VC\bin
  6. Copied the .cubin file into my project folder. (I don’t know whether this was necessary.)
  7. Ran the JCuda-based Java program in Eclipse.
  8. It executed and printed the results without any trouble.

I can now run the KernelLauncher without hiccups.

I hope this helps other beginners too.

You can’t live in Fear (watched Ghost Rider last night :smiley: )