Java program with JCuda (Beginner)

I am a college student interested in JCuda. I’m working on a project whose goal is to run a Java program (which I did not develop) with CUDA in order to accelerate it.
I have been asking Mr. Google how to compile a Java program with nvcc, and finally I ended up here, in this forum.
I have followed the steps posted here and tested the and it seems that all is well.
I’m just a beginner and I couldn’t figure out how to make my Java program work with JCuda. If I’m not mistaken, I could compile the Java program with nvcc using JCuda, but I have not found the steps to do it.
If somebody could help me, I would be very grateful.
Thanks in advance,
PS: Sorry for my bad English


In fact, a small “Getting started” tutorial would certainly be helpful. (Once I started to write one, but did not work much on that yet… I could say that I’ll try to continue with that in my next vacation, but this is already stuffed with other tasks… )

In any case: To give more focussed advice, it might be helpful to know more precisely what you intend to do.

But roughly speaking:

  • If you want to use one of the JCuda Runtime libraries (JCublas, JCufft etc.), you can use them with plain Java.
  • If you want to write your own CUDA kernels, you can have a look at the, which shows the basic workflow for that. A simpler solution might be to use the “KernelLauncher” class from the Utilities package. In the best case, you can simply write your CUDA kernel into a .CU file and pass it to the KernelLauncher (the site also contains a small example). But note that it might be necessary to pass additional arguments to the KernelLauncher, depending on your target platform.

BTW: I’m currently updating to CUDA 4.0, and this is a “game changer”: the invocation mechanism for your own kernels has changed significantly. Fortunately, I think that it should be possible to hide this API change behind the KernelLauncher, so if you’re using this, it might not even require too many modifications on the client side.

EDIT: Specifically to your question:

The Java part itself is still compiled with the normal Java compiler. NVCC is only used to compile the kernel, that is, the single .CU file that contains the CUDA code. This creates a .CUBIN file, which can then be loaded and executed as shown in the (The KernelLauncher tries to do all this transparently at runtime)
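
To make the NVCC step a bit more concrete, here is a minimal sketch in plain Java that only assembles the command line for turning a .CU file into a .CUBIN file. The class name NvccCommand and the file names are made up for illustration, and "-m64" merely stands in for whatever platform-dependent arguments might be needed. The options -cubin and -o are regular NVCC options. The Java sources themselves would still be compiled with javac as usual.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JCuda: builds the NVCC command line
// that compiles one .cu file into a .cubin file.
class NvccCommand {

    static List<String> build(String cuFile, String cubinFile, String... extraArgs) {
        List<String> command = new ArrayList<>();
        command.add("nvcc");
        command.addAll(Arrays.asList(extraArgs)); // e.g. "-m64" on a 64-bit platform
        command.add("-cubin");                    // emit a CUBIN instead of an executable
        command.add(cuFile);
        command.add("-o");
        command.add(cubinFile);
        return command;
    }

    public static void main(String[] args) {
        // Only prints the command. Actually running it needs the CUDA toolkit,
        // for example via new ProcessBuilder(command).inheritIO().start().waitFor();
        List<String> command = build("computeSolution.cu", "computeSolution.cubin", "-m64");
        System.out.println(String.join(" ", command));
    }
}
```

Running main prints the line nvcc -m64 -cubin computeSolution.cu -o computeSolution.cubin, which could also be typed into a console by hand.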


Hello Marco,
first of all thanks for the reply.

What I’m trying to do is to run the msms program on the GPU.
Its code is too complicated for my Java knowledge, but I had something like what you mentioned in mind to implement my idea.
If I could compile the Java program into something that could then be sent to the GPU using JCuda, that would be great! (Without modifying the original code, or changing as little as possible.)

I had a look at the, and it’s a bit complicated for me, but I’ll keep working on it. Is this example the way I should go for my project, or do I have to take a different approach for the msms program?

Thanks again!!!


At the moment I don’t have enough time to have a closer look at this. (BTW: The word „Simlation“ in the title should probably be „Simulation“ :wink: ). In general, you might want to have a look at the most time-consuming part of the computation, and see whether it can be expressed in a really data-parallel way. Maybe I can find the time to read through the documentation or have a glimpse at the source code later.
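
As a rough illustration of what “data-parallel” means here, the following plain Java sketch runs the same per-element computation once sequentially and once in arbitrary order on several threads. The class DataParallelCheck and the computation itself are made-up stand-ins and are not taken from msms. If a hot loop can be rewritten like this, with every output element depending only on its own input element, it is at least a candidate for a CUDA kernel.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical stand-in for a hot loop; not taken from msms.
class DataParallelCheck {

    // The per-element work: depends only on its own input value.
    static float element(float x) {
        return (float) Math.sin(Math.cos(Math.tan(x)));
    }

    static float[] sequential(float[] input) {
        float[] output = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = element(input[i]);
        }
        return output;
    }

    // Same computation, but the iterations may run in any order on several threads.
    static float[] parallel(float[] input) {
        float[] output = new float[input.length];
        IntStream.range(0, input.length).parallel()
                 .forEach(i -> output[i] = element(input[i]));
        return output;
    }

    public static void main(String[] args) {
        float[] input = new float[1000];
        for (int i = 0; i < input.length; i++) {
            input[i] = i * 0.001f;
        }
        // Both variants should agree if the loop is really data-parallel.
        System.out.println(Arrays.equals(sequential(input), parallel(input)));
    }
}
```

A loop where iteration i reads something that iteration i-1 has written would not pass this kind of check, and would need to be restructured before it can be moved to the GPU.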


Hi again Marco,
Thanks for your reply.

msms is not a plain Java program, so using the JCuda runtime libraries is not an option.

Writing my own CUDA kernels could be an option, but I don’t know enough about it, and the complexity of msms does not help.

I am a little confused about the KernelLauncher. I am not sure I have understood it correctly, but I think what I should do is modify the most time-consuming msms functions and use the KernelLauncher in each of them. I would have to create a String sourceCode for each function, call kernelLauncher.compile(sourceCode, …) and allocate memory on my CUDA device.

Am I right, or have I not understood anything? :stuck_out_tongue:


Well, it’s hard to give specific hints as long as I’m not really familiar with the program (and I don’t know what the program is doing, actually).

But I’ll try a very general description: If you have some time-consuming computation in the “core” of your program, like

private void computeSolution(float input[], float output[]) {
    for (int i=0; i<input.length; i++) {
        output[i] = (float)Math.sin(Math.cos(Math.tan(input[i]))); // Or whatever ;-)
    }
}

then you can replace this method by a CUDA kernel. In the end you might have a kernel like

__global__ void computeSolution(float* input, float* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global index, also valid for more than one block
    if (i < n) {
        output[i] = sin(cos(tan(input[i])));
    }
}

This kernel could then be stored either as a Java String inside the program, or in a single, additional file, like “”. It could then be executed using the KernelLauncher. For a very simple application example, you might want to have a look at the “” from the Utilities package. In the end, your original “computeSolution” method could be replaced by something like

private void computeSolutionCUDA(float input[], float output[]) {
    copyMemoryToDevice(); // See the KernelLauncherSample
    kernelLauncher.call(deviceInput, deviceOutput, n);
}

You also have the possibility to launch this kernel manually, as shown in the “” from the website, but the KernelLauncher makes things a little bit simpler…
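
To see how the loop index maps onto CUDA threads before touching the GPU at all, the kernel can be imitated in plain Java. This is only a sketch under some assumptions: the class KernelEmulation and the block size of 256 are made up, the kernel is assumed to compute its index as blockIdx.x * blockDim.x + threadIdx.x and to guard it with i < n, and the two nested loops merely stand in for what the GPU would execute in parallel.

```java
// Hypothetical CPU imitation of a 1D kernel launch; no JCuda involved.
class KernelEmulation {

    // Number of blocks needed so that gridSize * blockSize >= n (rounding up).
    static int gridSize(int n, int blockSize) {
        return (n + blockSize - 1) / blockSize;
    }

    // Body of the kernel for one (block, thread) pair.
    static void kernelBody(int blockIdx, int blockDim, int threadIdx,
                           float[] input, float[] output, int n) {
        int i = blockIdx * blockDim + threadIdx;
        if (i < n) { // threads past the end of the array do nothing
            output[i] = (float) Math.sin(Math.cos(Math.tan(input[i])));
        }
    }

    // Runs the kernel body once for every thread of every block.
    static float[] launch(float[] input, int blockSize) {
        int n = input.length;
        float[] output = new float[n];
        int blocks = gridSize(n, blockSize);
        for (int b = 0; b < blocks; b++) {
            for (int t = 0; t < blockSize; t++) {
                kernelBody(b, blockSize, t, input, output, n);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        float[] input = new float[1000];
        for (int i = 0; i < input.length; i++) {
            input[i] = i * 0.001f;
        }
        float[] output = launch(input, 256);
        System.out.println(gridSize(1000, 256) + " blocks, output[0] = " + output[0]);
    }
}
```

For 1000 elements and a block size of 256 this gives 4 blocks, that is 1024 threads, which is why the i < n check matters: the last 24 threads have no element to work on. The result of such an imitation can also serve as a reference to compare against once the real kernel runs.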