I got JCuda working in Processing (Java)! As far as I can see, CUDA is fast!
But I have a few questions, because I ran into a few problems.
I found the Java CUDA device test source in the forums. The source is not difficult to understand, but I had some trouble getting it to work. It seems that my video card is not recognised, so the program does not continue. I removed the check, and suddenly loads of information showed up in my console.
Here is the code:
import jcuda.runtime.*;
import jcuda.runtime.cudaError.*;
import jcuda.runtime.cudaComputeMode.*;

void setup() {
  noLoop(); // run draw() only once, so the report is not printed every frame
}

void draw() {
  int deviceCountArray[] = new int[1];
  // Device count check, disabled because it always reported a failure here:
  //if (JCuda.cudaGetDeviceCount(deviceCountArray) == 0) {
  //  println("cudaGetDeviceCount failed! CUDA Driver and Runtime version may be mismatched.");
  //  println("
  //  exit();
  //}
  //int deviceCount = deviceCountArray[0]; // This function call returns 0 if there are no CUDA capable devices.
  //if (deviceCount == 0) println("There is no device supporting CUDA");
  int deviceCount = 1; // hard-coded while the check above is disabled
  int dev;
  int driverVersionArray[] = new int[1];
  int runtimeVersionArray[] = new int[1];
  for (dev = 0; dev < deviceCount; ++dev) {
    cudaDeviceProp deviceProp = new cudaDeviceProp();
    JCuda.cudaGetDeviceProperties(deviceProp, dev);
    if (dev == 0) {
      // This function call returns 9999 for both major & minor fields, if no CUDA capable devices are present
      if (deviceProp.major == 9999 && deviceProp.minor == 9999)
        println("There is no device supporting CUDA.");
      else if (deviceCount == 1)
        println("There is 1 device supporting CUDA");
      else
        println("There are "+deviceCount+" devices supporting CUDA");
    }
    // The device name is a zero-terminated byte array: cut it at the first 0
    String name = new String(deviceProp.name);
    name = name.substring(0, name.indexOf(0));
    println("Device "+dev+" "+ name);
    JCuda.cudaDriverGetVersion(driverVersionArray); // query the driver version before reading it
    int driverVersion = driverVersionArray[0];
    println(" CUDA Driver Version: "+driverVersion / 1000+" "+ driverVersion % 100);
    JCuda.cudaRuntimeGetVersion(runtimeVersionArray); // query the runtime version before reading it
    int runtimeVersion = runtimeVersionArray[0];
    println(" CUDA Runtime Version: "+runtimeVersion / 1000+" "+ runtimeVersion % 100);
    println(" CUDA Capability Major revision number: "+deviceProp.major);
    println(" CUDA Capability Minor revision number: "+ deviceProp.minor);
    println(" Total amount of global memory: "+deviceProp.totalGlobalMem+" bytes");
    println(" Number of multiprocessors: "+deviceProp.multiProcessorCount);
    println(" Number of cores: "+deviceProp.multiProcessorCount*8); // assumes 8 cores per multiprocessor
    println(" Total amount of constant memory: "+deviceProp.totalConstMem);
    println(" Total amount of shared memory per block: "+deviceProp.sharedMemPerBlock);
    println(" Total number of registers available per block: "+deviceProp.regsPerBlock);
    println(" Warp size: "+deviceProp.warpSize);
    println(" Maximum number of threads per block: "+deviceProp.maxThreadsPerBlock);
    print(" Maximum sizes of each dimension of a block: ");
    print(deviceProp.maxThreadsDim[0]+" x ");
    print(deviceProp.maxThreadsDim[1]+" x ");
    println(deviceProp.maxThreadsDim[2]);
    print(" Maximum sizes of each dimension of a grid: ");
    print(deviceProp.maxGridSize[0]+" x ");
    print(deviceProp.maxGridSize[1]+" x ");
    println(deviceProp.maxGridSize[2]);
    println(" Maximum memory pitch: "+deviceProp.memPitch);
    println(" Texture alignment: "+deviceProp.textureAlignment+" bytes");
    println(" Clock rate: "+deviceProp.clockRate * 1e-6f+ " GHZ"); // clockRate is in kHz
    print(" Concurrent copy and execution: ");
    if (deviceProp.deviceOverlap==1) println("yes");
    else println("no");
    print(" Run time limit on kernels: ");
    if (deviceProp.kernelExecTimeoutEnabled==1) println("yes");
    else println("no");
    print(" Integrated: ");
    if (deviceProp.integrated==1) println("yes");
    else println("no");
    print(" Support host page-locked memory mapping: ");
    if (deviceProp.canMapHostMemory==1) println("yes");
    else println("no");
    println(" Compute mode: "+deviceProp.computeMode);
    // if (deviceProp.computeMode == JCuda.cudaComputeModeDefault) print("Default (multiple host threads can use this device simultaneously)");
    // if (deviceProp.computeMode == JCuda.cudaComputeModeExclusive) print("Exclusive (only one host thread at a time can use this device)");
    // if (deviceProp.computeMode == JCuda.cudaComputeModeProhibited) print("Prohibited (no host thread can use this device) : Unknown");
  }
}
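A side note on the two version lines: as far as I understand, CUDA reports a version as one integer encoded as 1000 * major + 10 * minor, so 3020 would mean 3.2. The code above divides by 1000 and takes the remainder modulo 100, which prints "3 20" for that value. Below is a minimal sketch of the decoding in plain Java, without any JCuda dependency. The class and method names are made up for illustration and are not part of JCuda.

```java
// Hypothetical helper, not part of JCuda: decodes a CUDA version integer.
// Assumes the encoding 1000 * major + 10 * minor (for example 3020 -> 3.2).
class CudaVersionFormat {

    // Major part: the thousands and above.
    static int major(int version) {
        return version / 1000;
    }

    // Minor part: the tens digit of the last two digits.
    static int minor(int version) {
        return (version % 100) / 10;
    }

    // Formats the version as "major.minor".
    static String format(int version) {
        return major(version) + "." + minor(version);
    }

    public static void main(String[] args) {
        System.out.println(format(3020)); // prints 3.2
    }
}
```

In the Processing code this would replace the two separate expressions in each version println with one formatted string.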
The result I get is this:
There is 1 device supporting CUDA
Device 0 GeForce GTX 460
CUDA Driver Version: 3 20
CUDA Runtime Version: 3 20
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 1
Total amount of global memory: 774307840 bytes
Number of multiprocessors: 7
Number of cores: 56
Total amount of constant memory: 65536
Total amount of shared memory per block: 49152
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647
Texture alignment: 512 bytes
Clock rate: 1.4 GHZ
Concurrent copy and execution: yes
Run time limit on kernels: yes
Integrated: no
Support host page-locked memory mapping: yes
Compute mode: 0
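I am not sure the "Number of cores: 56" line is right. The code multiplies the multiprocessor count by 8, which as far as I know is the figure for compute capability 1.x only. For compute capability 2.0 it should be 32 cores per multiprocessor and for 2.1 it should be 48, which would give 7 * 48 = 336 cores for this card. Here is a rough sketch of such a lookup in plain Java. The class and method names are hypothetical, and only the capabilities mentioned here are covered.

```java
// Hypothetical helper, not part of JCuda: estimates the CUDA core count
// from the compute capability. Only covers capabilities 1.x, 2.0 and 2.1.
class CoreCountEstimate {

    // Cores per multiprocessor, or -1 if the capability is not in the table.
    static int coresPerMultiprocessor(int major, int minor) {
        if (major == 1) return 8;                 // compute capability 1.x
        if (major == 2 && minor == 0) return 32;  // compute capability 2.0
        if (major == 2 && minor == 1) return 48;  // compute capability 2.1
        return -1;                                // unknown to this sketch
    }

    // Total cores, or -1 if the capability is not in the table.
    static int totalCores(int multiProcessorCount, int major, int minor) {
        int perMultiprocessor = coresPerMultiprocessor(major, minor);
        if (perMultiprocessor < 0) return -1;
        return multiProcessorCount * perMultiprocessor;
    }

    public static void main(String[] args) {
        // Values taken from the output above: 7 multiprocessors, capability 2.1
        System.out.println(totalCores(7, 2, 1)); // prints 336
    }
}
```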
Is there a known bug in cudaGetDeviceCount?
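Or is the check I commented out simply the wrong way around? As far as I know, the CUDA runtime functions return a status code in which 0 means success (cudaSuccess), and the device count itself is written into the array argument. If JCuda follows the same convention, then comparing the return value with 0 would report a failure exactly when the call succeeds. A small plain Java sketch of the difference follows. The getDeviceCount method is a stand-in I made up so that the example runs without a GPU; it is not the JCuda call.

```java
// Sketch of a CUDA-style status check. Nothing here calls JCuda.
class StatusCheckDemo {

    // Assumed to mirror cudaSuccess, which is 0 in the CUDA runtime API.
    static final int SUCCESS = 0;

    // Made-up stand-in for a CUDA-style call: returns a status code
    // and writes the actual result (one device) into out[0].
    static int getDeviceCount(int[] out) {
        out[0] = 1;
        return SUCCESS;
    }

    // The check as written in my code: treats status 0 as a failure.
    static boolean invertedCheckReportsFailure() {
        int[] count = new int[1];
        return getDeviceCount(count) == 0;
    }

    // The check against the success code: fails only on a non-zero status.
    static boolean statusCheckReportsFailure() {
        int[] count = new int[1];
        return getDeviceCount(count) != SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(invertedCheckReportsFailure()); // prints true
        System.out.println(statusCheckReportsFailure());   // prints false
    }
}
```

If that is what happens, the device count should be read from deviceCountArray[0] after the call, and only a non-zero return value should count as an error.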
Keep up the good work! Greetings, Dippo.