CUDA and Java Processing

Dippo · 17 January 2011, 08:36

Hi,

I got JCuda working in Java Processing! As far as I can see, CUDA is fast!
But I have a few questions, because I ran into a few problems.

In the forums I found the Java CUDA device test source. The source is not difficult to understand, but I had a few problems getting it to work. It seems that my video card is not recognised, so the program doesn't continue. When I removed the check, loads of information suddenly showed up in my console.

Here is the code:

import jcuda.runtime.*;
import jcuda.runtime.cudaError.*;
import jcuda.runtime.cudaComputeMode.*;

void setup() {
  noLoop();
  JCuda.setExceptionsEnabled(true);
}

void draw() {
  int deviceCountArray[] = new int[1];
  //if (JCuda.cudaGetDeviceCount(deviceCountArray) == 0) {    
  //  println("cudaGetDeviceCount failed! CUDA Driver and Runtime version may be mismatched.\n");
  //  println("\nTest FAILED!\n");
  //  exit();
  // }
  //int deviceCount = deviceCountArray[0]; // This function call returns 0 if there are no CUDA capable devices.
  //if (deviceCount == 0) println("There is no device supporting CUDA");  
  int deviceCount=1; // hard-coded while the device count check above is commented out
  int dev;
  int driverVersionArray[] = new int[1];
  int runtimeVersionArray[] = new int[1];
  for (dev = 0; dev < deviceCount; ++dev) {
    cudaDeviceProp deviceProp = new cudaDeviceProp();
    JCuda.cudaGetDeviceProperties(deviceProp, dev);
    if (dev == 0) {
      // This function call returns 9999 for both major & minor fields, if no CUDA capable devices are present
      if (deviceProp.major == 9999 && deviceProp.minor == 9999)
        println("There is no device supporting CUDA.\n");
      else if (deviceCount == 1)
        println("There is 1 device supporting CUDA\n");
      else
        println("There are "+deviceCount+" devices supporting CUDA\n");
    }
    String name = new String(deviceProp.name);
    name = name.substring(0, name.indexOf(0));
    println("\nDevice "+dev+" "+ name);
    JCuda.cudaDriverGetVersion(driverVersionArray);
    int driverVersion = driverVersionArray[0];
    // Versions are encoded as 1000*major + 10*minor, so "% 100" prints e.g. "20" for CUDA 3.2
    println("  CUDA Driver Version:                           "+driverVersion / 1000+" "+ driverVersion % 100);
    JCuda.cudaRuntimeGetVersion(runtimeVersionArray);
    int runtimeVersion = runtimeVersionArray[0];
    println("  CUDA Runtime Version:                          "+runtimeVersion / 1000+" "+ runtimeVersion % 100);
    println("  CUDA Capability Major revision number:         "+deviceProp.major);
    println("  CUDA Capability Minor revision number:         "+ deviceProp.minor);
    println("  Total amount of global memory:                 "+deviceProp.totalGlobalMem+" bytes");
    println("  Number of multiprocessors:                     "+deviceProp.multiProcessorCount);
    // Assumes 8 cores per multiprocessor, which only holds for compute capability 1.x devices
    println("  Number of cores:                               "+deviceProp.multiProcessorCount*8);
    println("  Total amount of constant memory:               "+deviceProp.totalConstMem);
    println("  Total amount of shared memory per block:       "+deviceProp.sharedMemPerBlock);
    println("  Total number of registers available per block: "+deviceProp.regsPerBlock);
    println("  Warp size:                                     "+deviceProp.warpSize);
    println("  Maximum number of threads per block:           "+deviceProp.maxThreadsPerBlock);
    print("  Maximum sizes of each dimension of a block:    "); 
    print(deviceProp.maxThreadsDim[0]+" x "); 
    print(deviceProp.maxThreadsDim[1]+" x "); 
    println(deviceProp.maxThreadsDim[2]);
    print("  Maximum sizes of each dimension of a grid:     "); 
    print(deviceProp.maxGridSize[0]+" x "); 
    print(deviceProp.maxGridSize[1]+" x "); 
    println(deviceProp.maxGridSize[2]);
    println("  Maximum memory pitch:                          "+deviceProp.memPitch);
    println("  Texture alignment:                             "+deviceProp.textureAlignment+" bytes");
    println("  Clock rate:                                    "+deviceProp.clockRate * 1e-6f+ " GHZ");
    print("  Concurrent copy and execution:                 ");
    if (deviceProp.deviceOverlap==1) println("yes");
    else println("no");
    print("  Run time limit on kernels:                     ");
    if (deviceProp.kernelExecTimeoutEnabled==1) println("yes");
    else println("no");
    print("  Integrated:                                    ");     
    if (deviceProp.integrated==1) println("yes");
    else println("no");
    print("  Support host page-locked memory mapping:       ");
    if (deviceProp.canMapHostMemory==1) println("yes");
    else println("no");
    println("  Compute mode:                                  "+deviceProp.computeMode);     
    // The compute mode constants are defined in the class cudaComputeMode, not in JCuda
    //  if (deviceProp.computeMode == cudaComputeMode.cudaComputeModeDefault) print("Default (multiple host threads can use this device simultaneously)");
    //  if (deviceProp.computeMode == cudaComputeMode.cudaComputeModeExclusive) print("Exclusive (only one host thread at a time can use this device)"); 
    //  if (deviceProp.computeMode == cudaComputeMode.cudaComputeModeProhibited) print("Prohibited (no host thread can use this device)");
  }
}

The result I get is this:

There is 1 device supporting CUDA

Device 0 GeForce GTX 460
CUDA Driver Version: 3 20
CUDA Runtime Version: 3 20
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 1
Total amount of global memory: 774307840 bytes
Number of multiprocessors: 7
Number of cores: 56
Total amount of constant memory: 65536
Total amount of shared memory per block: 49152
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647
Texture alignment: 512 bytes
Clock rate: 1.4 GHZ
Concurrent copy and execution: yes
Run time limit on kernels: yes
Integrated: no
Support host page-locked memory mapping: yes
Compute mode: 0
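Two numbers in this output are easy to misread. cudaDriverGetVersion and cudaRuntimeGetVersion return the version encoded as 1000 * major + 10 * minor, so the value 3020 that is printed as "3 20" means CUDA 3.2. And the "Number of cores" line multiplies by 8, which is only correct for compute capability 1.x devices; a compute capability 2.1 device such as this GTX 460 has 48 cores per multiprocessor. A minimal plain-Java sketch of both calculations (the class and method names are made up for illustration, and the core table deliberately only covers the compute capabilities mentioned here):

```java
public class DeviceNumbers {

    // Decodes a CUDA version integer (1000 * major + 10 * minor) into "major.minor"
    static String decodeCudaVersion(int version) {
        int major = version / 1000;
        int minor = (version % 100) / 10;
        return major + "." + minor;
    }

    // Cores per multiprocessor, only for the compute capabilities discussed here;
    // returns -1 for anything else (this table is intentionally incomplete)
    static int coresPerMultiprocessor(int major, int minor) {
        if (major == 1) return 8;
        if (major == 2 && minor == 0) return 32;
        if (major == 2 && minor == 1) return 48;
        return -1;
    }

    public static void main(String[] args) {
        // Values taken from the output above: version 3020, 7 multiprocessors, compute capability 2.1
        System.out.println("CUDA version: " + decodeCudaVersion(3020));
        System.out.println("Cores: " + 7 * coresPerMultiprocessor(2, 1));
    }
}
```

Running main prints "CUDA version: 3.2" and "Cores: 336" instead of "3 20" and 56.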

Is there a known bug in cudaGetDeviceCount?

Keep up the good work! Greetings, Dippo.

Marco13 · 17 January 2011, 12:22

Hello,

In fact, the documentation may be misleading at first glance:


int jcuda.runtime.JCuda.cudaGetDeviceCount(int[ ] count)

Returns the number of compute-capable devices. 
...

This suggests that the return value is the number of devices. But this is not the case:


...
Returns in `*count` the number of devices with compute capability greater or equal to 1.0 
...
**Returns:**
cudaSuccess

The value that is returned by the function itself is just the error code. The actual number of devices is written into the array.

So when you write


if (JCuda.cudaGetDeviceCount(deviceCountArray) == 0) 
{
   ...
}

this really means


if (JCuda.cudaGetDeviceCount(deviceCountArray) == cudaError.cudaSuccess) 
{
   // Everything went fine, the result is written into the array :-)
}
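The pattern behind this is that JCuda keeps the C signature cudaError_t cudaGetDeviceCount(int *count): since Java has no pointers, a one-element int array stands in for *count, and the return value is only a status code. The following self-contained sketch imitates that convention so it runs without a GPU; getDeviceCountStub and SUCCESS are hypothetical stand-ins, not part of JCuda (in real code they would be JCuda.cudaGetDeviceCount and cudaError.cudaSuccess, which is 0).

```java
public class OutParameterDemo {

    // Hypothetical status code; like cudaError.cudaSuccess, 0 means "no error"
    static final int SUCCESS = 0;

    // Hypothetical stand-in for JCuda.cudaGetDeviceCount(int[] count):
    // the return value is only a status code, the result is written into count[0]
    static int getDeviceCountStub(int[] count) {
        count[0] = 1; // pretend that one device was found
        return SUCCESS;
    }

    public static void main(String[] args) {
        int[] deviceCountArray = new int[1];
        int status = getDeviceCountStub(deviceCountArray);
        if (status != SUCCESS) {
            System.out.println("Call failed with status " + status);
            return;
        }
        // The number of devices comes from the array, not from the return value
        int deviceCount = deviceCountArray[0];
        System.out.println("Status: " + status + ", devices: " + deviceCount);
    }
}
```

This prints "Status: 0, devices: 1": the status is 0 even though a device was found, so a check like the `== 0` in the first post treats exactly this successful call as a failure.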

bye
Marco

P.S.: cudaDeviceProp also has a convenience method that returns its contents as a formatted string:
System.out.println(deviceProp.toFormattedString());
But maybe this is still not as readable as the manual output.

Dippo · 20 January 2011, 04:07

I got it working, although I don't understand why the class is not visible when I import the package. Anyhow, it works.
Thanks.

import jcuda.runtime.*;
import jcuda.runtime.cudaError.*;
import jcuda.runtime.cudaComputeMode.*;
import jcuda.driver.JCudaDriver;

void setup() {
  noLoop();
  JCuda.setExceptionsEnabled(true);
}

void draw() {
  int deviceCountArray[] = new int[1];
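  // Note: since exceptions are enabled in setup(), a failing call throws a CudaException instead of returning an error code
  if (JCuda.cudaGetDeviceCount(deviceCountArray) != jcuda.runtime.cudaError.cudaSuccess) {    
    println("cudaGetDeviceCount failed! CUDA Driver and Runtime version may be mismatched.\n");
    println("\nTest FAILED!\n");
    exit();
    return; // skip the rest of draw() even if exit() does not stop the sketch immediately
  }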
  int deviceCount = deviceCountArray[0]; // This function call returns 0 if there are no CUDA capable devices.
  if (deviceCount == 0) println("There is no device supporting CUDA");    
  int dev;
  int driverVersionArray[] = new int[1];
  int runtimeVersionArray[] = new int[1];
  for (dev = 0; dev < deviceCount; ++dev) {
    cudaDeviceProp deviceProp = new cudaDeviceProp();
    JCuda.cudaGetDeviceProperties(deviceProp, dev);
    if (dev == 0) {
      // This function call returns 9999 for both major & minor fields, if no CUDA capable devices are present
      if (deviceProp.major == 9999 && deviceProp.minor == 9999)
        println("There is no device supporting CUDA.\n");
      else if (deviceCount == 1)
        println("There is 1 device supporting CUDA\n");
      else
        println("There are "+deviceCount+" devices supporting CUDA\n");
    } 
    String devicecapture=deviceProp.toFormattedString();
    String[] deviceoutput=trim(splitTokens(devicecapture, "\n"));
    // Print the formatted properties line by line, skipping the first and the last line
    for (int i=1;i<deviceoutput.length-1;i++)
      println(deviceoutput[i]);
  }
}
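splitTokens() and trim() are Processing functions. In a plain Java program, the same post-processing of the toFormattedString() result could be sketched with String.split and String.trim. The input string below is a made-up placeholder, since the exact layout of toFormattedString() may differ between JCuda versions; like the loop above, the sketch skips the first and the last line.

```java
import java.util.ArrayList;
import java.util.List;

public class FormattedOutputLines {

    static List<String> innerLines(String formatted) {
        // Like splitTokens(s, "\n"): split on newlines and drop empty tokens, then trim
        List<String> tokens = new ArrayList<>();
        for (String token : formatted.split("\n")) {
            if (!token.isEmpty()) {
                tokens.add(token.trim());
            }
        }
        // Like the loop in the post: skip the first and the last token
        List<String> result = new ArrayList<>();
        for (int i = 1; i < tokens.size() - 1; i++) {
            result.add(tokens.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder text, not real JCuda output
        String formatted = "header\n    name=SomeDevice\n    major=2\nfooter";
        for (String line : innerLines(formatted)) {
            System.out.println(line);
        }
    }
}
```

With the placeholder input this prints "name=SomeDevice" and "major=2".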