GPU printf with compute capability >=2.0

system · 3 February 2011 at 11:14

Does anyone have sample JCuda code that handles GPU device printf statements? I can see the capability mentioned in JCudaDriver.cuCtxSetLimit, but if I add a ‘printf’ to the .cu file, it results in a CUDA_ERROR_INVALID_IMAGE.

Simon

Marco13 · 3 February 2011 at 12:35

Hello

How do you compile the CU file?

(Unfortunately, I’m still sticking to my GeForce 8800, which does not have Compute Capability 2.0, so I cannot test this yet or give any more specific hints.)

bye
Marco

system · 4 February 2011 at 11:21

I use a slightly modified form of prepareCubinFile, setting -arch to sm_20. It compiles fine with or without the printf("Hello"); statement, but running the kernel with the printf causes the error.

Compile line (with optimisationLevel=0, useFastMath=false):


String command = "nvcc " + modelString + " -arch sm_20 -O" + optimisationLevel
        + (useFastMath ? " --use_fast_math" : "")
        + " -I /home/simon2/NVIDIA_GPU_Computing_SDK/C/common/inc -cubin "
        + cuFile.getPath() + " -o " + cubinFileName;
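As a side note, the same command could also be assembled as an argument list instead of one concatenated string, which avoids problems with spaces in paths. This is only a minimal sketch: the class name `NvccCommandSketch`, the method `buildCommand` and its parameter names are hypothetical and not part of JCuda or of the prepareCubinFile sample.

```java
import java.util.ArrayList;
import java.util.List;

public class NvccCommandSketch {

    /**
     * Builds the nvcc argument list for compiling a .cu file into a CUBIN.
     * All names here are hypothetical; only the nvcc flags are taken
     * from the compile line shown in this thread.
     */
    public static List<String> buildCommand(String modelFlag, String arch,
            int optimisationLevel, boolean useFastMath, String includeDir,
            String cuFile, String cubinFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("nvcc");
        cmd.add(modelFlag);                 // e.g. "-m64"
        cmd.add("-arch");
        cmd.add(arch);                      // e.g. "sm_20"
        cmd.add("-O" + optimisationLevel);
        if (useFastMath) {
            cmd.add("--use_fast_math");
        }
        cmd.add("-I");
        cmd.add(includeDir);
        cmd.add("-cubin");
        cmd.add(cuFile);
        cmd.add("-o");
        cmd.add(cubinFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("-m64", "sm_20", 0, false,
                "/home/simon2/NVIDIA_GPU_Computing_SDK/C/common/inc",
                "football.cu", "football.cubin");
        // Print the command in the same form as the "Executing" log line
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be passed directly to `new ProcessBuilder(cmd)`, so no manual quoting of the individual arguments is needed.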

printf enable line:

if (CUDA.getComputeCapability(dev) >= 2.0f)
{
    System.out.println("CUDA: Enabling device printf buffer");
    JCudaDriver.cuCtxSetLimit(CUlimit.CU_LIMIT_PRINTF_FIFO_SIZE, 4096);
}
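The capability check above compares a float. A possible alternative, sketched here under assumptions, is to compare the major and minor version as integers, for example as they might be obtained from a call such as cuDeviceComputeCapability if the JCuda version in use provides it. The class `PrintfSupportSketch` and its method are hypothetical names; the only fact taken from this thread is that device printf needs compute capability 2.0 or higher.

```java
public class PrintfSupportSketch {

    /**
     * Returns whether a device with the given compute capability
     * meets the 2.0 minimum for device-side printf.
     * Hypothetical helper, not part of JCuda.
     */
    public static boolean supportsDevicePrintf(int major, int minor) {
        if (major < 0 || minor < 0) {
            throw new IllegalArgumentException("Negative capability version");
        }
        // The minimum is 2.0, so any minor version of a major >= 2 qualifies
        return major >= 2;
    }

    public static void main(String[] args) {
        // GeForce GTX 460 from the log in this thread: capability 2.1
        System.out.println(supportsDevicePrintf(2, 1));
        // A device with capability 1.x, such as the GeForce 8800
        System.out.println(supportsDevicePrintf(1, 0));
    }
}
```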

Which results in (without the printf):

Executing
nvcc -m64 -arch sm_20 -O0 -I /home/simon2/NVIDIA_GPU_Computing_SDK/C/common/inc -cubin football.cu -o football.cubin
nvcc process exitValue 0
CUDA: Found 1 device.
        #0        336x        1350000 khz        432 ghz total
CUDA: Using best device 'GeForce GTX 460'
CUDA: Compute capability 2.1
CUDA: Enabling device printf buffer
Test PASSED

Which results in (with the printf):

Executing
nvcc -m64 -arch sm_20 -O0 -I /home/simon2/NVIDIA_GPU_Computing_SDK/C/common/inc -cubin football.cu -o football.cubin
nvcc process exitValue 0
CUDA: Found 1 device.
        #0        336x        1350000 khz        432 ghz total
CUDA: Using best device 'GeForce GTX 460'
CUDA: Compute capability 2.1

CUDA: Enabling device printf buffer
Exception in thread "main" jcuda.CudaException: CUDA_ERROR_INVALID_IMAGE
        at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:164)
        at jcuda.driver.JCudaDriver.cuModuleLoad(JCudaDriver.java:1351)
        at cuda.CudaAnalysis.main(CudaAnalysis.java:102)
Java Result: 1

Marco13 · 5 February 2011 at 07:52

Hello

Just a small remark: The "prepareCubinFile" method was only intended as a simplification for the sample. For "larger" applications, the CUBINs will most likely be compiled in a separate, manual step. However, the functionality of this method is also part of the JCuda utilities, namely of the KernelLauncher class, which also allows adding command line parameters programmatically. (Maybe I can even extend it to automatically use the highest sm_XX architecture flag that is supported by the target system.)

Concerning the error message:
CUDA: Enabling device printf buffer
Exception in thread "main" jcuda.CudaException: CUDA_ERROR_INVALID_IMAGE

Admittedly, I’m not sure where this message comes from or how to avoid it. In the worst case, printf might not be usable outside of a pure C application, but this is just a guess, since I don’t really know the mechanism that brings the output of printf onto the screen…

A web search for "cuda printf CUDA_ERROR_INVALID_IMAGE" does not bring up many results (and, interestingly, this thread is near the top…). The only hint that I found was this, from the CUDA 3.2 Tech Brief (PDF file):

Since the only in-kernel syscall supported prior to CUDA Toolkit 3.2 was printf() and since printf() is typically used only for debugging purposes rather than in production code, support for the old-style linking mechanism has been removed in CUDA Toolkit 3.2. This means that CUBINs that call printf() that were compiled with CUDA Toolkit 3.1 will fail to load with CUDA drivers of version 260.xx or higher, returning the driver error CUDA_ERROR_INVALID_IMAGE. Recompiling the CUBINs and the applications that load them with CUDA Toolkit 3.2 will automatically enable the new linking mechanism and allow the CUBINs to load successfully.

But this can hardly apply here: I assume that you are already using the latest version of CUDA, namely 3.2, right?
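For illustration, the condition described in the Tech Brief could be written down roughly like this. It is only a sketch of my reading of the quoted paragraph, not an official check: the class `PrintfLinkingRuleSketch` and its methods are hypothetical, versions use the usual CUDA encoding of 1000 * major + 10 * minor (so 3020 means 3.2), and treating the loading application the same way as the CUBIN is an assumption based on the last sentence of the quote.

```java
public class PrintfLinkingRuleSketch {

    /** Encodes a version the way CUDA does, e.g. 3.2 becomes 3020. */
    public static int encode(int major, int minor) {
        return 1000 * major + 10 * minor;
    }

    /**
     * Returns true if the combination matches the failure case from the
     * Tech Brief: a CUBIN that calls printf, a driver of branch 260 or
     * higher, and a CUBIN or loading application built with a toolkit
     * older than 3.2. Including the application is an assumption.
     */
    public static boolean matchesKnownFailure(boolean cubinCallsPrintf,
            int cubinToolkit, int applicationToolkit, int driverBranch) {
        boolean oldLinking = cubinToolkit < encode(3, 2)
                || applicationToolkit < encode(3, 2);
        boolean newDriver = driverBranch >= 260;
        return cubinCallsPrintf && oldLinking && newDriver;
    }

    public static void main(String[] args) {
        // CUBIN from toolkit 3.1 with printf, loaded with a 260 driver
        System.out.println(
                matchesKnownFailure(true, encode(3, 1), encode(3, 1), 260));
        // Everything rebuilt with toolkit 3.2
        System.out.println(
                matchesKnownFailure(true, encode(3, 2), encode(3, 2), 260));
    }
}
```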

bye
Marco

system · 5 February 2011 at 08:48

My bad! I was using JCuda 3.1. I upgraded to 3.2RC and guess what, it works.

Many thanks for your help…

Simon

Marco13 · 5 February 2011 at 17:41

Great to hear that! I feared that these "syscall" functions might not work in JCuda at all, which would have been a severe drawback. I hope that I’ll have the chance to upgrade to a newer (Fermi) GPU soon, so that I can also play a little with these new functions.