Problem running the example program "Matrix Inversion"

qgpx2006 · March 11, 2011 at 07:47

Hi all,

I am new to CUDA and JCuda and I really need your help.

I have encountered a problem running the “Matrix Inversion” sample program from http://www.jcuda.de/samples/samples.html .

I have tried both the 32-bit and the 64-bit cubin, but I still get the same exception code: CUDA_ERROR_INVALID_SOURCE

Exception in thread "main" jcuda.CudaException: CUDA_ERROR_INVALID_SOURCE
        at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:170)
        at jcuda.driver.JCudaDriver.cuModuleLoadDataEx(JCudaDriver.java:1613)
        at jcuda.utils.KernelLauncher.initModule(KernelLauncher.java:688)
        at jcuda.utils.KernelLauncher.create(KernelLauncher.java:395)
        at jcuda.utils.KernelLauncher.create(KernelLauncher.java:321)
        at MatrixInvert.init(MatrixInvert.java:58)
        at MatrixInvert.invert(MatrixInvert.java:108)
        at MatrixInvertSample.main(MatrixInvertSample.java:29)

I have tested the code in both Eclipse and NetBeans, but I still get the same result. Here is my machine info:
OS: Windows server 2008 R2 64 bit
GPU:GT430 1G
CUDA SDK 3.2
driver: 266.58
RAM:2G
CPU:2.4GHz


I have not changed any code in the sample program. Is there a configuration step I have missed?

Marco13 · March 11, 2011 at 09:20

Hello

The CUBIN files are specific not only to the architecture (32 or 64 bit), but also to the “Compute Capability” of the GPU. Admittedly, I was not aware of this when I created them. I think they were created for devices with Compute Capability 1.3, and I thought that devices would at least be backward compatible, but apparently they are not: devices with Compute Capability >1.3 seem unable to load and execute these files.

So you will probably have to compile the CUBINs on your own, for Compute Capability 2.1. The command line should roughly be
nvcc -m64 -arch sm_21 -cubin input.cu -o output.cubin

This seems to have been causing problems recently (obviously due to the increasing number of different Compute Capabilities…). I’ll probably try to create samples based on PTX and the JIT compiler, which may be more versatile.
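
For reference, the compatibility rule as I understand it from the CUDA documentation is: a CUBIN built for Compute Capability X.y is only expected to load on devices with the same major revision X and a minor revision of at least y. The following is only a minimal sketch of that rule in plain Java; the class and method names are made up for illustration and are not part of JCuda.

```java
// Hypothetical helper (not part of JCuda): sketches the binary
// compatibility rule for CUBIN files as described above.
final class CubinCompatibility {

    // Returns true if a CUBIN built for cubinMajor.cubinMinor is expected
    // to load on a device with Compute Capability deviceMajor.deviceMinor:
    // same major revision, and a device minor revision that is not lower.
    static boolean isLoadable(int cubinMajor, int cubinMinor,
                              int deviceMajor, int deviceMinor) {
        return cubinMajor == deviceMajor && deviceMinor >= cubinMinor;
    }

    public static void main(String[] args) {
        // A 1.3 CUBIN on a 2.1 device (the situation in this thread)
        System.out.println(isLoadable(1, 3, 2, 1)); // false
        // A 2.1 CUBIN on a 2.1 device
        System.out.println(isLoadable(2, 1, 2, 1)); // true
    }
}
```

This would explain why a CUBIN built for 1.3 is rejected with CUDA_ERROR_INVALID_SOURCE on a 2.1 card.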

bye

qgpx2006 · March 12, 2011 at 00:17

Hi Marco13,

Thanks a lot for the compiler command suggestion. I tried to compile it, but I got errors:

GPUeliminateBlock_kernel.cu
C:/GPUeliminateBlock_kernel.cu(34): error: identifier “BLOCKSIZE” is undefined

C:/GPUeliminateBlock_kernel.cu(34): error: identifier “AVOIDBANKCONFLICTS” is undefined

C:/GPUeliminateBlock_kernel.cu(44): error: identifier “BLOCKSIZEMINUS1” is undefined

C:/GPUeliminateBlock_kernel.cu(74): error: identifier “BLOCKSIZE” is undefined

C:/GPUeliminateBlock_kernel.cu(74): error: identifier “AVOIDBANKCONFLICTS” is undefined

C:/GPUeliminateBlock_kernel.cu(83): error: identifier “BLOCKSIZEMINUS1” is undefined

6 errors detected in the compilation of “C:/Temp/1/tmpxft_00000dfc_00000000-6_GPUeliminateBlock_kernel.cpp4.ii”.

It seems that I am missing the .h file that defines the variables used in the .cu file. But I have browsed the Java source code, and variables like BLOCKSIZE are set in the Java source. In the original source, http://forums.nvidia.com/index.php?showtopic=80108 , the variables are missing too.

May I ask again how to compile the files? Many thanks

Marco13 · March 12, 2011 at 07:58

Oh, right, sorry: in this case the .cu files use some #defines, which also have to be passed in when running NVCC.

There are two options to solve this:

The first option is to compile the CUBIN files manually. In that case, the command line should be


nvcc -m64 -arch sm_21 -D BLOCKSIZE=16 -D BLOCKSIZEMINUS1=15 -D AVOIDBANKCONFLICTS=0 -cubin input.cu -o output.cubin

It’s best to create a .BAT file containing one such command line per kernel file, with the ‘input’ and ‘output’ names replaced by the names of the required files.
(EDIT: You may also want to have a look at the NVCC documentation in the CUDA toolkit /doc/ directory)
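
As an alternative to a .BAT file, the same command line could also be assembled from Java. The following is only a rough sketch under a few assumptions: the class and method names are hypothetical, the .cubin output name is simply derived from the .cu name, and nvcc is assumed to be on the PATH if the command is actually executed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JCuda): assembles the manual NVCC
// command line from this thread for one kernel file.
final class NvccCommandSketch {

    // Builds the argument list for compiling inputCu into outputCubin.
    // 'arch' is the target architecture, e.g. "sm_21".
    static List<String> buildCommand(String inputCu, String outputCubin,
                                     String arch, int blockSize) {
        List<String> cmd = new ArrayList<>();
        cmd.add("nvcc");
        cmd.add("-m64");
        cmd.add("-arch");
        cmd.add(arch);
        // The #defines that the .cu files expect
        cmd.add("-D");
        cmd.add("BLOCKSIZE=" + blockSize);
        cmd.add("-D");
        cmd.add("BLOCKSIZEMINUS1=" + (blockSize - 1));
        cmd.add("-D");
        cmd.add("AVOIDBANKCONFLICTS=0");
        cmd.add("-cubin");
        cmd.add(inputCu);
        cmd.add("-o");
        cmd.add(outputCubin);
        return cmd;
    }

    public static void main(String[] args) {
        // Kernel file name taken from the error output above;
        // further kernel files would be added to this array.
        String[] kernels = { "GPUeliminateBlock_kernel" };
        for (String name : kernels) {
            List<String> cmd =
                buildCommand(name + ".cu", name + ".cubin", "sm_21", 16);
            // Only prints the command; it is not executed here
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

To actually run the compiler, the list could be passed to `new ProcessBuilder(cmd).inheritIO().start()`, but printing it first makes it easier to check the arguments.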

The second option is to compile the CUBIN files automatically. The parameter specifying the Compute Capability can be passed directly to the KernelLauncher, which will then assemble the command line appropriately. The MatrixInvert class specifies the arguments for the NVCC as


String args = 
    "-D BLOCKSIZE=" + BLOCKSIZE + " " + 
    "-D BLOCKSIZEMINUS1=" + (BLOCKSIZE - 1) + " " + 
    "-D AVOIDBANKCONFLICTS=" + 0 + " ";

To compile it for a card with Compute Capability 2.1, it should be extended to


String args = 
    "-arch sm_21 " + 
    "-D BLOCKSIZE=" + BLOCKSIZE + " " + 
    "-D BLOCKSIZEMINUS1=" + (BLOCKSIZE - 1) + " " + 
    "-D AVOIDBANKCONFLICTS=" + 0 + " ";

I’ll try to offer simpler solutions for this, maybe I can do this early next week:

The KernelLauncher could query the Compute Capability of the target system, and insert this argument automatically. I’ll try to extend the KernelLauncher class so that it will not be necessary to add this argument manually in future versions.
As a fallback solution, I’ll also create the appropriate .BAT file for the manual compilation and include it in the download package.
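
Just to illustrate the idea of inserting the argument automatically, here is a minimal sketch in plain Java. It is not the actual KernelLauncher code: the class and method names are hypothetical, and the major/minor values are passed in as plain numbers, whereas a real implementation would presumably obtain them from the driver (for example via a query like cuDeviceComputeCapability).

```java
// Hypothetical helper (not the actual KernelLauncher implementation):
// derives the "-arch" argument from a Compute Capability and prepends
// it to an existing NVCC argument string.
final class ArchArgumentSketch {

    // Builds e.g. "-arch sm_21 " for Compute Capability 2.1
    static String archArgument(int major, int minor) {
        return "-arch sm_" + major + minor + " ";
    }

    // Prepends the architecture argument, unless the caller has
    // already specified an architecture explicitly.
    static String withArch(String nvccArgs, int major, int minor) {
        if (nvccArgs.contains("-arch")) {
            return nvccArgs;
        }
        return archArgument(major, minor) + nvccArgs;
    }

    public static void main(String[] args) {
        int BLOCKSIZE = 16;
        String nvccArgs =
            "-D BLOCKSIZE=" + BLOCKSIZE + " " +
            "-D BLOCKSIZEMINUS1=" + (BLOCKSIZE - 1) + " " +
            "-D AVOIDBANKCONFLICTS=" + 0 + " ";
        // Assumed values for a GT430 (Compute Capability 2.1)
        System.out.println(withArch(nvccArgs, 2, 1));
    }
}
```

Leaving an explicitly given "-arch" untouched means that the manual workaround described above would keep working alongside the automatic detection.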

And as soon as I find the time, I’ll try to create an example (and possibly an extension of the KernelLauncher) which demonstrates and simplifies the handling of PTX files - these are basically “machine independent assembler files”, which could be converted to appropriate CUBIN files at runtime with the JIT compiler. This should allow a more flexible distribution of kernels for multiple target architectures.

bye

qgpx2006 · March 13, 2011 at 06:02

I am grateful for your help again. The compile command and the example work fine for me now. Thanks.