Cuda Error Unknown

I am trying to get JCuda working, but I am having trouble getting the kernel samples to run. The JCublas, JCurand, etc. examples work fine, but whenever I try to run a JCuda example compiled using NetBeans I get the error message:

Exception in thread “main” jcuda.CudaException: CUDA_ERROR_UNKNOWN
at jcuda.driver.JCudaDriver.checkResult(
at jcuda.driver.JCudaDriver.cuCtxSynchronize(
at jcuda.sample.JCudaVectorAdd.main(

The .cu files are compiled successfully into PTX files, and it seems like everything else is working, but for the vector addition and reduction samples I get this message. I’m at a loss as to what’s causing this problem, so any help would be much appreciated.


Well, unfortunately a “CUDA_ERROR_UNKNOWN” does not tell you very much. Some helpless debugging attempts:

  • What happens if you comment out ONLY the kernel call? (To check whether it might be related to the PTX file and the kernel itself)
  • What happens if you do the kernel call, but leave the kernel “empty”? (i.e. comment out everything inside the “{…}” kernel function body)

Apart from that: Which operating system / CUDA Toolkit version / JCuda version are you using?


When I comment out either the kernel call or the kernel body, the program completes without error. Additionally, the program runs if I don’t try to alter any of the passed-in arrays in the kernel, i.e. if I store a[i]+b[i] in a temp variable instead of sum[i]. Also, I’m not sure if I mentioned this, but the error always occurs when calling cuCtxSynchronize() after launching the kernel.

I have Windows 7 64 bit and CUDA Toolkit 4.2 (32 bit, if that’s important), and JCuda 4.2.

You are using the 32 bit toolkit on a 64 bit machine? I’m not sure, it’s just a guess, but this might cause the problem. If you look at the sample, you’ll see that the PTX file is compiled according to the architecture (32 or 64 bit).

You might want to try disabling the automatic compilation in the sample, and instead compile the CU file into a PTX file manually from the console with
nvcc -m32 -ptx JCudaVectorAddKernel.cu -o JCudaVectorAddKernel.ptx
or
nvcc -m64 -ptx JCudaVectorAddKernel.cu -o JCudaVectorAddKernel.ptx
respectively. Admittedly, that’s the point: I’m not sure whether you need 32 or 64 bit here.
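As a rough illustration of compiling the PTX "according to the architecture": the sketch below builds the nvcc command line from the data model of the running JVM. It assumes that what matters is the bitness of the Java process that loads the PTX, and it relies on the "sun.arch.data.model" system property, which HotSpot JVMs set but other JVMs may not. The class PtxCommandSketch and its methods are hypothetical helpers, not part of JCuda.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of JCuda.
class PtxCommandSketch {

    // Maps a JVM data model string ("32" or "64") to the nvcc flag "-m32" or "-m64".
    static String modelFlag(String dataModel) {
        if (!"32".equals(dataModel) && !"64".equals(dataModel)) {
            throw new IllegalArgumentException("Unknown data model: " + dataModel);
        }
        return "-m" + dataModel;
    }

    // Builds the argument list for compiling a .cu file into a .ptx file.
    static List<String> nvccCommand(String dataModel, String cuFile, String ptxFile) {
        return Arrays.asList("nvcc", modelFlag(dataModel), "-ptx", cuFile, "-o", ptxFile);
    }

    public static void main(String[] args) {
        // Assumption: fall back to "64" if the property is not defined by this JVM.
        String dataModel = System.getProperty("sun.arch.data.model", "64");
        List<String> command = nvccCommand(
            dataModel, "JCudaVectorAddKernel.cu", "JCudaVectorAddKernel.ptx");
        // Only prints the command; running it (e.g. via ProcessBuilder) is left out.
        System.out.println(String.join(" ", command));
    }
}
```

Printing the command first makes it easy to see which flag the Java side would pick before running nvcc by hand.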

If this does not help, you might consider installing the 64bit Toolkit…

Yeah, I had the 64 bit toolkit installed, but it wasn’t working. I read that you need 64 bit Visual Studio to use it, which I think is not available for the free version.

I’ve tried compiling from the console and I get the same error message. The .cu files won’t actually compile with the -m64 flag on my machine, so I had to change the sample slightly to use -m32 instead. It might have something to do with an incompatibility between the 32 bit toolkit and 64 bit machines, but I’m not sure. Thanks for the suggestions.

The (64bit) Toolkit and Visual Studio are fairly unrelated. Even the SDK should work with a 32bit version of Visual Studio. (I’m not sure about the “64bit extensions” that once had to be installed separately for older VS versions, but the newer versions should already include these.)

In which way was the 64bit version “not working” ?

I think I was getting an error compiling the .cu files, but now that I think about it that might have been due to something else. Honestly it was a few weeks ago when I switched to the 32 bit version so I don’t remember specifically the error I was getting. I think I’ll try the 64 bit version again and see if I can get it working.

I installed the 64 bit toolkit and SDK, but I’m still getting the same error. I wonder if there is a way to get more information about what’s causing it?

Did you re-compile the PTX with the -m64 flag?

Otherwise, I’m not sure how to proceed with debugging. Of course, running a test of the same program in a pure native C/C++ version might help to find out whether it’s related to JCuda or to CUDA itself… Do you have the SDK installed? It contains a “vector addition example” which is similar to the JCuda sample.

Even if you don’t have the SDK installed: The program is also available at , you could try whether you can run it directly.

Without the -m32 flag I am unable to compile the PTX. I get the following error:

Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin/…/…’

I’ve encountered this before, and the only solution I’ve found thus far is to use the -m32 flag…

Regarding the C/C++ versions, the samples run fine, though I haven’t attempted to compile them myself.

I really have no idea at this point what else I could do to try to fix this. I may just have to give up for the time being. Thanks for your help.

I did a web search for the error message and found the following. I’m not sure, but I think (!) that I also had to do this on one PC:

To make the fix, copy the needed file “vcvars64.bat” and rename it to “vcvarsamd64.bat” as follows:

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\vcvars64.bat

to

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\amd64\vcvarsamd64.bat

Upon the change, the program compiled and ran successfully.

Maybe you can try this as well (or look at the other hints that are given in this post). I think it’s not yet time to give up :slight_smile:

You can also get the 64bit Visual Studio compiler for free by installing the Windows SDK.
This will allow you to compile with “-m64”.

Hello, I’m having the same problem.
I have a full 64 bit development environment, and -m64 or -m32 doesn’t seem to affect the problem.
I can run one kernel without a problem. It is only after running a second kernel over the same data that I get the unknown error.

Are you using surface writes in your kernel?

@Vicente: Some more detailed information might be helpful. As stated in the first response an ‘unknown error’ may have many reasons. If your kernel runs once and seems to work properly (and then causes an error when you attempt to run it again) please check very, very carefully that you are not accessing invalid memory - i.e. make sure that you are not writing outside of the bounds of an array.
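To make the "writing outside of the bounds" point concrete, here is a minimal plain-Java sketch (no GPU involved) of why vector-add style kernels usually need an index guard. It assumes the common launch scheme where the grid size is rounded up so that gridSize * blockSize >= n, which leaves some threads with an index beyond the array. The class KernelBoundsSketch and its method names are made up for this illustration.

```java
// CPU-only illustration; the names here are hypothetical and not part of JCuda.
class KernelBoundsSketch {

    // Number of blocks needed so that gridSize * blockSize >= n (rounding up).
    static int gridSize(int n, int blockSize) {
        return (n + blockSize - 1) / blockSize;
    }

    // Number of launched "threads" whose global index is >= n.
    static int excessThreads(int n, int blockSize) {
        return gridSize(n, blockSize) * blockSize - n;
    }

    // Emulates a guarded vector addition: each (block, thread) pair computes
    // a global index and only writes if that index is inside the arrays.
    static void emulateGuardedAdd(float[] a, float[] b, float[] sum, int n, int blockSize) {
        int grid = gridSize(n, blockSize);
        for (int block = 0; block < grid; block++) {
            for (int thread = 0; thread < blockSize; thread++) {
                int i = block * blockSize + thread;
                if (i < n) { // without this guard, i can run past the end of sum
                    sum[i] = a[i] + b[i];
                }
            }
        }
    }
}
```

For n = 1000 and a block size of 256, four blocks are launched, so 24 threads have an index past the end of the arrays. On the GPU, an unguarded write from one of those threads does not raise a tidy exception and may instead surface later as an unspecific error.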

No… I’m not using surfaces.
Is there any way to handle/manage/inquire about CUDA errors, such as a cudaGetLastError?

Is there any problem running more than one kernel (one after the other) or running a loop of kernels?
The very, very simple vector add example gives that message when repeated several times:
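for (int i = 0; i < 1000; i++) {
    System.out.println("Iteration " + i);
    // Launch the kernel with the same grid and block configuration each time
    cuLaunchKernel(function,
        gridSizeX, 1, 1,
        blockSizeX, 1, 1,
        0, null,
        kernelParameters, null);
    // Wait for the kernel to finish; this is where the error shows up
    cuCtxSynchronize();
}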
If I comment out the cuCtxSynchronize(), it runs 45 times and then I get the CUDA error unknown; with the cuCtxSynchronize() active, it only runs twice. This probably means that it only really runs the first iteration and then crashes anyway (it just keeps printing that line…).
Maybe I’m doing something wrong, but I get this fragile behaviour in all my JCuda programs that have more than one kernel launch…

All CUDA/JCuda functions return an ‘error code’, either directly as the return value, or in the ‘errcode_ret’ as the last function parameter. If you call
JCudaDriver.setExceptionsEnabled(true);
as one of the first lines in your main() method, then these error codes will be checked automatically, and in case of an error, an exception will be thrown. (You’re probably already doing this, otherwise your program would just fail silently.)

The main difficulty here is the statement that is made in the documentation of basically every CUDA function:

Note that this function may also return error codes from previous, asynchronous launches.

Due to the asynchronous nature of CUDA, you might, for example, call 3 methods (or launch 3 kernels), and see an error after the launch of the third one. But the error might actually have been caused by the first or second one - you never know.
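The effect can be mimicked with a toy model in plain Java. This is only an analogy under simplifying assumptions, not how the CUDA driver works internally: launch() merely queues work and returns, and synchronize() runs the queue and reports which launch failed first. The class DeferredErrorSketch and its methods are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of deferred error reporting; hypothetical, not part of JCuda.
class DeferredErrorSketch {
    private final List<Runnable> pending = new ArrayList<>();
    private int nextIndex = 0; // index of the next launch to be processed

    // Like a kernel launch: the work is only queued, so no error is visible here.
    void launch(Runnable work) {
        pending.add(work);
    }

    // Like a synchronization point: runs the queued work in order.
    // Returns the 0-based index of the first failing launch, or -1 if none failed.
    int synchronize() {
        int failedIndex = -1;
        for (Runnable work : pending) {
            int index = nextIndex++;
            if (failedIndex >= 0) {
                continue; // after a failure, later queued work is skipped
            }
            try {
                work.run();
            } catch (RuntimeException e) {
                failedIndex = index;
            }
        }
        pending.clear();
        return failedIndex;
    }
}
```

If three pieces of work are queued and only then synchronized, the caller sees the failure after the third launch although an earlier one caused it. Synchronizing after every single launch narrows the error down to the launch that was just issued, at the cost of losing the overlap between launches.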

Debugging CUDA is difficult. And debugging JCuda may be even more difficult. For CUDA, there is the CUDA-GDB debugger, but admittedly, I have not yet even tried to apply this to Java programs.

However, in nearly all cases that I have experienced so far, “unknown errors” have happened due to writes to invalid memory regions (that’s why I LOVE Java programs that fail with an ArrayIndexOutOfBoundsException - you always know perfectly what’s wrong :wink: ). If your kernel is not overly complex, maybe posting it here might help to find the error, but of course, this is not a promise…

Oh, I overlooked the second post about the VectorAdd example.

Did you run this test with the original “VectorAdd” example from the website? If yes, then this behavior is indeed strange and of course not desired. I’ll check this ASAP.

Thanks a lot Marco13 :slight_smile:
I have plenty of experience with CUDA (CUDA Fortran, CUDA C, PyCUDA and CUDA from MATLAB), and yes, usually the problem is reading/writing outside of memory (99.9% of the time).
I really like the possibility of using it with Java. Finally I have a project where I can use it extensively (been working in Fortran for the last two projects…)

I’m using “basically” the vector add example (with a few modifications of my own). I’ll run some tests with the original files on a couple of machines and I’ll let you know.

Thanks a lot for the quick reply :slight_smile: