I am running JCuda on a 64-bit machine with the NVIDIA 4.1 software.
As some members have noticed, cuInit() on the JCudaDriver never works, so the JCudaDriver is broken. It looks like cuInit() expects to be the first thing running in the process. In JCuda, however, the JVM is the first thing running, so cuInit() fails, usually with an "out of memory" or "invalid command" error.
The workaround is to call cuInit() before starting the JVM. The easiest way to do this is to compile a shared library that contains only the NVIDIA cuInit() call; a sketch of this idea follows below.
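As a rough illustration of that idea (this wiring is an assumption, not something spelled out in this thread), such a library could call cuInit() from a constructor function so that it runs as soon as the library is loaded, before the JVM has created any threads, for example via LD_PRELOAD when launching java:

/*
 * Hedged sketch: a minimal shared library whose constructor calls cuInit()
 * at load time. Loading it before the JVM starts (e.g. with LD_PRELOAD,
 * which is an assumption here, not a confirmed recipe) means cuInit() runs
 * before any JVM threads exist.
 *
 * Example build line (illustrative only):
 *   gcc -shared -fPIC preinit.c -lcuda -o libpreinit.so
 */
#include <cuda.h>
#include <stdio.h>

__attribute__((constructor))
static void preInitCuda(void)
{
    /* First cuInit() call in the process, before the JVM starts */
    CUresult result = cuInit(0);
    if (result != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit() in pre-init library failed: %d\n", (int)result);
    }
}

With something like this in place, the JCuda code can still call cuInit(0) as usual; the point is only that the very first cuInit() happens before the JVM is up.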
In general, my options for testing on Linux and MacOS are rather limited. At the moment, I can mainly test on Windows (I’ll probably set up my next PC for dual boot, so that I can at least cover Linux…)
So I’m always interested in feedback about other OSes. But in this case, I’m not sure what you are referring to. All the examples from the website start the CUDA interaction by calling ‘cuInit’, and I have not yet heard about general problems with this method, nor about any necessity to create one's own JNI library. Can you elaborate on which errors you have encountered? Did they also occur when running the samples?
I think I have the same error as Sheldon Fu which I have attached below. This may not be a jCuda bug.
From the NVIDIA forum website I have attached the issue and the page. I think it probably is not a JCuda bug; it looks like a bug in NVIDIA's cuInit() code.
What I posted was a temporary fix that worked for me to get around the bug.
I am on Ubuntu Linux x86_64 with the NVIDIA 4.1 software: CUDA Toolkit 4.1.21_linux_64_ubuntu.04 and JCuda_All_0.4.1-src.zip, which I compiled to run.
Upon further investigation I found that the cuInit() driver API call also fails under JNI, with error code 1 (invalid value). cuDriverGetVersion works and returns 4010. The cuInit failure is apparently the reason why all the CUDA runtime functions fail.
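For reference, a minimal JNI native method along these lines would show exactly that pattern when called from Java: cuDriverGetVersion succeeds (returning 4010 for CUDA 4.1) while cuInit(0) returns error code 1. The class and method names (DiagnosticsTest.checkCuda) are made up for illustration only:

/*
 * Hedged diagnostic sketch, assuming a Java class DiagnosticsTest with a
 * static native method checkCuda(). It reports the two results described
 * above when invoked from inside the JVM.
 */
#include <jni.h>
#include <cuda.h>
#include <stdio.h>

JNIEXPORT void JNICALL Java_DiagnosticsTest_checkCuda(JNIEnv *env, jclass cls)
{
    int driverVersion = 0;

    /* This call works even when cuInit later fails */
    CUresult vr = cuDriverGetVersion(&driverVersion);
    printf("cuDriverGetVersion: result=%d, version=%d\n", (int)vr, driverVersion);

    /* This is the call that fails with error code 1 when invoked via JNI */
    CUresult ir = cuInit(0);
    printf("cuInit: result=%d\n", (int)ir);
}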
I had a look at what libcuda.so depends on, and it seems that it uses quite a few pthread functions and probably tries to enumerate the current threads of the host process when cuInit is called. That would explain the difference between calling cuInit straight from a native application and calling it from JNI: the JVM has already created 28 threads by the time cuInit is called from JNI. If cuInit tries to enumerate the current threads and do something with them, there is a chance that the particular way RHEL6/the JVM creates these threads gets in the way of cuInit().
With that assumption in mind, I devised a test workaround: a native program that calls cuInit first, then creates the JVM through the JNI invocation interface to load and run our Java app with its CUDA-based native methods. Unsurprisingly, that works.
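A minimal sketch of such a launcher could look like the following. The application class name "MyCudaApp", the classpath, and the build line are placeholders, not names from this thread; the essential point is that cuInit() is called before JNI_CreateJavaVM, so no JVM threads exist yet when the driver initializes:

/*
 * Hedged sketch of the launcher described above: call cuInit() in native
 * code first, then create the JVM through the JNI invocation interface and
 * hand control to the Java application.
 *
 * Example build line (illustrative only):
 *   gcc launcher.c -I$JAVA_HOME/include -I$JAVA_HOME/include/linux \
 *       -L$JAVA_HOME/jre/lib/amd64/server -ljvm -lcuda -o launcher
 */
#include <jni.h>
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    /* Initialize the CUDA driver API before any JVM threads exist */
    CUresult cr = cuInit(0);
    if (cr != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed with error %d\n", (int)cr);
        return 1;
    }

    /* Now create the JVM and call the Java application's main method */
    JavaVMOption options[1];
    options[0].optionString = "-Djava.class.path=."; /* adjust for your app */

    JavaVMInitArgs vmArgs;
    vmArgs.version = JNI_VERSION_1_6;
    vmArgs.nOptions = 1;
    vmArgs.options = options;
    vmArgs.ignoreUnrecognizedOptions = JNI_FALSE;

    JavaVM *jvm;
    JNIEnv *env;
    if (JNI_CreateJavaVM(&jvm, (void **)&env, &vmArgs) != JNI_OK) {
        fprintf(stderr, "Failed to create the JVM\n");
        return 1;
    }

    /* Placeholder application class with a standard main(String[]) method */
    jclass cls = (*env)->FindClass(env, "MyCudaApp");
    if (cls != NULL) {
        jmethodID mainMethod = (*env)->GetStaticMethodID(
                env, cls, "main", "([Ljava/lang/String;)V");
        if (mainMethod != NULL) {
            jobjectArray args = (*env)->NewObjectArray(
                    env, 0, (*env)->FindClass(env, "java/lang/String"), NULL);
            (*env)->CallStaticVoidMethod(env, cls, mainMethod, args);
        }
    }

    (*jvm)->DestroyJavaVM(jvm);
    return 0;
}

The Java application and its CUDA-based native methods then run inside a process where the driver has already been initialized.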
Sorry, the example where you need cuInit() is the one where you want to load your own CUDA kernel using jcuda.driver.JCudaDriver. The example on the JCuda website is the VectorAdd example, JCudaDriverSample.java. In that code there is a cuInit(0) call in Java, which calls cuInit() in the C code.
OK… I find it surprising that this problem has not been reported until now, although it seems to be specific to Linux. As I mentioned, I’m not very familiar with Linux and cannot really test the libs there, but I assume that the contributor who regularly provides the Linux binaries runs the samples as basic tests as well…
I did not completely understand what the problem is; I’ll have to read the thread that you linked more thoroughly. But… I assume that it cannot be solved by the trick described at http://code.google.com/p/javacl/wiki/TroubleShootingJavaCLOnLinux, can it?