Example JCudaRuntimeDriverMixSample.java doesn't do its job

system · 14 April 2011 at 14:27

Hello everyone,

I am on 64-bit Ubuntu, kernel 2.6.35-2.
I have installed the CUDA SDK 3.2.16.
I use JCuda 0.3.2.

Good news: I can compile my CUDA samples (.cu → .cubin) with nvcc. Also good: I can compile and launch any of the JCuda examples without errors in my Eclipse IDE.

So here’s the problem:

When I launch the JCudaRuntimeDriverMixSample, found at jcuda.org - Samples, everything goes fine (compilation and execution), but I get this output:

Input vector [0.73096776, 0.831441, 0.24053639, 0.6063452, 0.6374174]
Norm 1.4343714
Inverted vector [0.73096776, 0.831441, 0.24053639, 0.6063452, 0.6374174]
Norm 1.4343714

To repeat: the .cu is compiled into a .cubin correctly and the whole Java program runs fine, but the job isn’t done. The output vector is the same as the input vector.

I think it could come from the device-to-host copy? Or maybe the kernel function is never called?
But why? I haven’t modified anything in this example.

Does anyone have an idea about the problem?

Thanks all.

Marco13 · 14 April 2011 at 15:39

Hello

Can you try adding
Arrays.fill(vector, 0.0f);
before the last call to cuMemcpyDtoH? This will at least show whether the kernel call or the memcopy fails…
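
The reasoning behind this check can be sketched in plain Java. This is only an illustration that runs without a GPU: the class `CopyBackCheck`, the method `diagnose`, and the result strings are hypothetical names, and the `Consumer` is a stand-in for the real `cuMemcpyDtoH` call. Note that an all-zero result is ambiguous, since the device memory itself may contain zeros.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Sketch of the "clear the host buffer before copying back" diagnostic. */
final class CopyBackCheck {

    /**
     * Zeroes the host array, runs the copy step, and classifies the result.
     * originalInput must be a separate array from host, because host is overwritten.
     * copyBack is a hypothetical stand-in for the device-to-host copy.
     */
    static String diagnose(float[] host, float[] originalInput, Consumer<float[]> copyBack) {
        Arrays.fill(host, 0.0f);   // wipe stale host data so old values cannot be mistaken for results
        copyBack.accept(host);     // in the real sample this would be cuMemcpyDtoH

        boolean allZero = true;
        for (float v : host) {
            if (v != 0.0f) {
                allZero = false;
                break;
            }
        }
        if (allZero) {
            // Either the copy wrote nothing, or the device memory really held zeros.
            return "COPY_DID_NOT_WRITE";
        }
        if (Arrays.equals(host, originalInput)) {
            // The copy works, but the device still holds the unmodified input.
            return "KERNEL_DID_NOT_CHANGE_DATA";
        }
        return "KERNEL_RAN";       // the data came back and differs from the input
    }
}
```

If the input values reappear after the host array was zeroed, the copy itself is working and the suspicion shifts to the kernel launch.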

BTW: The sample needs a small update: I don’t know how the reference to JCufft slipped in there, it’s not needed and should be JCublas instead… And these “*2” factors are also not needed - huh, I must have been overworked when I wrote this…

In any case, I’ll try to test this again tomorrow, maybe I can find out whether there’s something wrong with the example.

bye

system · 15 April 2011 at 01:08

Thank you, I’ll give you feedback in the afternoon.

system · 15 April 2011 at 06:12

With Arrays.fill(vector, 0.0f); before the last call to cuMemcpyDtoH I get the same result, so the cuMemcpyDtoH works!

But I tried to modify the example like this:

- cudaMalloc of a CUdeviceptr for a float array, BUT no cuMemcpyHtoD
- call the CUDA native function, which fills the array with a static value (like 4.2f)
- finally, copy the result from the GPU into a Java float array (cuMemcpyDtoH), based on the first cudaMalloc()

The result is very strange when I print my array. It gives me these values:

vector [0.5, 0.33333334, 0.25, 0.2, 0.16666667]
And these are exactly the values from one of my earlier examples, when I was trying things out. I had just initially filled my Java float array with the values 1/i, for i = 2 to n+2.

So my conclusion: when I do a cuMemcpyDtoH, it returns the values of a previous array! But I don’t understand HOW, because I have even rebooted my computer!

You can find my source code here:

- Java main function: http://pastie.org/1797381
- CUDA kernel (.cu): http://pastie.org/1797384

Thanks for your help.

Marco13 · 15 April 2011 at 09:39

Hello

I did a short test with the original version, and it seemed to work (no surprise, since I had already tested it before I uploaded it). But there were some possible bugs (which I mentioned above) that might cause errors in a different environment. I’m not sure whether they could be related to what you described, but in any case I uploaded an updated version of this sample, which you might want to test. If you still encounter problems (which is not unlikely, since it was only a minor change), I can have a look at the example that you posted, probably early next week, and try to reproduce the behavior that you just described.

bye
Marco

system · 11 June 2011 at 04:29

Hello Marco,

I’m back to try to run a JCuda example!

So: I downloaded the JCudaDriverCubinSample.java sample with the JCudaCubinSample_kernel.cu kernel and tried to launch it without any modification.

The nvcc compilation works correctly and the cubin is created.

But the output of the program is: Test FAILED

Where the Java sample tests whether the expected value equals the hostOutput value (after the kernel call, line 144), I print every hostOutput value like this:

System.out.println(hostOutput[i] + " =? " + expected);

and the result is always 0 for the hostOutput[i] values.

I cannot understand why. Can you tell me whether this sample works correctly on your computer?

Thanks for your help.

Marco13 · 11 June 2011 at 11:54

Hello

OK, that has been a while… Could you please add
JCudaDriver.setExceptionsEnabled(true);
as the first line of the ‘main’? - Maybe this already gives a hint about what might be wrong there…

bye
Marco

system · 19 June 2011 at 13:32

I got an exception:

Exception in thread "main" jcuda.CudaException: CUDA_ERROR_INVALID_SOURCE
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:170)
    at jcuda.driver.JCudaDriver.cuModuleLoad(JCudaDriver.java:1400)
    at dl.JCudaDriverCubinSample.main(JCudaDriverCubinSample.java:53)

Do you think it is my configuration?

Marco13 · 19 June 2011 at 15:32

No, there should be nothing wrong, except what is described here: http://forum.byte-welt.de/showthread.php?t=3494
So it means that you may need to add the “-arch sm_XX” parameter when compiling the CUBIN file, where “XX” stands for the Compute Capability of your card.
Alternatively, you could use a PTX file instead of a CUBIN file, which may be more flexible. (I’m currently updating the samples to prefer PTX files, and hopefully I can upload them this week, together with the new version of JCuda for CUDA 4.0 and a short “Getting started” tutorial, which also covers the CUBIN/PTX issue.)
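
As a rough illustration of the “-arch sm_XX” advice, the following Java sketch assembles an nvcc command line from a Compute Capability. The class `NvccCommand` and its methods are hypothetical helpers, and the output file name is an assumption. The nvcc options `-cubin`, `-arch` and `-o` are the standard ones.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: build an nvcc command that targets a specific Compute Capability. */
final class NvccCommand {

    /** Turns a Compute Capability such as 2.1 into the matching "sm_21" name. */
    static String archFlag(int major, int minor) {
        return "sm_" + major + minor;
    }

    /** Builds the argument list for compiling a .cu file into a CUBIN file. */
    static List<String> cubinCommand(String cuFile, String cubinFile, int major, int minor) {
        return Arrays.asList(
            "nvcc", "-cubin",
            "-arch", archFlag(major, minor), // must match the card that will load the CUBIN
            cuFile,
            "-o", cubinFile);                // output name chosen by the caller (assumption)
    }
}
```

For a card with Compute Capability 2.1, `NvccCommand.cubinCommand("JCudaCubinSample_kernel.cu", "JCudaCubinSample_kernel.cubin", 2, 1)` yields the arguments of `nvcc -cubin -arch sm_21 JCudaCubinSample_kernel.cu -o JCudaCubinSample_kernel.cubin`. The list could be passed to a `ProcessBuilder` to run the compiler from Java.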

system · 21 June 2011 at 14:03

Thank you, it works!

The solution was to compile with the appropriate Compute Capability (2.1 in my case!)

See you soon, and thank you for all your work.

Marco13 · 22 June 2011 at 06:15

I wrote a little bit about Creating Kernels in the newly linked Tutorial. It also covers CUBIN and PTX files, and an updated “JCudaDriverSample” has been added to show how PTX files may be used instead of CUBIN files. Maybe it’s worth a look for you.
EDIT: The link to the VectorAdd CUDA file will be fixed soon.