A few questions about preparing an algorithm using JOCL

kacperpl1 · 3 March 2011 at 09:16

That's how I tried to do this, but if I want to write more than once, I get stuck with an invalid context.
Originally it was like this:

```java
if (memObjects == null)
{
    memObjects = new cl_mem[6];
}
// runs on every iteration: new buffers are created each time
memObjects[0] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * 1, Arg1, null);
memObjects[1] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * 1, Arg2, null);
memObjects[2] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * 1, Arg3, null);
memObjects[3] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, Arg4, null);
memObjects[4] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, Arg5, null);
memObjects[5] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, Arg6, null);
```
I was recreating the buffers on every iteration of the loop, so I'm trying to change it to this:
```java
if (memObjects == null)
{
    memObjects = new cl_mem[6];
    // (closing brace moved below, so the buffers are created only once)
    memObjects[0] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * 1, Arg1, null);
    memObjects[1] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * 1, Arg2, null);
    memObjects[2] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * 1, Arg3, null);
    memObjects[3] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, Arg4, null);
    memObjects[4] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, Arg5, null);
    memObjects[5] = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, Arg6, null);
}
else // later iterations: reuse the buffers and only upload new data
{
    clEnqueueWriteBuffer(commandQueue, memObjects[0], false, 0, Sizeof.cl_float * 1, Arg1, 0, null, null);
    clEnqueueWriteBuffer(commandQueue, memObjects[1], false, 0, Sizeof.cl_float * 1, Arg2, 0, null, null);
    clEnqueueWriteBuffer(commandQueue, memObjects[2], false, 0, Sizeof.cl_float * 1, Arg3, 0, null, null);
    clEnqueueWriteBuffer(commandQueue, memObjects[3], false, 0, Sizeof.cl_float * n, Arg4, 0, null, null);
    clEnqueueWriteBuffer(commandQueue, memObjects[4], false, 0, Sizeof.cl_float * n, Arg5, 0, null, null);
    clEnqueueWriteBuffer(commandQueue, memObjects[5], false, 0, Sizeof.cl_float * n, Arg6, 0, null, null);
}
```
The problem is that I'm getting this:
```
Exception in thread "Thread-3" org.jocl.CLException: CL_INVALID_CONTEXT
        at org.jocl.CL.checkResult(CL.java:562)
        at org.jocl.CL.clEnqueueWriteBuffer(CL.java:11953)
        at zefir.ZefirMath.calcNr(ZefirMath.java:419)
        at zefir.ZefirView$thread.run(ZefirView.java:1036)
        at java.lang.Thread.run(Thread.java:662)
Exception in thread "AsyncOpThread-1" org.jocl.CLException: CL_INVALID_EVENT
        at org.jocl.CL.checkResult(CL.java:562)
        at org.jocl.CL.clWaitForEvents(CL.java:9918)
        at org.jocl.CL$3.run(CL.java:1688)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
```
Line 419 is the second clEnqueueWriteBuffer call (the write to memObjects[1]), i.e. line 15 of the snippet here.
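According to the OpenCL specification, clEnqueueWriteBuffer returns CL_INVALID_CONTEXT when the command queue and the buffer were not created in the same context. With the "create once, write afterwards" pattern, one possible way to get there (only a guess, since the rest of the code isn't shown) is recreating the context and command queue, for example on every call to calcNr, while memObjects still holds buffers from the old context. The sketch below is plain Java, not JOCL: an `Object` stands in for the cl_context, the cached value stands in for the cl_mem[] array, and `ContextBoundCache` is a hypothetical name.

```java
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical cache that remembers which context its contents belong to.
 * "context" is an opaque Object standing in for a JOCL cl_context, and T
 * stands in for the cl_mem[] array; no JOCL calls are made here.
 */
final class ContextBoundCache<T> {
    private Object owner; // context the cached value was created in
    private T value;      // e.g. memObjects in the real program

    /**
     * Returns buffers that are valid for the given context: the cached ones
     * if they were created in this context, otherwise fresh ones. Buffers
     * from an older context are released instead of being written to.
     */
    T get(Object context, Function<Object, T> create, Consumer<T> release) {
        if (value != null && owner == context) {
            return value; // same context: reuse, only enqueue writes
        }
        if (value != null) {
            release.accept(value); // stale buffers from a previous context
        }
        value = create.apply(context);
        owner = context;
        return value;
    }
}
```

In a real program, `create` would call clCreateBuffer for each entry and `release` would call clReleaseMemObject on each old cl_mem, so buffers are never paired with a queue from a different context.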

Additionally, I've noticed something interesting about the new NVIDIA drivers.
Originally, to use OpenCL on NVIDIA cards we had to install (as far as I remember):
- NVIDIA developer drivers
- NVIDIA CUDA Toolkit with OpenCL support
- ATI Stream SDK

Now I've noticed that it works out of the box with my stock (non-developer) drivers, without the ATI Stream SDK or the CUDA Toolkit. On Ubuntu I was unable to install the Stream SDK, and it worked without it. On Windows, I installed the ATI Stream SDK to check whether CPU + GPU support was possible with an NVIDIA GPU. I noticed it didn't change anything, so I removed it, and after that OpenCL stopped working, so I had to install the Stream SDK again (or maybe I could have reinstalled the NVIDIA driver instead).

To sum it up: new NVIDIA drivers support OpenCL out of the box; don't bother installing the Stream SDK.

Marco13 · 3 March 2011 at 10:08

[QUOTE=kacperpl1]That's how I tried to do this, but if I want to write more than once, I get stuck with an invalid context.
…
[/QUOTE]

As far as I can see, there is nothing obviously wrong with that. Unfortunately, the CL error return codes tend to be quite arbitrary: I regularly receive an "invalid command queue" error when a kernel accidentally accesses an array outside its bounds. But if the 'n' used to allocate the buffer objects really is the maximum value used in any kernel invocation, there should be nothing wrong with that.
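The condition on 'n' can be made explicit on the host side: allocate once for the largest n that any call will use, and check every later write against that. The OpenCL specification makes clEnqueueWriteBuffer fail with CL_INVALID_VALUE when the written region lies outside the buffer, but out-of-bounds accesses inside a kernel are not checked at all, so it helps to know the sizes are consistent. A minimal sketch in plain Java; `BufferCapacity` and `bytesFor` are hypothetical names, and `FLOAT_BYTES` mirrors JOCL's `Sizeof.cl_float` (4 bytes).

```java
/**
 * Hypothetical size bookkeeping for buffers that are allocated once for the
 * largest n and reused afterwards. FLOAT_BYTES mirrors JOCL's Sizeof.cl_float
 * (4 bytes); no JOCL classes are used here.
 */
final class BufferCapacity {
    static final long FLOAT_BYTES = 4;

    private final long maxN; // element count the buffers were created for

    BufferCapacity(long maxN) {
        if (maxN <= 0) {
            throw new IllegalArgumentException("maxN must be positive");
        }
        this.maxN = maxN;
    }

    /** Byte count to write for n floats; rejects n outside 0..maxN. */
    long bytesFor(long n) {
        if (n < 0 || n > maxN) {
            throw new IllegalArgumentException(
                "n=" + n + " is outside the buffer capacity of 0.." + maxN + " floats");
        }
        return FLOAT_BYTES * n;
    }
}
```

In the reuse loop, the value from `bytesFor(n)` would take the place of `Sizeof.cl_float * n` as the size argument, so a growing n fails loudly on the host instead of producing an OpenCL error later.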

According to the stack trace, it might be some blocking/non-blocking issue there, so you might try doing blocking writes:
```java
clEnqueueWriteBuffer(commandQueue, memObjects[0], true, 0, Sizeof.cl_float * 1, Arg1, 0, null, null);
// …
```
Non-blocking operations are tricky, but non-blocking writes should work, since they are much less critical than non-blocking reads. If they don't, maybe you have found a bug; I'll try to test this again.
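The difference between the two modes comes from a rule in the OpenCL specification: after a non-blocking write (blocking_write = false) the call returns immediately, and the host memory passed in must not be reused until the write has completed. The following model is not JOCL code; a single-thread `ExecutorService` stands in for the command queue, a `float[]` stands in for the device buffer, and `AsyncWriteModel` is a hypothetical name.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/**
 * Hypothetical, JOCL-free model of blocking vs. non-blocking writes. A
 * single-thread executor stands in for the command queue and a float[]
 * stands in for the device buffer.
 */
final class AsyncWriteModel {
    /** Like blocking_write = false: returns at once, the copy happens later. */
    static Future<?> enqueueWrite(ExecutorService queue, float[] src, float[] dst) {
        return queue.submit(() -> System.arraycopy(src, 0, dst, 0, src.length));
    }

    /** Like blocking_write = true: returns only after the copy has finished. */
    static void writeBlocking(ExecutorService queue, float[] src, float[] dst)
            throws InterruptedException, ExecutionException {
        enqueueWrite(queue, src, dst).get();
    }
}
```

With the non-blocking variant, touching `src` before `get()` returns is a race; in real OpenCL code the equivalent of `get()` is waiting on the write's event, calling clFinish, or simply using a blocking write.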

I'm not sure which of the "non-developer" drivers support OpenCL. Usually, when you install OpenCL in any form, it puts "OpenCL.dll" into a system directory, and this DLL is used as the entry point to all installations. Maybe NVIDIA has pulled OpenCL support into their stock drivers; sooner or later it will become mainstream, like OpenGL.

kacperpl1 · 3 March 2011 at 11:39

[QUOTE=Marco13]According to the stack trace, it might be some blocking/non-blocking issue there, so you might try doing blocking writes:
clEnqueueWriteBuffer( commandQueue, memObjects[0], true, 0, (Sizeof.cl_float * 1), Arg1, 0, null, null);[/QUOTE]
I tried that with true, but once it gave me a bad-parameter error, and now it crashes my Java process on Windows ;). I'll try it on Linux next time.

Anyway, don't get me wrong: I'm not pursuing a resolution of this bug just to finish the app. It's just that JOCL could be a good way for casual programmers like me to write accelerated apps. I tried doing something in CUDA when I got my first DX10 card, and it was great, but when it comes to combining everything an app should have, deploying a CUDA/OpenCL app in C/C++ takes too much time. If you want to know exactly what I mean, try working out what you would need to build an app with acceleration and a GUI that works on both Linux and Windows, on both x86 and amd64.

Somehow I think we should try to get some support from Khronos and/or NVIDIA/AMD with these bugs.

Marco13 · 3 March 2011 at 12:55

Well, I don't like bugs. When I know that there is a bug, I try to fix it as soon as possible. As for the memory leak: this is already (and still) under investigation, because it is definitely nothing I can influence on my own. If there is another bug or leak, I'll try to track it down. (A "stress test" for non-blocking write operations is already on my to-do list. Of course I already did a basic test for that, but when it comes to multithreading and asynchronous operations, possible bugs may be difficult to reproduce.)

And I hope you don't get me wrong on this: in most cases where I received inexplicable "CL_INVALID_something" errors or painful JVM crashes, there was something wrong with the kernel, especially write operations to invalid array indices. That does not mean this is also the case here, but the chances are high: as far as I remember, your kernels are quite complex. It's probably not possible (or too much effort) to create a self-contained example that crashes reproducibly?

One "brute force" check that I occasionally used to make sure a crash was not a bug in JOCL was to comment out all array accesses (especially the write accesses) in a kernel. Of course this is not applicable in all cases, but it might help to narrow down the possible locations of the error.
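A complementary check is to replay a kernel's index arithmetic on the CPU: loop over every global ID, compute the indices the kernel would write to, and report the work-items whose writes leave the output array. A typical trigger is a global work size that was rounded up to a multiple of the local work size, so some work-items get IDs beyond the data. The sketch is plain Java; `KernelIndexCheck` is a hypothetical name, and `writeIndex` is a made-up example formula that would have to be replaced by the real kernel's indexing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical CPU replay of a kernel's write pattern: for every global ID,
 * compute the indices the kernel would write to and report the IDs whose
 * writes fall outside the output array. The index formula is an example.
 */
final class KernelIndexCheck {
    /** Example index formula standing in for the real kernel's. */
    static int writeIndex(int gid, int stride, int k) {
        return gid * stride + k;
    }

    /** Returns all global IDs that would write outside [0, outLength). */
    static List<Integer> badGlobalIds(int globalSize, int stride, int perItem, int outLength) {
        List<Integer> bad = new ArrayList<>();
        for (int gid = 0; gid < globalSize; gid++) {
            for (int k = 0; k < perItem; k++) {
                int idx = writeIndex(gid, stride, k);
                if (idx < 0 || idx >= outLength) {
                    bad.add(gid);
                    break; // one report per work-item is enough
                }
            }
        }
        return bad;
    }
}
```

For example, 4 work-items each writing 3 consecutive floats need an output of 12 elements; with only 10, work-item 3 is reported.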

Unfortunately, support for OpenCL debuggers is rather limited, and calling OpenCL from Java does not make this easier. I did get some JOCL programs running in gDEBugger. It is a powerful tool and looks very promising, but its OpenCL debugging support still seems to be under development. I've also been working on some helper classes for the initial setup and testing of kernels, but they are still far from usable.

But indeed, some support from the implementors may be helpful. I'll probably make another attempt to get an NVIDIA developer account; maybe this helps to spot possible difficulties in upcoming versions of OpenCL earlier.

kacperpl1 · 4 March 2011 at 01:23

I've finally done it (made it use the same buffers all the time), but the leaks are still big. As predicted, it was a mistake in my code that kept it from running :P. At the moment I'm trying to find out which command leaks the most in my code.
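One way to narrow down which command leaks, at least for objects the program itself forgets to free, is to count creations and releases per call site: every clCreateBuffer should eventually be paired with a clReleaseMemObject, and the same holds for pairs like clCreateKernel / clReleaseKernel. This cannot reveal leaks inside the driver or JOCL itself. A minimal sketch in plain Java; `LeakCounter` and the site labels are hypothetical, and the real create/release calls are left to the caller.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical host-side bookkeeping: counts created vs. released objects
 * per call site, so a balance that grows every iteration points at the leak.
 */
final class LeakCounter {
    private final Map<String, Long> live = new TreeMap<>();

    /** Call right after a successful create (e.g. clCreateBuffer). */
    void created(String site) {
        live.merge(site, 1L, Long::sum);
    }

    /** Call right after the matching release (e.g. clReleaseMemObject). */
    void released(String site) {
        live.merge(site, -1L, Long::sum);
    }

    /** Sites whose objects have not all been released, with their counts. */
    Map<String, Long> outstanding() {
        Map<String, Long> result = new TreeMap<>();
        live.forEach((site, count) -> {
            if (count != 0) {
                result.put(site, count);
            }
        });
        return result;
    }
}
```

Printing `outstanding()` once per loop iteration shows which site's count keeps climbing.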

Marco13 · 14 March 2011 at 15:34

Just a link to this thread