Quite a few new questions


I started doing research for a new side project and problems keep piling up, so I have some new questions that I hope you can answer.

First of all: is there any way to make a kernel wait for an external event? Let's say I want a semaphore for host-GPU synchronisation, not at the queue level but for the currently running kernels, something like this:

        __kernel void sampleKernel(__global const float *a,
                                   __global const float *b,
                                   __global float *c,
                                   __global int *d)
        {
           int gid = get_global_id(0);
           while(d[0]<1){} // wait for the semaphore to be incremented from host code
           c[gid] = a[gid] * b[gid];
        }

So I enqueue 10 of those kernels in queue0 on context0, and right after that I enqueue a write buffer with the changed semaphore value in queue1 on the same context0. My GPU driver crashes and queue1 reports an invalid context error, which means the write was never enqueued.

Do I have to finish queue0 before the commands in queue1 can run? Why can't I do this simultaneously?

Is there any nice way to create looped kernels that wait for some outside event?

By the way, what's up with the missing clCreateSubDevices?



As far as I know, there is no way to synchronize between kernels from inside a kernel. All synchronization has to take place at the command queue level, for example, using events. So the synchronization here could possibly be done with events: not exactly as you described, but ROUGHLY like this:

  • Create 10 user events
  • Enqueue the kernel 10 times on queue0
    – Once with an “eventWaitList” that contains event0
    – Once with an “eventWaitList” that contains event1
    – ... and so on, up to event9
  • Enqueue commands on queue1 that will set the status of the user events to CL_COMPLETE (via clSetUserEventStatus) one after another
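
To make the control flow of the list above concrete, here is a minimal host-side sketch in plain Java. It is only an analogy and does not call OpenCL or JOCL at all: each CompletableFuture stands in for one user event (what clCreateUserEvent would give you), completing it stands in for clSetUserEventStatus, and the thread pool stands in for queue0. The names EventGateSketch and runGated are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventGateSketch {

    // Runs `count` gated tasks and returns the order in which they executed.
    // Hypothetical helper: a plain-Java stand-in, not part of JOCL.
    static List<Integer> runGated(int count) throws Exception {
        // Stand-in for queue0: the tasks are handed over here up front.
        ExecutorService queue0 = Executors.newFixedThreadPool(count);

        List<CompletableFuture<Void>> gates = new ArrayList<>(); // "user events"
        List<CompletableFuture<Void>> done = new ArrayList<>();  // task completions
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());

        // "Enqueue" every task immediately; none may start until its gate opens.
        for (int i = 0; i < count; i++) {
            final int id = i;
            CompletableFuture<Void> gate = new CompletableFuture<>();
            gates.add(gate);
            done.add(gate.thenRunAsync(() -> order.add(id), queue0));
        }

        // Stand-in for queue1 / the host: open the gates one after another.
        // Waiting for each task before opening the next gate is a choice made
        // here to keep the order deterministic; it is not required in general.
        for (int i = 0; i < count; i++) {
            gates.get(i).complete(null);
            done.get(i).get();
        }

        queue0.shutdown();
        return new ArrayList<>(order);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runGated(10)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The point of the sketch is that the waiting happens before a task starts, not inside it: nothing spins while occupying the device, which is the main difference from the busy-wait loop in the kernel.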

The “JOCLSample_1_1.java” from http://jocl.org/samples/samples.html shows some examples of user event handling, maybe you want to have a look at that.

However, the synchronization can be tricky if there is global memory involved. I’ll have to look up the spec to see under which conditions (and how) this is possible, but it’s not entirely clear whether you intended to use the global memory only to emulate a semaphore (which could be done using events), or whether you really wanted to use it for “communication” (in terms of data transfer)…?

The clCreateSubDevices method is part of OpenCL 1.2. Currently JOCL supports only OpenCL 1.1. I’m already working on (and basically finished) support for OpenCL 1.2, but since there are no official implementations for OpenCL 1.2, I have not yet updated it. The AMD drivers contain an OpenCL 1.2 preview, but only for AMD GPUs. At the moment, I’m using NVIDIA. However, once there is an OpenCL 1.2 implementation available, I’ll finish and test the update of JOCL and upload the new version.