Automatic fallback to pre-2.0 OpenCL

dragandj · 10 June 2015, 09:25

Hi Marco,

I haven’t been able to test this, as my drivers support OpenCL 2.0, but I need at least a hint to decide how to design parts of my library's API.
I want to use clCreateCommandQueueWithProperties as the default method for the command-queue function, and it works well. Now, if the driver is, say, just OpenCL 1.2, would the JOCL library internally fall back and call the old clCreateCommandQueue (even if my library calls JOCL’s Java method clCreateCommandQueueWithProperties), or do I need to provide users with a “legacy” command-queue-opencl1?
I know that the intended behavior is that JOCL falls back to whatever is available, but you mentioned that it failed when you tried it recently, so I need your opinion and clarification.

Marco13 · 10 June 2015, 10:24

Hello

Yes, I mentioned this in this post, referring to a question that I posted a while ago in the AMD forums: „Detecting which functions are available depending ...“ (AMD Community)

As JOCL is a 1:1 mapping of OpenCL, there will be no automatic fallback. An OpenCL program that uses an OpenCL 1.2 platform and tries to call clCreateCommandQueueWithProperties will crash painfully, and a JOCL program that uses an OpenCL 1.2 platform and tries to call clCreateCommandQueueWithProperties will crash painfully.

The above-mentioned post may already have made clear that I also think this behavior is … -_- well, not satisfactory. On the one hand, the user can always cause nasty crashes (if nothing else, by writing to invalid memory locations in kernels, something that can never be checked on the Java side). On the other hand, simply calling a certain function should not cause such unpredictable behavior. I mean, seriously, how should something like this be tested? You would ALWAYS have to run tests with ALL possible platform versions, which is impossible, considering that you can’t install an OpenCL 1.0 and an OpenCL 1.1 driver (from the same vendor) in parallel…

I can’t imagine a reasonable solution for this, neither for OpenCL in general nor for JOCL. The problem could, to some extent, be alleviated by introducing an abstraction layer or utility functions (like the Utilities that I intend to update and publish on GitHub ASAP). But regardless of whether such a fallback were supported by „pure“ JOCL or by an abstraction layer: The real problem comes up when the functions become „incompatible“ in any way. What if someone wants to call clCreateCommandQueueWithProperties with properties that are not offered by the original clCreateCommandQueue? And this is one of the easier examples: When „new“ functions really do something completely new (like pipes and SVM in CL 2.0), then any „fallback“ would basically boil down to „trying to emulate OpenCL 2.0 with OpenCL 1.0 calls“…

So quoting from the answer from the AMD forum:

just check versin

(sic!)

I think there are different possible levels of granularity for this, depending on the exact application pattern. Roughly speaking: One could either do version checks „inlined“, on the fly, for „minor differences“, at places that are not time critical:

void setUp()
{
    // Query platform, devices etc...

    // Create context...
    cl_context context = ...

    if (queryVersion(context) >= 2.0) 
    {
        clCreateCommandQueueWithProperties(...);
    } 
    else 
    {
        clCreateCommandQueue(...);
    }
}

But of course, such version queries are NOT suitable for the „inner core“ of time-critical functions:

void calledMillionsOfTimes(cl_mem mem)
{
    // NOT FEASIBLE when this is calledMillionsOfTimes
    cl_context context = queryContext(mem);
    cl_device_id *devices = queryDevices(context);
    cl_platform_id platform = queryPlatform(devices);
    if (queryVersion(platform) >= 2.0) 
    {
        doSomeStuffWithSVM(mem);
    }
    else
    {
        doSomeStuffWithoutSVM(mem);
    }
}

In these cases, one would probably „pull up“ the version check, out of the time-critical part:

void calledOnlyOnce(cl_mem mem)
{
    // OK when this is calledOnlyOnce
    cl_context context = queryContext(mem);
    cl_device_id *devices = queryDevices(context);
    cl_platform_id platform = queryPlatform(devices);
    if (queryVersion(platform) >= 2.0) 
    {
        for (int i=0; i<1000000; i++) doSomeStuffWithSVM(mem);
    }
    else
    {
        for (int i=0; i<1000000; i++) doSomeStuffWithoutSVM(mem);
    }
}

But I also think that the goal of having platform-agnostic programs becomes far harder to achieve when not only the platforms but also their supported OpenCL versions are expected to differ…

If anybody has ideas of how this problem could be tackled (again: it does not refer to JOCL, but to OpenCL in general), I’d be happy to hear about it.

bye
Marco

Piotr · 15 June 2015, 13:34

I think that providing an abstraction is the only sane way to achieve fallback support. To avoid the overhead of constant OpenCL version checking, you would have to wrap all objects like cl_context, cl_command_queue, etc. in wrappers that also contain the OpenCL version.

Has anyone tried emulating OpenCL 2.0 using OpenCL 1.0? I think no one is crazy enough to do that. And even if someone were, the results would likely be unsatisfactory: either only a small subset of functions would have fallback support, or performance would be prohibitively slow whenever the fallback kicks in.

Abstractions can define their own application contract, so that no emulation is needed. Also, abstractions can be developed entirely in Java (as all low-level functions are exposed by JOCL), so any Java developer can build one to suit their needs.

Just my 2 cents

Marco13 · 15 June 2015, 15:51

Sure, such an abstraction would be possible. Some parts of this could be rather „straightforward“, maybe similar to the way the different OpenGL versions are handled in JOGL.

I already mentioned this occasionally: JCuda and JOCL were originally intended only as „backends“ for an abstraction layer. But I think that developing and maintaining a good abstraction layer for OpenCL may involve some effort. I took some (non-public) steps and made some tests, the Utilities being only the „first level“ of abstraction, but I also sketched some more Java-ish classes built on top of JOCL. Maybe, one day, I’ll ~~quit my job and live on unemployment benefits~~ … ~~win the lottery~~… continue with that

But seriously: On the one hand, this is not so unrealistic. By far the most frequent tasks are comparatively simple: „Copy memory from host to GPU, launch a kernel, copy the results back“, including some bookkeeping about contexts and queues. A while ago, I noticed that even a large 3-letter company is doing this for CUDA (and I considered implementing this API on top of JCuda and/or on top of JOCL (!), just to show off and to say: IN YOUR FACE :D). But on the other hand, people will expect all this to be properly designed and maintained, and apart from the version issues that are the main topic of this thread, I think this might pose some challenges…