Program hangs on clBuildProgram(program, 0, null, null, null, n) with an empty kernel

Unbelievable!
My program hangs on a clBuildProgram call, with no special parameters and with an empty kernel. The cl_program object is created successfully, but the build never finishes!
It happens on a Tesla S1070…

Sometimes the build takes more than 1 hour to succeed…
NVIDIA driver version: 304.54

Any idea?

Sorry for the delayed response, I overlooked this thread somehow :o

But unfortunately, I have no idea what might be wrong there. You mentioned that you are using an empty kernel. What exactly does that mean? Are you using something like the “vector addition example”, and just removed all the contents from the kernel? Or does it happen in a different, more complex application? And … IF it manages to build the program within 1 hour, does this also refer to the empty kernel? (I occasionally noticed that really complex kernels can take some time for the compilation, but this should not be an issue in normal applications, and of course, an empty kernel should be compiled in nearly no time…)

Thank you for your answer…
This problem happens with my original (complex) kernel. I ran the tests with an “empty kernel” ( __kernel void myKernel(){} ) to show that there is no relationship between the kernel complexity and the excessive build duration…

Well - I’m not sure how to tackle this problem.

First of all: Do you think that it is related to JOCL? One test could be to use one of the NVIDIA OpenCL examples (like the most basic vector addition example at https://developer.nvidia.com/opencl#oclVectorAdd ), and just replace the vector addition kernel with your kernel (and of course, not trying to execute it - only let it run up to ‘clBuildProgram’). If there is the same problem, then it might be more appropriate to start a thread about this at one of the NVIDIA forums. JOCL is simply passing the calls from Java to OpenCL, and if it hangs in one of these calls, there’s not much I can do.

However, I’m also curious what might be the reason for that.

Are you sure that the cl_program object is valid, i.e. that there was no error when creating it? (I recommend CL.setExceptionsEnabled(true) while running the first tests). Are you using any particularly complex features in your kernels? (Maybe some deeply nested loops or many local variables or samplers…?) Is it possible to post the kernel here, so that one may try to build it, maybe on other OpenCL implementations? (A “dummy program” that just performs the boilerplate setup, up to ‘clBuildProgram’, where the problem can be reproduced, might be helpful here).

Sorry that I cannot give more specific hints ATM…

Hi Marco,

Here is my code… As I told you, “program build!!!” does not appear…

Thank you

final int platformIndex = 1;
final long deviceType = CL_DEVICE_TYPE_ALL;
final int deviceIndex = 0;

// Enable exceptions and subsequently omit error checks in this sample
CL.setExceptionsEnabled(true);

// Obtain the number of platforms
int numPlatformsArray[] = new int[1];
clGetPlatformIDs(0, null, numPlatformsArray);
int numPlatforms = numPlatformsArray[0];

// Obtain a platform ID
cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
clGetPlatformIDs(platforms.length, platforms, null);
cl_platform_id platform = platforms[platformIndex];

// Initialize the context properties
cl_context_properties contextProperties = new cl_context_properties();
contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);

// Obtain the number of devices for the platform
int numDevicesArray[] = new int[1];
clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
int numDevices = numDevicesArray[0];

// Obtain a device ID
cl_device_id devices[] = new cl_device_id[numDevices];
clGetDeviceIDs(platform, deviceType, numDevices, devices, null);
cl_device_id device = devices[deviceIndex];

// Create a context for the selected device
cl_context context = clCreateContext(contextProperties, 1, new cl_device_id[] { device }, null, null, null);

// Create the program from the source code
cl_program program = clCreateProgramWithSource(context, 1, new String[] { programSource }, null, null);
System.out.println("program created");

// Build the program
clBuildProgram(program, 0, null, null, null, null);
System.out.println("program build!!!");

Hm well… that’s the usual (incomplete) boilerplate code. The remark in my previous post referred to the code for setting up your kernel - that is, to see whether YOUR ‘programSource’ can be compiled… :wink:

Aie! Sorry…

String programSource = "__kernel void " + "sampleKernel(){}";

Sorry, I’m not sure what this is all about. Does the following program work for you

package org.jocl.samples;

import static org.jocl.CL.*;

import org.jocl.*;

public class JOCLEmptyKernelTest
{
    private static String programSource =
        "__kernel void sampleKernel(){}";
    
    public static void main(String args[])
    {
        // The platform, device type and device number
        // that will be used
        final int platformIndex = 1;
        final long deviceType = CL_DEVICE_TYPE_ALL;
        final int deviceIndex = 0;

        // Enable exceptions and subsequently omit error checks in this sample
        CL.setExceptionsEnabled(true);

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];

        // Obtain a platform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        cl_platform_id platform = platforms[platformIndex];

        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);
        
        // Obtain the number of devices for the platform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];
        
        // Obtain a device ID 
        cl_device_id devices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(platform, deviceType, numDevices, devices, null);
        cl_device_id device = devices[deviceIndex];

        // Create a context for the selected device
        cl_context context = clCreateContext(
            contextProperties, 1, new cl_device_id[]{device}, 
            null, null, null);
        
        // Create the program from the source code
        cl_program program = clCreateProgramWithSource(context,
            1, new String[]{ programSource }, null, null);
        
        // Build the program
        System.out.println("Before");
        clBuildProgram(program, 0, null, null, null, null);
        System.out.println("After");
        
    }
}

?

If it works, what is the difference to your program?

If it does NOT work… have you tried running ANY OpenCL program, like one from https://developer.nvidia.com/opencl ?

Exactly the same… I copied and pasted your code… The program still hangs on “Before”…
I’m going to try with plain OpenCL, but as you said, why would it work there? JOCL = OpenCL + JNI…

Michael

Well, in some sense, that’s currently the most interesting point for me: If it works with “pure” OpenCL, I have a problem - namely, that it does not work due to some bug in JOCL. But if it does not work in “pure” OpenCL as well, then … NVIDIA has the problem.

I cannot imagine what should be wrong there, because I never experienced this behavior, and the call is just passed to OpenCL via JNI - just like all the other calls that are done before clBuildProgram…
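
To tell a build that is merely very slow from one that never returns, the call could be run on a separate thread with a timeout. The following is only a minimal sketch using java.util.concurrent: the class name BuildWatchdog and the method runWithTimeout are made up for this illustration, and the empty Runnable is a placeholder for the real clBuildProgram call. Note that a thread blocked inside a native call cannot be interrupted from Java, so the sketch uses a daemon thread that is simply abandoned when the timeout expires.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JOCL
class BuildWatchdog
{
    /**
     * Runs the task on a daemon thread and waits at most timeoutMillis.
     * Returns the elapsed time in milliseconds if the task finished,
     * or -1 if it was still running when the timeout expired.
     */
    static long runWithTimeout(Runnable task, long timeoutMillis)
    {
        // Daemon thread: a call stuck in native code will not keep the JVM alive
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "build-watchdog");
            t.setDaemon(true);
            return t;
        });
        long start = System.nanoTime();
        Future<?> future = executor.submit(task);
        try
        {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return (System.nanoTime() - start) / 1_000_000;
        }
        catch (TimeoutException e)
        {
            return -1;
        }
        catch (ExecutionException e)
        {
            // The task itself threw, e.g. a CLException from JOCL
            throw new RuntimeException(e.getCause());
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        finally
        {
            // Only requests an interrupt; a blocked native call ignores it
            executor.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        // Placeholder. In the real test this would be something like
        // () -> clBuildProgram(program, 0, null, null, null, null)
        Runnable build = () -> { };

        long millis = runWithTimeout(build, 60_000);
        if (millis < 0)
        {
            System.out.println("Build did not return within the timeout");
        }
        else
        {
            System.out.println("Build returned after " + millis + " ms");
        }
    }
}
```

With a generous timeout (for example, a bit more than the one hour mentioned above), the printed duration would also show how long a build that eventually succeeds really takes.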

A web search brought up http://www.khronos.org/message_boards/showthread.php/7672-clBuildProgram-Problem , but it’s several years old, and the poster did not say which JOCL he used (the only answer was from Michael Bien, the maintainer of the “other” JOCL, from jogamp.org). NVIDIA recently changed their whole compilation toolchain to LLVM, so I also cannot imagine that this should be related to the current issue.

Hi Marco, certainly my last reply…
I performed the same test with C++ and OpenCL… It hangs on “Before”…

I’m going to report this problem to the Khronos team…
Thanks again,

Michael

OK, on the one hand I’m glad to hear that it’s not related to JOCL itself, but of course it’s very unfortunate for you… -_- Maybe some newer (or older?) drivers can resolve this issue.

BTW: I think that Khronos is not appropriate here: It’s “only” the consortium for the standardization etc.
You mentioned that you used
final int platformIndex = 1;.
Which platform is this? You must have TWO OpenCL implementations installed. One is the implementation from NVIDIA. Is the other one from Intel or AMD? And which one is platform 1? You should mention this problem in the forum of the corresponding vendor, who provides the OpenCL implementation, since their compiler seems to have a problem here.

Whenever you have an issue with JOCL, or any other question regarding JOCL/OpenCL, you can post here, and I’ll try to help you, but in this case, I obviously could not do very much…

I tested with other drivers…
Don’t worry, platform index 1 refers to the NVIDIA driver installation of the S1070…

And Many Thanks for your help,
Michael