Strange CL_INVALID_VALUE on clBuildProgram

Hi

I’m getting a strange error when trying to build a program.

When I run the following code, it crashes with a CL_INVALID_VALUE on clBuildProgram…
However, none of the causes listed for that error in the documentation have anything to do with what is actually happening in this code.

```
String kernelString = Activator.getInstance().loadKernel(kKERNEL_SOURCE_FILE);
fProgram = CL.clCreateProgramWithSource(fEnvironment.getContext(), 1, new String[] { kernelString }, null, null);
CL.clBuildProgram(fProgram, 0, null, null, null, null);
fKernel = CL.clCreateKernel(fProgram, kKERNEL_METHOD, null);
```

This is the kernel source file that is passed as a string to clCreateProgramWithSource:
```
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;

__kernel void transform(
	__read_only image2d_t target,
	__read_only image2d_t floating,
	__write_only image2d_t output,
	const float m00,
	const float m10,
	const float m01,
	const float m11,
	const float m02,
	const float m12
)
{
	int gx = get_global_id(0);
	int gy = get_global_id(1);

	int2 coord = {gx,gy};

	float xTrans = (m00 * gx) + (m01 * gy) + (m02 * 1);
	float yTrans = (m10 * gx) + (m11 * gy) + (m12 * 1);

	int xTransi = convert_int(xTrans);
	int yTransi = convert_int(yTrans);

	int2 coordTrans = {xTransi,yTransi};

	int width = get_image_width(floating);
	int height = get_image_height(floating);

	uint4 pixel;
	if(xTransi < 0 || yTransi < 0 || xTransi >= width || yTransi >= height){
		pixel.x = 255;
		pixel.y = 255;
		pixel.z = 255;
	}
	else{
		pixel = read_imageui(floating, sampler, coordTrans);
	}
	write_imageui(output, coord, pixel);
}
```
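A CPU-side reference of the per-pixel coordinate math can help separate logic errors from driver problems, since it runs without any OpenCL device. The following is a minimal sketch in plain Java; the class and method names are hypothetical, it covers only the matrix mapping and the bounds test (not the image read/write), and it relies on convert_int rounding toward zero by default, which matches Java's (int) cast.

```java
/**
 * Hypothetical CPU reference for the per-work-item math of the "transform" kernel.
 * Output pixel (gx, gy) is mapped through the 2x3 affine matrix
 *   | m00 m01 m02 |
 *   | m10 m11 m12 |
 * and the result is checked against the size of the "floating" image.
 */
final class TransformReference {

    /**
     * Returns {xTransi, yTransi}, or null when the kernel would take the
     * out-of-range branch and write the white fallback pixel instead.
     * Parameter order of the matrix entries follows the kernel signature.
     */
    static int[] sourceCoord(int gx, int gy,
                             float m00, float m10, float m01, float m11, float m02, float m12,
                             int width, int height) {
        float xTrans = (m00 * gx) + (m01 * gy) + (m02 * 1);
        float yTrans = (m10 * gx) + (m11 * gy) + (m12 * 1);
        // convert_int defaults to round-toward-zero, like a Java (int) cast
        int xTransi = (int) xTrans;
        int yTransi = (int) yTrans;
        if (xTransi < 0 || yTransi < 0 || xTransi >= width || yTransi >= height) {
            return null; // kernel sets pixel.x, .y, .z to 255 in this case
        }
        return new int[] { xTransi, yTransi };
    }

    public static void main(String[] args) {
        // Identity matrix plus a translation of (+2, -1), 16x16 source image
        int[] c = sourceCoord(3, 4, 1f, 0f, 0f, 1f, 2f, -1f, 16, 16);
        System.out.println(c[0] + "," + c[1]); // prints 5,3
    }
}
```

Note that round-toward-zero maps small negative values such as -0.5 to 0, so those coordinates pass the bounds test rather than producing the white pixel.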

I've been experimenting a little with the kernel code and came across something strange.
When I comment out the parts of the code that call get_image_width, get_image_height and write_imageui, the program no longer crashes when calling clBuildProgram. This is really strange, because I have a whole bunch of kernels that I run before and after this one, and they build and even run successfully.

This is what I did to make the program build:

```
int width = 0; //get_image_width(floating);
int height = 0; //get_image_height(floating);

uint4 pixel;
if(xTransi < 0 || yTransi < 0 || xTransi >= width || yTransi >= height){
	pixel.x = 255;
	pixel.y = 255;
	pixel.z = 255;
}
else{
	pixel = read_imageui(floating, sampler, coordTrans);
}
//write_imageui(output, coord, pixel);
```
Of course, this means the kernel no longer works as it should, but at least it compiles.
I've run the same kernels on a laptop, and there they work fine!

It sounds a lot like a driver bug, but then the other kernels that use the same function calls shouldn't build successfully either.

Anyway, i'm doing this on a GTX 660Ti with driver version 306.02, i also tried 305.68.

Thanks in advance!

OK… I have not used the image functions too extensively yet, and the bug description does sound strange, indeed.

I first thought it might be related to the fact that some devices do (or did) not support images at all, but that can’t be the case here. There are some “caveats” concerning images. For example, some addressing modes are only allowed with certain sampler settings (CLK_ADDRESS_REPEAT and CLK_ADDRESS_MIRRORED_REPEAT require normalized coordinates). It would be interesting to know whether you’re using the same sampler settings in your other kernels. In general, some more information about your setup would help.

One test could be to DISable exceptions (CL.setExceptionsEnabled(false)) and print the string that is returned by this function after the clBuildProgram call:

    // Requires: import org.jocl.*;

    /**
     * Obtain a single String containing the build logs of the given program for
     * all devices that are associated with the given program object.
     *
     * @param program The program object
     * @return The build logs, as a single string.
     */
    private static String obtainBuildLogs(cl_program program)
    {
        // Query the number of devices and the device IDs of the program
        int numDevices[] = new int[1];
        CL.clGetProgramInfo(program, CL.CL_PROGRAM_NUM_DEVICES, Sizeof.cl_uint, Pointer.to(numDevices), null);
        cl_device_id devices[] = new cl_device_id[numDevices[0]];
        CL.clGetProgramInfo(program, CL.CL_PROGRAM_DEVICES, numDevices[0] * Sizeof.cl_device_id, Pointer.to(devices), null);

        StringBuilder sb = new StringBuilder();
        for (int i=0; i<devices.length; i++)
        {
            sb.append("Build log for device "+i+":\n");
            // First query the size of the log, then the log itself
            long logSize[] = new long[1];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_LOG, 0, null, logSize);
            byte logData[] = new byte[(int)logSize[0]];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_LOG, logSize[0], Pointer.to(logData), null);
            // Drop the trailing '\0' terminator of the C string
            sb.append(new String(logData, 0, Math.max(0, logData.length-1)));
            sb.append("\n");
        }
        return sb.toString();
    }

Maybe it contains some more detailed information…

But in any case, a build error should not make it bail out with “CL_INVALID_VALUE”: a failed build is supposed to be reported as CL_BUILD_PROGRAM_FAILURE…

BTW: I just tested it here (Win32/NVIDIA 4.2) and it compiled without errors or warnings…

Thanks for the swift reply, Marco!

Here is the follow up:

I DISabled the exceptions and, right after the clBuildProgram call, obtained the build logs using your function. For all of my other kernels it said something like:

```
Build log for device 0:
Build started
Kernel <kernel name here> was successfully vectorized
Done.
```

But for the kernel that crashes all the time it was just this:

```
Build log for device 0:
```

No further information whatsoever…

The next thing I tried was disabling the exceptions right before building the program and enabling them again after the call. As soon as the program tried to execute the kernel that (I thought) had just been created, it threw a big fat CL_INVALID_KERNEL, which probably means that the program didn’t build after all…

Just like on your machine, everything builds and runs smoothly on my laptop (Quadro).
I have some other cards at my disposal to plug into this test machine, and they work fine with it. At least the NVIDIA ones do; I haven’t tested the ATI card yet.

If you have any more things I could try, please let me know!
This bug/error is way out of my league :)

Later today I can try to call the kernel with some “dummy” data, to see whether it can be called at all or whether it also reports an “invalid kernel”.

So this “dummy data call” worked as well. But I just tried it with the AMD platform, and the compiler issued a warning:


```
Build log for device 0:
"C:\...\Temp\OCL2A.tmp.cl", line 3: warning: global
          variable declaration is corrected by the compiler to have addrSpace
          constant
      const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
                      ^
```

Changing the “const” to “__constant” made the warning disappear. Maybe there’s some obscure reason for the NVIDIA compiler to “fail silently” here…?

Hi Marco,

Thanks for the tip!
After plugging in the ATI card I received actual information from the build logs (the NVIDIA cards just returned empty strings), and I have resolved all warnings in all of my kernels.

At first, the transform kernel still behaved strangely and crashed, just as before…
So I began to rewrite that particular kernel so that it would still do the same thing, but in a different way.

I now have a more or less stable set of kernels which haven’t crashed at all yet. Thumbs up!

Here are the pieces of code I changed:

Basically, I changed the way I calculated the coordinates by using the dot function.
I also passed the values of the matrix as two float4 arguments instead of six separate floats:


```
// 'sampler' is declared at program scope (not part of this excerpt)
__kernel void transform(
	__read_only image2d_t floating,
	__write_only image2d_t output,
	const float4 m0,
	const float4 m1
)
{
	int gx = get_global_id(0);
	int gy = get_global_id(1);

	int2 coord = {gx,gy};
	// Homogeneous coordinates, so that dot() adds the translation term
	float4 coords = {gx, gy, 1, 0};

	float xTrans = dot(m0, coords);
	float yTrans = dot(m1, coords);

	int xTransi = convert_int(xTrans);
	int yTransi = convert_int(yTrans);

	int2 coordTrans = {xTransi,yTransi};

	if(xTransi < 0 || yTransi < 0 || xTransi >= get_image_width(floating) || yTransi >= get_image_height(floating)){
		uint4 pixel;
		pixel.x = 255;
		pixel.y = 255;
		pixel.z = 255;
		write_imageui(output, coord, pixel);
	}
	else{
		uint4 pixel = read_imageui(floating, sampler, coordTrans);
		write_imageui(output, coord, pixel);
	}
}
```
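Since dot(m0, (gx, gy, 1, 0)) has to equal m00 * gx + m01 * gy + m02, the first float4 must hold (m00, m01, m02, w) and the second (m10, m11, m12, w); the fourth component is multiplied by the 0 in coords, so its value does not matter. A minimal sketch of that packing in plain Java, with hypothetical helper names, that checks the dot product against the original explicit formula:

```java
/**
 * Hypothetical helper: packs the six affine coefficients into the two
 * float4 values m0 and m1 used by the rewritten kernel.
 */
final class Float4Packing {

    /** m0 = (m00, m01, m02, 0): dotted with (gx, gy, 1, 0) this gives xTrans. */
    static float[] packRow0(float m00, float m01, float m02) {
        return new float[] { m00, m01, m02, 0f };
    }

    /** m1 = (m10, m11, m12, 0): dotted with (gx, gy, 1, 0) this gives yTrans. */
    static float[] packRow1(float m10, float m11, float m12) {
        return new float[] { m10, m11, m12, 0f };
    }

    /** Plain-Java equivalent of OpenCL's dot() for two float4 operands. */
    static float dot4(float[] a, float[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    }

    public static void main(String[] args) {
        float[] m0 = packRow0(0.5f, 0.25f, 3f);
        float[] m1 = packRow1(-0.25f, 0.5f, 1f);
        float[] coords = { 4f, 8f, 1f, 0f }; // (gx, gy, 1, 0) for gx = 4, gy = 8
        // Explicit formula: 0.5*4 + 0.25*8 + 3 = 7 and -0.25*4 + 0.5*8 + 1 = 4
        System.out.println(dot4(m0, coords) + " " + dot4(m1, coords)); // prints 7.0 4.0
    }
}
```

On the host, each array would then presumably be passed as a single kernel argument of 4 * Sizeof.cl_float bytes, for example CL.clSetKernelArg(kernel, 2, 4 * Sizeof.cl_float, Pointer.to(m0)); the index 2 assumes the parameter order of the rewritten kernel (floating, output, m0, m1).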

I have no idea why it suddenly started working, but it did.
Thanks for the tips!

I also found a possible bug, which I will post in a different thread, and another issue I stumbled upon.

Good to hear that it works now… although there remains this unsatisfactory feeling of not knowing what the reason for the error was. I compared both versions side by side and did not find anything in the differences that could “obviously” have caused the error… Might it be the unused ‘target’ image in the first version? Maybe, maybe not. I think the only option in this case would be a “binary search”: transform one kernel step by step into the other, and see when the error occurs again… -_-
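The “binary search” idea can be sketched generically: order the intermediate kernel versions so that the working (rewritten) kernel comes first and the failing original comes last, then repeatedly test the version in the middle. For n versions this needs about log2(n) builds instead of n. In the sketch below the predicate is a stand-in; in practice it would wrap building one variant with clBuildProgram and checking the result, which is not shown and is an assumption, as are the class and method names.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical helper: finds the first variant for which the predicate fails,
 * assuming variants.get(0) passes, the last variant fails, and the result
 * flips from "pass" to "fail" only once along the list.
 */
final class VariantBisect {

    static <T> int firstFailing(List<T> variants, Predicate<T> passes) {
        int lo = 0;                   // invariant: variants.get(lo) passes
        int hi = variants.size() - 1; // invariant: variants.get(hi) fails
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (passes.test(variants.get(mid))) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Stand-in for "does this kernel version build?": versions 0..5 build, 6..9 fail
        List<Integer> versions = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(firstFailing(versions, v -> v < 6)); // prints 6
    }
}
```

The returned index points at the first version that breaks the build, so the change that introduced the failure is the difference between that version and the one before it.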

The unused image in the first version was a leftover from various things we tried in the past, but for some reason we never removed it… :)

We noticed it while trying things out, but removing it didn’t do the trick either.
But you are right… It’s not a good feeling not knowing what caused the strange behavior…