Problem accessing data passed to kernel

I’m passing a char array via pointer to the kernel.

For whatever reason I can’t actually access the data that pointer is pointing to.

__kernel void sampleKernel(__global const char *t)
{

printf("%s",t);

}

This will simply print “1” every time, even though 30+ chars are in the char array the pointer is pointing to. I allocated the correct amount of memory, it executes fine, etc. I’m sure this is just something simple but I’m not getting it!

Memory allocated with:

cl_mem addressMemA = clCreateBuffer(context,CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, Sizeof.cl_char*address.length, Pointer.to(address), null);

int a = 0;
clSetKernelArg(kernel, a++, Sizeof.cl_mem, Pointer.to(addressMemA));

The spec about printf is quite extensive, but as far as I can see, there are several caveats:

  • You did not show what address is, but it must be a byte[] array that is obtained from the string. It will not work with a char[] array: a Java char is 16 bits wide, whereas an OpenCL char is 8 bits.
  • The string must be 0-terminated, so the array needs one additional 0 byte at the end.
  • The unfortunate part: it might not be valid to call the function like this at all.
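The first two points can be sketched in plain Java, without any OpenCL calls. This is only a minimal illustration: the class name CStringBytes and the helper toCString are made up for this post, and UTF-8 is an assumed encoding (for pure ASCII strings the choice does not matter).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CStringBytes
{
    /**
     * Hypothetical helper: encodes the string as bytes and appends
     * a single 0 byte, so that C-style code can find the end.
     */
    static byte[] toCString(String s)
    {
        // Use the BYTE count, not s.length(): they differ as soon as
        // the string contains non-ASCII characters
        byte raw[] = s.getBytes(StandardCharsets.UTF_8);

        // Arrays.copyOf pads the additional element with 0
        return Arrays.copyOf(raw, raw.length + 1);
    }

    public static void main(String args[])
    {
        byte data[] = toCString("An example");
        System.out.println("bytes including terminator: " + data.length);
        System.out.println("last byte: " + data[data.length - 1]);
    }
}
```

The resulting array is what would go into Pointer.to(...) when creating the buffer, with Sizeof.cl_char * data.length as the size.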

The last caveat refers to the part of the spec that says the following:

A few examples of valid use cases of printf for the conversion specifier s are given below. The argument value must be a pointer to a literal string.

kernel void my_kernel( ... )
{
    printf("%s\n", "this is a test string\n");
}

A few examples of invalid use cases of printf for the conversion specifier s are given below:

kernel void my_kernel(global char *s, ... )
{
    printf("%s\n", s);
    constant char *p = "this is a test string\n";
    printf("%s\n", p);
    printf("%s\n", &p[3]);
}

The important part is that the kernel argument is not a literal string. The first invalid example, printf("%s\n", s); with a global char *s, matches your kernel, and the spec lists even something like printf("%s\n", p); as invalid.

I just tried it out, and the following worked for me, but even if it works for you, I guess you shouldn’t rely too much on that:

package org.jocl.tests;

import static org.jocl.CL.*;

import java.util.Arrays;

import org.jocl.*;

public class JOCLPrintStringSample
{
    private static String programSource =
        "__kernel void "+
        "sampleKernel(__global const char *t)"+
        "{"+
        // NOTE: This MIGHT be invalid, because the argument is not a
        // literal string!
        // https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#printf
        "    printf(\"%s\",t);"+
        "}";
    
    private static cl_context context;

    private static cl_command_queue commandQueue;

    private static cl_kernel kernel;

    /**
     * The entry point of this sample
     * 
     * @param args Not used
     */
    public static void main(String args[])
    {
        defaultInitialization();
        
        String hostString = "An example";
        
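        // Obtain the BYTES from the string, and copy them into an array
        // that is one byte longer (thus, creating the 0-terminating byte).
        // The byte count is used here, because it can differ from
        // hostString.length() for non-ASCII strings
        byte stringBytes[] = hostString.getBytes();
        byte hostData[] =
            Arrays.copyOf(stringBytes, stringBytes.length + 1);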

        cl_mem deviceData = 
            clCreateBuffer(context,CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, 
                Sizeof.cl_char*hostData.length, Pointer.to(hostData), null);

        int a = 0;
        clSetKernelArg(kernel, a++, Sizeof.cl_mem, Pointer.to(deviceData));

        int n = 5;
        long global_work_size[] = new long[]{n};
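        clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
            global_work_size, null, 0, null, null);

        // Wait for the kernel to complete, so that the printf output
        // is flushed before the resources are released
        clFinish(commandQueue);

        clReleaseMemObject(deviceData);
        clReleaseKernel(kernel);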
        clReleaseCommandQueue(commandQueue);
        clReleaseContext(context);
    }
    
    private static void defaultInitialization()
    {
        // The platform, device type and device number
        // that will be used
        final int platformIndex = 0;
        final long deviceType = CL_DEVICE_TYPE_ALL;
        final int deviceIndex = 0;

        // Enable exceptions and subsequently omit error checks in this sample
        CL.setExceptionsEnabled(true);

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];

        // Obtain a platform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        cl_platform_id platform = platforms[platformIndex];

        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);
        
        // Obtain the number of devices for the platform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];
        
        // Obtain a device ID 
        cl_device_id devices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(platform, deviceType, numDevices, devices, null);
        cl_device_id device = devices[deviceIndex];

        // Create a context for the selected device
        context = clCreateContext(
            contextProperties, 1, new cl_device_id[]{device}, 
            null, null, null);
        
        // Create a command-queue for the selected device
        cl_queue_properties properties = new cl_queue_properties();
        commandQueue = clCreateCommandQueueWithProperties(
            context, device, properties, null);
        
        // Create the program from the source code
        cl_program program = clCreateProgramWithSource(context,
            1, new String[]{ programSource }, null, null);
        
        // Build the program
        clBuildProgram(program, 0, null, null, null, null);
        
        kernel = clCreateKernel(program, "sampleKernel", null);
    }
}

Thank you! Turns out it was a byte[] vs. char[] issue, as you described.
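For anyone wondering why the char[] version printed exactly one character: here is a rough sketch in plain Java. It assumes that the raw 16-bit Java chars end up in the buffer in little-endian order (typical for x86 hosts, but an assumption here), and that the device reads the data byte by byte until the first 0 byte. The class name, the helper names and the string "127.0.0.1" are made up for the illustration; the real contents of address were not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class CharArrayAsCString
{
    /** Lays out the 16-bit chars of the string as raw bytes. */
    static byte[] rawBytes(String s, ByteOrder order)
    {
        ByteBuffer buffer = ByteBuffer.allocate(2 * s.length()).order(order);
        for (char c : s.toCharArray())
        {
            buffer.putChar(c);
        }
        return buffer.array();
    }

    /** Reads bytes up to the first 0 byte, like a C-style "%s" would. */
    static String readAsCString(byte bytes[])
    {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0)
        {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.ISO_8859_1);
    }

    public static void main(String args[])
    {
        // Each ASCII char becomes the byte pair (value, 0) in little-endian
        // order, so the "string" already ends after the first character
        byte raw[] = rawBytes("127.0.0.1", ByteOrder.LITTLE_ENDIAN);
        System.out.println(readAsCString(raw));
    }
}
```

With ASCII text, every second byte is 0, so the byte-wise reader stops right after the first character. That would match the single “1” in the output if the array started with that character.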

The documentation referenced a literal char array and so that’s what I was providing. I saw the note about it being invalid, but it provided no context as to why it was invalid and it was proper C notation, so I was rather confused.

Well, the internals of OpenCL and the GPU are difficult. When using a global char* as the argument, the printf call has to read the memory somehow, whereas with a real "literal string", it could probably be “inlined” in some way. But these are just guesses. If it works for now, it’s fine, but I wouldn’t rely too much on this working on all OpenCL platforms.