OpenCL on IntelliJ IDEA

I use IntelliJ IDEA as my workbench and have taken first steps at parallel programming on my AMD onboard GPU using JOCL. It works, but I noticed that the output of the printf instructions in the kernel program does not appear in the IDE console (i.e. on stdout). It is not clear whether that is a problem with IntelliJ IDEA or with the JOCL library. Any suggestions? Are there people who do GPU programming in IntelliJ IDEA?

I haven't used IntelliJ very actively so far. If necessary, I could run a test. But as a start: Does the output appear at all (e.g. when you run the program manually from the console)? It might be that IntelliJ uses some magic internally to capture the output and forward it to its console. The output might also not appear „immediately“ due to some buffering that the underlying OpenCL implementation is doing.

More generally: There are many possible reasons for the observed behavior, and it might be tricky to figure out what's wrong there. In my experience, AMD handles printf from kernels better than NVIDIA in some cases, but keep in mind that the kernel is executed in parallel, many, many times, so the output can be somewhat confusing in any case.

Hi Marco. That was exactly my thinking, too. Either IntelliJ captures stdout, or the JOCL framework I am using somehow picks up the output from the GPU. My plan is to find somebody who has seen it working at least once. That would reassure me that pursuing the problem is worthwhile.
To answer your question: No, I haven't seen the output anywhere. But I haven't tried yet to run the program manually from the console.

Just to have a starting point for a test, here is the „Hello World“ (Vector Addition) sample with some additional printf calls in the kernel:

/*
 * JOCL - Java bindings for OpenCL
 * Copyright 2009-2020 Marco Hutter -
 */
package org.jocl.test;

import static org.jocl.CL.*;

import java.util.Arrays;

import org.jocl.*;

/**
 * A small JOCL sample that uses 'printf' in a kernel
 */
public class JOCLPrintSample
{
    /**
     * The source code of the OpenCL program to execute
     */
    private static String programSource =
        "__kernel void "+
        "sampleKernel(__global const float *a,"+"\n"+
        "             __global const float *b,"+"\n"+
        "             __global float *c)"+"\n"+
        "{"+"\n"+
        "    printf(\"G0:%d\\n\", get_global_id(0));"+"\n"+
        "    printf(\"L0:%d\\n\", get_local_id(0));"+"\n"+
        "    printf(\"S0:%d\\n\", get_local_size(0));"+"\n"+
        "    int gid = get_global_id(0);"+"\n"+
        "    c[gid] = a[gid] * b[gid];"+"\n"+
        "}";

    /**
     * The entry point of this sample
     *
     * @param args Not used
     */
    public static void main(String args[])
    {
        // Create input- and output data 
        int n = 10;
        float srcArrayA[] = new float[n];
        float srcArrayB[] = new float[n];
        float dstArray[] = new float[n];
        for (int i=0; i<n; i++)
        {
            srcArrayA[i] = i;
            srcArrayB[i] = i;
        }
        Pointer srcA = Pointer.to(srcArrayA);
        Pointer srcB = Pointer.to(srcArrayB);
        Pointer dst = Pointer.to(dstArray);

        // The platform, device type and device number
        // that will be used
        final int platformIndex = 0;
        final long deviceType = CL_DEVICE_TYPE_ALL;
        final int deviceIndex = 0;

        // Enable exceptions and subsequently omit error checks in this sample
        CL.setExceptionsEnabled(true);

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];

        // Obtain a platform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        cl_platform_id platform = platforms[platformIndex];

        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);

        // Obtain the number of devices for the platform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];

        // Obtain a device ID 
        cl_device_id devices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(platform, deviceType, numDevices, devices, null);
        cl_device_id device = devices[deviceIndex];

        // Create a context for the selected device
        cl_context context = clCreateContext(
            contextProperties, 1, new cl_device_id[]{device}, 
            null, null, null);

        // Create a command-queue for the selected device
        cl_queue_properties properties = new cl_queue_properties();
        cl_command_queue commandQueue = clCreateCommandQueueWithProperties(
            context, device, properties, null);

        // Allocate the memory objects for the input- and output data
        cl_mem srcMemA = clCreateBuffer(context, 
            CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            Sizeof.cl_float * n, srcA, null);
        cl_mem srcMemB = clCreateBuffer(context, 
            CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            Sizeof.cl_float * n, srcB, null);
        cl_mem dstMem = clCreateBuffer(context, 
            CL_MEM_READ_WRITE,
            Sizeof.cl_float * n, null, null);

        // Create the program from the source code
        cl_program program = clCreateProgramWithSource(context,
            1, new String[]{ programSource }, null, null);

        // Build the program
        clBuildProgram(program, 0, null, null, null, null);

        // Create the kernel
        cl_kernel kernel = clCreateKernel(program, "sampleKernel", null);

        // Set the arguments for the kernel
        int a = 0;
        clSetKernelArg(kernel, a++, Sizeof.cl_mem, Pointer.to(srcMemA));
        clSetKernelArg(kernel, a++, Sizeof.cl_mem, Pointer.to(srcMemB));
        clSetKernelArg(kernel, a++, Sizeof.cl_mem, Pointer.to(dstMem));

        // Set the work-item dimensions
        long global_work_size[] = new long[]{n};

        // Execute the kernel
        clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
            global_work_size, null, 0, null, null);

        // Read the output data
        clEnqueueReadBuffer(commandQueue, dstMem, CL_TRUE, 0,
            n * Sizeof.cl_float, dst, 0, null, null);

        // Release kernel, program, and memory objects
        clReleaseMemObject(srcMemA);
        clReleaseMemObject(srcMemB);
        clReleaseMemObject(dstMem);
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseCommandQueue(commandQueue);
        clReleaseContext(context);

        // Verify the result
        boolean passed = true;
        final float epsilon = 1e-7f;
        for (int i=0; i<n; i++)
        {
            float x = dstArray[i];
            float y = srcArrayA[i] * srcArrayB[i];
            boolean epsilonEqual = Math.abs(x - y) <= epsilon * Math.abs(x);
            if (!epsilonEqual)
            {
                passed = false;
            }
        }
        System.out.println("Test "+(passed?"PASSED":"FAILED"));
        if (n <= 10)
        {
            System.out.println("Result: "+Arrays.toString(dstArray));
        }
    }
}
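Assembling kernel source by string concatenation is easy to get wrong: a single missing "{"+"\n"+ line only shows up later, as a build error in clBuildProgram. Assuming Java 15 or later is available, a text block is a less error-prone way to write the same kernel. This is only a sketch; the class name KernelSourceSketch is hypothetical, and the string could be used in place of programSource above.

```java
public class KernelSourceSketch
{
    // The same kernel as in the sample above, written as a Java text block
    // (Java 15+). Quotes need no escaping inside a text block, but "\\n" is
    // still required so that the OpenCL C compiler sees the escape "\n".
    static final String PROGRAM_SOURCE = """
        __kernel void sampleKernel(__global const float *a,
                                   __global const float *b,
                                   __global float *c)
        {
            printf("G0:%d\\n", get_global_id(0));
            printf("L0:%d\\n", get_local_id(0));
            printf("S0:%d\\n", get_local_size(0));
            int gid = get_global_id(0);
            c[gid] = a[gid] * b[gid];
        }
        """;

    public static void main(String[] args)
    {
        // Print the kernel source to check it before passing it to
        // clCreateProgramWithSource
        System.out.print(PROGRAM_SOURCE);
    }
}
```

The common indentation is stripped automatically, so the source starts directly with __kernel and keeps its line structure, which also makes build logs easier to match against the code.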

Can you confirm that this does not print anything on the console in IntelliJ?

And as another test: Put this file (together with the JOCL JAR, jocl-2.0.2.jar) into a directory and run

javac -cp ".;jocl-2.0.2.jar"
java  -cp ".;jocl-2.0.2.jar"

from the console should print something.

(Note: If you’re on Linux, it’s

javac -cp ".:jocl-2.0.2.jar"
java  -cp ".:jocl-2.0.2.jar"

with : instead of ;).
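If these commands are ever issued from a small Java launcher or build helper instead of being typed by hand, the separator does not have to be hard-coded: java.io.File.pathSeparator is ";" on Windows and ":" on Linux and macOS. A minimal sketch; the class and method names are hypothetical, and only the JAR name is taken from the commands above.

```java
import java.io.File;

public class ClasspathSketch
{
    // Joins classpath entries with the platform-specific separator
    // (";" on Windows, ":" on Linux and macOS)
    static String classpath(String... entries)
    {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args)
    {
        // The current directory plus the JOCL JAR used in the commands above
        System.out.println(classpath(".", "jocl-2.0.2.jar"));
    }
}
```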

If this does in fact print something, then the issue is somehow related to IntelliJ. (I could then try to find some time to test it in IntelliJ, but I could not do much more than ~„try out possible solutions“…)

If the manual test does not print anything: Which OpenCL implementation are you using? (NVIDIA, AMD, Other…?)

Now I have some homework to do with your input. Thanks a LOT for your effort already. I am going on a business trip for nearly two weeks, so please don't expect my feedback as quickly as you looked into this. Totally appreciated!

Hi Marco. The proposed test works: when compiling and running the program from the command line, I got the expected output from the kernel. So it appears that this is not a problem with the AMD GPU; rather, IntelliJ IDEA does not handle this. I have also opened an inquiry on the JetBrains IntelliJ IDEA user forum. The support person told me that they actually see their CLion workbench as the right environment for GPU programming. He also suggested doing the same test that you proposed.
To answer your question: I am using the JOCL library with OpenCL 2.0, so I was already familiar with the sample program.
Should you find the time and inclination to dig further into it, that would of course be welcome. Otherwise, thanks a lot for your hints.

I'm curious to see what the response will be there. The mechanisms behind the OpenCL printf are somewhat complicated. I don't know all the technical details, of course, but there certainly is some buffering going on.

(I'd try a „naive“ experiment: If the program from above does not print anything in IntelliJ, I'd change int n = 10; to int n = 10000;, just to see whether there is some buffer of ~1024 bytes involved that simply is not „flushed“ (to appear in the console). But of course, that would only be out of curiosity, and would not solve the underlying issue.)
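To see why n = 10 versus n = 10000 is a meaningful comparison for a ~1024 byte buffer, here is a rough estimate of how much text the three printf calls of the sample kernel produce. This is a sketch under assumptions: each work-item prints exactly "G0:<global id>\n", "L0:<local id>\n" and "S0:<local size>\n", and the local size is a guessed parameter, because the sample passes null as the local work size and lets the OpenCL implementation choose it. The class name is hypothetical.

```java
public class PrintfOutputEstimate
{
    // Estimates the number of characters (= bytes, since the output is ASCII)
    // written by the sample kernel's printf calls for n work-items.
    // localSize is an assumed value; the real one is chosen by the runtime.
    static long estimateBytes(int n, int localSize)
    {
        long total = 0;
        for (int gid = 0; gid < n; gid++)
        {
            int lid = gid % localSize; // local ID within the work-group
            total += ("G0:" + gid + "\n").length();
            total += ("L0:" + lid + "\n").length();
            total += ("S0:" + localSize + "\n").length();
        }
        return total;
    }

    public static void main(String[] args)
    {
        // One work-group of 10 work-items: prints 160
        System.out.println(estimateBytes(10, 10));
        // 250 is only an example local size that divides 10000
        System.out.println(estimateBytes(10000, 250));
    }
}
```

With n = 10 the kernel writes only 160 characters, well below ~1024, while with n = 10000 (and the assumed local size of 250) it writes 214,490 characters, so a buffer that only flushes when full would almost certainly be triggered.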

If necessary, I'd try it out myself (maybe also with the NVIDIA and AMD implementations, to see whether this makes a difference).

Hi. I think we can narrow it down to the way printf works. I simplified the program to the bare bones:

__kernel void PrintfTest()
{
    printf("%s\n", "hello world1");
    printf("hello world2");
}

It produces the following output on the IntelliJ IDEA console:
hello world1
It appears that printf doesn't print anything when it is called without arguments. The specification is, I think, not quite clear about whether there has to be at least one argument. However, on page 292 of that specification, an example of a valid use of printf is given as follows:

kernel void my_kernel( … )
{
    printf("%s\n", "this is a test string\n");
}

That seems to suggest that you formally have to pass a string argument, even if all you want to print is a single string.
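When kernel source is assembled in Java, as in the JOCL sample above, the workaround can be applied mechanically: instead of emitting printf("text"), emit printf("%s", "text"). The helper below is hypothetical and only mirrors the workaround observed in this thread; it does not explain why the plain call printed nothing. It escapes backslashes, quotes and newlines for an OpenCL C string literal, and since the format string is always "%s", a '%' inside the text needs no escaping.

```java
public class PrintfWorkaround
{
    // Hypothetical helper: builds an OpenCL C printf statement that passes
    // the text as a "%s" argument instead of using it as the format string
    static String printfStatement(String text)
    {
        StringBuilder literal = new StringBuilder();
        for (char ch : text.toCharArray())
        {
            switch (ch)
            {
                case '\\': literal.append("\\\\"); break; // backslash
                case '"':  literal.append("\\\""); break; // double quote
                case '\n': literal.append("\\n");  break; // newline escape
                default:   literal.append(ch);
            }
        }
        return "printf(\"%s\", \"" + literal + "\");";
    }

    public static void main(String[] args)
    {
        // The bare-bones test kernel, with the second call rewritten
        String kernel =
            "__kernel void PrintfTest()\n" +
            "{\n" +
            "    " + printfStatement("hello world2\n") + "\n" +
            "}\n";
        System.out.print(kernel);
    }
}
```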

So, I guess: case closed. Thanks again for your effort.

Now, that's odd. I think I haven't tried printing „only“ a string, as in the second line of the first snippet, but maybe I will do that later. In any case: Even though the behavior is … unexpected, it's good to know the „reason“ (or rather: the conditions under which it appears), and how to solve it (even though the solution looks quirky at first glance).