You are using %d as the format for the printf call. But %d is for integer values. For floating point values, it has to be %f. Also, you’re trying to print four numbers, while there are only three.
So the line should be
printf("Components: [%f|%f|%f]", vec[gid].x, vec[gid].y, vec[gid].z);
(And you should remove this printf("Vec: %d\n", vec[gid]); because this is wrong for the same reason, and you cannot print a full vector like that…)
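If you want to play with the two format specifiers on the host side, here is a minimal sketch in plain Java. Note the assumptions: Java's String.format is not the OpenCL printf, and the class and method names are made up for illustration. It only follows the same convention (%d for integers, %f for floating point values with six decimals). One difference: Java rejects a float that is passed to %d with an exception, whereas a mismatch in a kernel printf just gives unexpected output.

```java
import java.util.IllegalFormatConversionException;
import java.util.Locale;

public class FormatSpecifierSketch {

    // Formats three float components with %f, like the corrected kernel line.
    static String formatComponents(float x, float y, float z) {
        // Locale.ROOT keeps '.' as the decimal separator on every system
        return String.format(Locale.ROOT, "Components: [%f|%f|%f]", x, y, z);
    }

    // Returns true if Java refuses a float argument for the integer specifier %d.
    static boolean integerSpecifierRejectsFloat(float value) {
        try {
            String.format("%d", value);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Prints: Components: [0.000000|1.000000|2.000000]
        System.out.println(formatComponents(0f, 1f, 2f));
        // Prints: true
        System.out.println(integerSpecifierRejectsFloat(1.5f));
    }
}
```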
I assume that you may have spent some time with "trial and error" because you received unexpected output, and this might be the reason why you got some pointer indirections in the host code wrong: The pointer that is passed to clCreateBuffer should be the elementPointer directly (and not a pointer to that pointer). Of course, the size for the memory copy then has to be updated accordingly: The size should be Sizeof.cl_float * numberOfFloatValues.
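The size computation can be sketched without any OpenCL involved. This is only an illustration in plain Java: the class, the constant and the method name are made up, and Float.BYTES stands in for Sizeof.cl_float (both are 4).

```java
public class BufferSizeSketch {

    // Size of one float in bytes. Stand-in for JOCL's Sizeof.cl_float (also 4).
    static final int SIZE_OF_FLOAT = Float.BYTES;

    // Number of bytes that have to be copied for a host array of floats.
    static long sizeInBytes(float[] hostData) {
        return (long) SIZE_OF_FLOAT * hostData.length;
    }

    public static void main(String[] args) {
        // 5 vectors with 4 float slots each
        float[] hostData = new float[5 * 4];
        // Prints: 80
        System.out.println(sizeInBytes(hostData));
    }
}
```

The point is that the size depends on the number of float values in the array that the pointer refers to, and not on the size of a pointer.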
Quickly combined into an example (just quickly written down to show the main points):
__kernel void raycast(__global const float3* vec) {
    const int gid = get_global_id(0);
    printf("Global index %d, components: [%f|%f|%f]\n",
        gid, vec[gid].x, vec[gid].y, vec[gid].z);
}
package org.jocl.test;

import static org.jocl.CL.*;

import org.jocl.*;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JOCLRaycastTest
{
    private static final int platformIndex = 0;
    private static final long deviceType = CL_DEVICE_TYPE_ALL;
    private static final int deviceIndex = 0;

    private static cl_context context;
    private static cl_command_queue commandQueue;
    private static cl_kernel kernel;

    public static void main(String args[])
    {
        // Default initialization
        defaultInitialization();

        int numVectors = 5;

        // This is 4 even for 3D vectors, because cl_float3=cl_float4!
        int numComponentsPerVector = 4;

        // Create the array and fill it with example data
        float[] array = new float[numVectors * numComponentsPerVector];
        for (int i = 0; i < array.length; i++)
        {
            array[i] = i;
        }
        int totalSizeOfData = Sizeof.cl_float * array.length;

        // Create the memory object for the array. The pointer refers to
        // the array directly. (The flags are an assumption: a read-only
        // buffer that is initialized by copying the host data)
        Pointer elementPointer = Pointer.to(array);
        cl_mem mem = clCreateBuffer(context,
            CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
            totalSizeOfData, elementPointer, null);
        clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(mem));

        // Set the work-item dimensions
        long global_work_size[] = new long[]{ numVectors };
        long local_work_size[] = new long[]{ numVectors };

        // Execute the kernel
        clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
            global_work_size, local_work_size, 0, null, null);
    }

    private static void defaultInitialization()
    {
        String programSource = readFile("");

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];

        // Obtain a platform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        cl_platform_id platform = platforms[platformIndex];

        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);

        // Obtain the number of devices for the platform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];

        // Obtain a device ID
        cl_device_id devices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(platform, deviceType, numDevices, devices, null);
        cl_device_id device = devices[deviceIndex];

        // Create a context for the selected device
        context = clCreateContext(
            contextProperties, 1, new cl_device_id[]{device},
            null, null, null);

        // Create a command-queue for the selected device
        cl_queue_properties properties = new cl_queue_properties();
        commandQueue = clCreateCommandQueueWithProperties(
            context, device, properties, null);

        // Create the program from the source code
        cl_program program = clCreateProgramWithSource(context,
            1, new String[]{ programSource }, null, null);

        // Build the program
        clBuildProgram(program, 0, null, null, null, null);

        // Create the kernel
        kernel = clCreateKernel(program, "raycast", null);
    }

    private static String readFile(String fileName)
    {
        try
        {
            return new String(Files.readAllBytes(Paths.get(fileName)));
        }
        catch (IOException e)
        {
            return null;
        }
    }
}
The output should be
Global index 0, components: [0.000000|1.000000|2.000000]
Global index 1, components: [4.000000|5.000000|6.000000]
Global index 2, components: [8.000000|9.000000|10.000000]
Global index 3, components: [12.000000|13.000000|14.000000]
Global index 4, components: [16.000000|17.000000|18.000000]
You might notice that some values appear to be missing in the output. It goes 0,1,2, then 4,5,6, and so on. As you mentioned, this is because cl_float3 is the same as cl_float4 in OpenCL. This means that the array does not contain numVectors*3, but numVectors*4 floating point values (and every fourth value is simply not used).
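To make the layout a bit more concrete, here is a small plain-Java sketch (no JOCL involved). The class and method names are made up for illustration. It packs tightly stored x,y,z triples into an array with four slots per vector, and reads the components of one vector back with the same stride of 4 that the kernel effectively uses for vec[gid].

```java
import java.util.Arrays;

public class PaddedFloat3Sketch {

    // A 3D vector occupies four float slots in the buffer; the fourth is unused
    static final int SLOTS_PER_VECTOR = 4;

    // Copies tightly packed xyz triples into an array with one unused slot per vector.
    static float[] packPadded(float[] xyz) {
        if (xyz.length % 3 != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 3");
        }
        int numVectors = xyz.length / 3;
        float[] padded = new float[numVectors * SLOTS_PER_VECTOR];
        for (int i = 0; i < numVectors; i++) {
            padded[i * SLOTS_PER_VECTOR + 0] = xyz[i * 3 + 0];
            padded[i * SLOTS_PER_VECTOR + 1] = xyz[i * 3 + 1];
            padded[i * SLOTS_PER_VECTOR + 2] = xyz[i * 3 + 2];
        }
        return padded;
    }

    // Returns x, y and z of the vector with the given index from a padded array.
    static float[] readVector(float[] padded, int gid) {
        int base = gid * SLOTS_PER_VECTOR;
        return new float[]{ padded[base], padded[base + 1], padded[base + 2] };
    }

    public static void main(String[] args) {
        // Same example data as in the host code: array[i] = i
        float[] array = new float[5 * SLOTS_PER_VECTOR];
        for (int i = 0; i < array.length; i++) {
            array[i] = i;
        }
        // Prints: [4.0, 5.0, 6.0]
        System.out.println(Arrays.toString(readVector(array, 1)));

        // Prints: [1.0, 2.0, 3.0, 0.0, 4.0, 5.0, 6.0, 0.0]
        System.out.println(Arrays.toString(
            packPadded(new float[]{ 1, 2, 3, 4, 5, 6 })));
    }
}
```

With a helper like packPadded, the host data can stay a plain 3xn array, and the padding is only added right before the buffer is created.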
It may be a matter of taste what someone finds more confusing:
- having to deal with 4xn-sized arrays, even though it says float3
- using cl_float4 even though one only needs cl_float3 in the kernel
If you want to, you could also simply replace all float3s with float4s (also on the host side), and write the kernel as
__kernel void raycast(__global const float4* vec) {
    const int gid = get_global_id(0);
    printf("Global index %d, components: [%f|%f|%f|%f]\n",
        gid, vec[gid].x, vec[gid].y, vec[gid].z, vec[gid].w);
}
which would print
Global index 0, components: [0.000000|1.000000|2.000000|3.000000]
Global index 1, components: [4.000000|5.000000|6.000000|7.000000]
Global index 2, components: [8.000000|9.000000|10.000000|11.000000]
Global index 3, components: [12.000000|13.000000|14.000000|15.000000]
Global index 4, components: [16.000000|17.000000|18.000000|19.000000]