Jocl and Hadoop

Hi, I am not able to solve an issue with JOCL.

My program flow is something like this:

loop()
{
initGPU();
makeSomethingOnGPU()
destroyGPU();
}

I MUST put the initialization and destroy steps inside the loop. After a number of correct iterations, it crashes with a CL_OUT_OF_RESOURCES error, and I am not able to understand why.
I believe I destroy all resources. The code follows:

initGPU

makeSomethingOnGPU

destroyGPU

Can you detail what “a number of … iterations” means? 10? 1000? 100000?

I just assembled the snippets that you provided into this example, and it’s currently passing the 5000 mark as I am writing this…

package test;

import static org.jocl.CL.*;

import org.jocl.*;

public class RepeatedInitAndShutdownTest
{
    private static String programSource =
        "__kernel void "+
        "compute(__global float *a,"+
        "        __global float *b)"+
        "{"+
        "}";
    private static cl_platform_id clPlatform;
    private static cl_device_id clDevice;
    private static cl_context clContext;
    private static cl_command_queue clCommandQueue;
    private static cl_program clProgram;
    private static cl_kernel clKernel;
    

    /**
     * The entry point of this sample
     * 
     * @param args Not used
     */
    public static void main(String args[])
    {
        CL.setExceptionsEnabled(true);
        int count = 0;
        while (true)
        {
            initGPU();
            makeSomethingOnGPU();
            destroyGPU();
            System.out.println("Run "+count+" done");
            count++;
        }
    }
    
    
    
    private static void initGPU()
    {
        //Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];

        // Obtain a clPlatform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        clPlatform = platforms[0]; // first platform; adjust the index if the GPU is on another one

        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, clPlatform);

        // Obtain the number of devices for the clPlatform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(clPlatform, CL_DEVICE_TYPE_GPU, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];

        // Obtain a clDevice ID
        cl_device_id devices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(clPlatform, CL_DEVICE_TYPE_GPU, numDevices, devices, null);
        clDevice = devices[0];

        // Create a context for the selected clDevice
        clContext = clCreateContext(contextProperties, 1, 
            new cl_device_id[]{clDevice}, null, null, null);

        // Create a command-queue for the selected clDevice
        clCommandQueue = clCreateCommandQueue(clContext, clDevice, 0, null);
    }
    
    static void makeSomethingOnGPU()
    {
        int n = 1000;
        float dummy[] = new float[n];
        cl_mem clustersInMem = clCreateBuffer(clContext,
                        CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                        n * Sizeof.cl_float, Pointer.to(dummy), null);

        cl_mem clusterOutMem = clCreateBuffer(clContext,
                        CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                        n * Sizeof.cl_float, Pointer.to(dummy), null);

        clProgram = clCreateProgramWithSource(
            clContext, 1, new String[]{ programSource  }, null, null);

        clBuildProgram(clProgram, 0, null, null, null, null);

        clKernel = clCreateKernel(clProgram, "compute", null);
        clSetKernelArg(clKernel, 0, Sizeof.cl_mem, Pointer.to(clustersInMem));
        clSetKernelArg(clKernel, 1, Sizeof.cl_mem, Pointer.to(clusterOutMem));
        long global_work_size[] = new long[]{n};
        clEnqueueNDRangeKernel(clCommandQueue, clKernel, 1, null,
            global_work_size, null, 0, null, null);
        clReleaseMemObject(clustersInMem);
        clReleaseMemObject(clusterOutMem);
        
    }
    
    static void destroyGPU()
    {
        if (clKernel != null)
            clReleaseKernel(clKernel);
        if (clProgram != null)
            clReleaseProgram(clProgram);
        if (clCommandQueue != null)
            clReleaseCommandQueue(clCommandQueue);
        if (clContext != null)
            clReleaseContext(clContext);
    }
}

Can you confirm that this works in your case as well?

Sorry, I forgot to mention that I am using Hadoop MapReduce.
What is inside the loop is a map routine that is executed many times; the loop is the application.

loop{
call Mapper();
call Reducer();
wait_until_finish();
test();
}

Mapper()
{
initGPU();
makeSomethingOnGPU()
destroyGPU();
}

It works well for up to about 30 iterations (after that it starts to slow down), but if I set it up as you did, my MapReduce job fails after 60 iterations.

PS: Note that I also clean up memory using System.gc(), so I don't have any memory problems.
PS2: The kernels do not have any problems, because I have already tested them.

You mentioned Hadoop in the title ;)

But sorry, at the moment I have no idea what might cause this error. One reason is that I’m not really familiar with Hadoop (I once tried to use it, but … well -_- ). So I don’t know the details about how this application is actually run (on one machine, or multiple machines?), or how one could try to find out what causes this error.

The only advice I can give is the - sorry, somewhat tedious - approach to narrow down the space where the error may appear.

  • Can the „RepeatedInitAndShutdownTest“ be run on the machine where the reducer is run? (Just to see whether it’s a general problem with the OpenCL implementation, or whether it’s really related to the Hadoop nature of the application)
  • Does the Reducer involve OpenCL?
  • What happens if you only call „initGPU“ and „destroyGPU“, but omit the „makeSomethingOnGPU“ call?
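
To make these experiments a bit less tedious, something like the following could be used. It is only a rough sketch: the class `RepeatUntilFailure` and its `Result` type are made up for this post (they are not part of JOCL or Hadoop), and it assumes that errors show up as unchecked exceptions, as they do with `CL.setExceptionsEnabled(true)`. It repeats a list of named steps and reports after how many complete iterations, and in which step, the first failure occurs. The `main` method only simulates a resource that is never released, so it runs without any GPU.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of JOCL or Hadoop): repeats a sequence
 * of named steps and reports where the first failure happens.
 */
final class RepeatUntilFailure
{
    /** Outcome of one experiment. */
    static final class Result
    {
        final int completedIterations; // iterations in which all steps succeeded
        final String failedStep;       // name of the failing step, or null
        final RuntimeException error;  // the exception that was thrown, or null

        Result(int completedIterations, String failedStep,
            RuntimeException error)
        {
            this.completedIterations = completedIterations;
            this.failedStep = failedStep;
            this.error = error;
        }
    }

    /**
     * Runs all steps, in insertion order, up to maxIterations times and
     * stops at the first step that throws an unchecked exception.
     */
    static Result run(int maxIterations, Map<String, Runnable> steps)
    {
        for (int i = 0; i < maxIterations; i++)
        {
            for (Map.Entry<String, Runnable> step : steps.entrySet())
            {
                try
                {
                    step.getValue().run();
                }
                catch (RuntimeException e)
                {
                    return new Result(i, step.getKey(), e);
                }
            }
        }
        return new Result(maxIterations, null, null);
    }

    public static void main(String[] args)
    {
        // Stand-in for a native resource that is acquired but never
        // released: the 61st acquisition fails (numbers are made up)
        int[] handles = { 0 };
        Map<String, Runnable> steps = new LinkedHashMap<>();
        steps.put("init", () ->
        {
            if (++handles[0] > 60)
            {
                throw new IllegalStateException("out of resources");
            }
        });
        steps.put("destroy", () -> { /* deliberately releases nothing */ });

        Result r = run(1000, steps);
        System.out.println("Completed " + r.completedIterations
            + " iterations, failed in: " + r.failedStep);
    }
}
```

For the real test, the steps would be lambdas that call `initGPU()`, `makeSomethingOnGPU()` and `destroyGPU()`. Leaving out the middle entry then corresponds to the last question above, and comparing the iteration counts of both runs may give a hint about which step leaks.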