JOCLSample on Linux


#1

Hi,
I’ve been playing around with the provided JOCL sample, and it works.
However, as soon as I change anything in the kernel source (for example, changing the multiplication into an addition, or adding extra white space, or … anything!), the kernel doesn’t build anymore. I bet others haven’t run into this failure yet, so I wonder how this could happen. Here is a dump of what I get in my console:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0xb75aa1a6, pid=7810, tid=3059977024

JRE version: 7.0_25-b30

Java VM: OpenJDK Server VM (23.7-b01 mixed mode linux-x86 )

Problematic frame:

C [libc.so.6+0x851a6] envz_strip+0x1d6

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try “ulimit -c unlimited” before starting Java again

An error report file with more information is saved as:

/home/geoffrey/JOCLSample/hs_err_pid7810.log

If you would like to submit a bug report, please include

instructions on how to reproduce the bug and visit:

https://bugs.launchpad.net/ubuntu/+source/openjdk-7/

The crash happened outside the Java Virtual Machine in native code.

See problematic frame for where to report the bug.

Java Result: 134
BUILD SUCCESSFUL (total time: 0 seconds)

and here is my kernel source that doesn’t build (note the extra white space before b[gid]):

```
        "__kernel void "+
        "sampleKernel(__global const float *a,"+
        "             __global const float *b,"+
        "             __global float *c)"+
        "{"+
        "    int gid = get_global_id(0);"+
        "    c[gid] = a[gid] *  b[gid];"+
        "}";
```

#2

Hello

It’s strange that the original kernel works, but the crash occurs as soon as the kernel itself is modified.

At the moment, I can only guess that it might be the same issue that is described in http://code.google.com/p/javacl/wiki/TroubleShootingJavaCLOnLinux , where the recommendation is to set the LD_PRELOAD environment variable to
export LD_PRELOAD=/usr/lib/jvm/java-6-openjdk/jre/lib/amd64/libjsig.so
(with the appropriate path for your JDK)
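The exact location of libjsig.so differs between JDK versions and architectures. As a rough sketch (assuming Java 8 or newer and a Linux JDK that ships libjsig.so somewhere below its `java.home` directory; the class name `FindLibjsig` is made up for this illustration), the path for the JVM that is actually running could be looked up like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of JOCL or the JDK
final class FindLibjsig
{
    // Searches the given directory tree for a file named libjsig.so
    static Optional<Path> find(Path root) throws IOException
    {
        try (Stream<Path> paths = Files.walk(root))
        {
            return paths
                .filter(p -> p.getFileName() != null
                    && p.getFileName().toString().equals("libjsig.so"))
                .findFirst();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // java.home is the installation directory of the running JVM
        Path javaHome = Paths.get(System.getProperty("java.home"));
        Optional<Path> lib = find(javaHome);
        if (lib.isPresent())
        {
            System.out.println("export LD_PRELOAD=" + lib.get().toAbsolutePath());
        }
        else
        {
            System.out.println("No libjsig.so found below " + javaHome);
        }
    }
}
```

Running it with the same `java` binary that later runs the JOCL program prints an export line that matches that JVM.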

Please let me know whether this helps. Otherwise, the contents of the “hs_err_*”-file might bring some insights (if you post it, check whether you might want to omit some of the details at the end of the file), but it may be hard to track down the reason for a general segfault…

bye


#3

Hi,

On my machine I had to use:

export LD_PRELOAD=/usr/lib/jvm/java-7-openjdk-i386/jre/lib/i386/libjsig.so

but that didn’t work. Below is the content of the error log file.

EDIT: When I copy my program to my Windows partition and run it from within Windows 8, there seems to be no problem, so it’s really specific to the Linux platform.


#4

OK, according to the log file, it crashes when it tries to obtain the build logs. However, it should only try to obtain the build logs when it fails to compile the program (because then it throws an exception containing the build log, so that you can directly see what was wrong).

Unfortunately, my possibilities for running tests on Linux are rather limited.

So sorry, I have to ask: Are you absolutely sure that the modified program is valid, i.e. that it will not cause compile errors? (You mentioned that you just added a space or changed the ‘*’ to ‘+’, so this should be the case - I’m just asking ;) )

But even if it fails to compile, it should, of course, be able to obtain the build logs. I just had another short look at the relevant source code, but did not see any obvious errors (and never heard about any problems with this particular function), but I’ll have a closer look at this ASAP.

Until then, could you please run another test: Could you comment out the line
`CL.setExceptionsEnabled(true);`
and change the call to the `clBuildProgram` function to

// Build the program
int errorCode = clBuildProgram(program, 0, null, null, null, null);
System.out.println("errorCode "+errorCode);
if (errorCode != CL.CL_SUCCESS) System.exit(0);

to see whether there actually IS a problem with the compilation?

Sorry for the inconvenience.


#5

Hey,
About the changes: yes, I’m sure. I copied the JOCLSample and built it in the NetBeans IDE, and the default code works. But adding just one extra white space somewhere in the OpenCL code causes the error. Running the EXACT SAME code (with the extra white space) on Windows 8, however, doesn’t cause the error, i.e. it works the way it should.

What I tried before:

  • CL.setExceptionsEnabled(true);
    Commenting out this line alone doesn’t change anything: same error on Linux.

  • clBuildProgram and CL.setExceptionsEnabled(true);
    With the exceptions line commented out and the 3 lines of code you mentioned added, I do not get the error dump seen in my first post, and no error file is created either. The program does exit, however, printing the OpenCL error code -11, which is CL_BUILD_PROGRAM_FAILURE.
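For reading such numeric codes, a small lookup table can help. The following is only a sketch that covers a handful of codes with their values from the OpenCL headers; the class `OpenClErrorNames` is made up for this illustration and is not part of JOCL (JOCL itself seems to offer `CL.stringFor_errorCode` for the same purpose, so check your version first).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps a few OpenCL error codes to their names
final class OpenClErrorNames
{
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static
    {
        // Values as defined in the OpenCL headers (subset only)
        NAMES.put(0, "CL_SUCCESS");
        NAMES.put(-1, "CL_DEVICE_NOT_FOUND");
        NAMES.put(-2, "CL_DEVICE_NOT_AVAILABLE");
        NAMES.put(-3, "CL_COMPILER_NOT_AVAILABLE");
        NAMES.put(-5, "CL_OUT_OF_RESOURCES");
        NAMES.put(-6, "CL_OUT_OF_HOST_MEMORY");
        NAMES.put(-11, "CL_BUILD_PROGRAM_FAILURE");
        NAMES.put(-30, "CL_INVALID_VALUE");
        NAMES.put(-33, "CL_INVALID_DEVICE");
        NAMES.put(-43, "CL_INVALID_BUILD_OPTIONS");
        NAMES.put(-44, "CL_INVALID_PROGRAM");
    }

    // Returns the symbolic name, or a generic text for codes not in the table
    static String nameOf(int errorCode)
    {
        String name = NAMES.get(errorCode);
        return name != null ? name : "unknown error code " + errorCode;
    }

    public static void main(String[] args)
    {
        int errorCode = -11; // the value reported in this thread
        System.out.println("errorCode " + errorCode + " = " + nameOf(errorCode));
    }
}
```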

The funny thing is that when I do this:

```
        clBuildProgram(program, 0, null, null, null, null);
        
        int errorCode = clBuildProgram(program, 0, null, null, null, null);
        System.out.println("errorCode "+errorCode);
        if (errorCode != CL.CL_SUCCESS) System.exit(0);
```
the error dump from before does not appear either, even though the plain clBuildProgram call from the original sample is still there.

#6

So that’s strange. When
CL.setExceptionsEnabled(true);
is NOT called, it should never try to obtain any build logs. But if I understood you correctly, it still crashes when you comment out this line, right? Could you post the hs_err file that is created in this case?

At least the error code CL_BUILD_PROGRAM_FAILURE was what I expected (although I don’t know why it happens). Unfortunately, one cannot find out why the build fails as long as it is not possible to obtain the build logs.

To summarize this: When

  • Exceptions are enabled by calling CL.setExceptionsEnabled(true);
  • AND it fails to compile the program

THEN it tries to obtain the build log. And there, something goes wrong. I can only guess why it fails to obtain the build logs. (For example: IF the OpenCL implementation reported a wrong size for the build log output, this could explain the segmentation fault.)

So, as a first analysis step, I have created a small test that prints some information about the process of obtaining the build log for the program. Can you reproduce the crash there? (Possibly by modifying the source code?) If yes, what does it print before the crash, and what are the contents of the hs_err file?

(And, BTW: What OpenCL implementation are you using? Intel/AMD/NVIDIA…?)

import static org.jocl.CL.*;
import org.jocl.*;

import java.util.Arrays;


public class JOCLCompileTest
{
    private static String programSource =
        "__kernel void "+
        "sampleKernel(__global const float *a,"+
        "             __global const float *b,"+
        "             __global float *c)"+
        "{"+
        "    int gid = get_global_id(0);"+
        "    c[gid] = a[gid] * b[gid];"+
        "}";
    
    public static void main(String args[])
    {
        // The platform, device type and device number
        // that will be used
        final int platformIndex = 0;
        final long deviceType = CL_DEVICE_TYPE_ALL;
        final int deviceIndex = 0;

        // Enable exceptions and subsequently omit error checks in this sample
        //CL.setExceptionsEnabled(true);

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];

        // Obtain a platform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        cl_platform_id platform = platforms[platformIndex];

        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);
        
        // Obtain the number of devices for the platform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];
        
        // Obtain a device ID 
        cl_device_id devices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(platform, deviceType, numDevices, devices, null);
        cl_device_id device = devices[deviceIndex];

        // Create a context for the selected device
        cl_context context = clCreateContext(
            contextProperties, 1, new cl_device_id[]{device}, 
            null, null, null);
        
        // Create the program from the source code
        cl_program program = clCreateProgramWithSource(context,
            1, new String[]{ programSource }, null, null);
        
        // Build the program
        clBuildProgram(program, 0, null, null, null, null);
        
        printProgramInfo(program);
        
        // Create the kernel
        cl_kernel kernel = clCreateKernel(program, "sampleKernel", null);
        
        // Release kernel, program, and memory objects
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseContext(context);
        
        System.out.println("Done");
    }
    
    private static void printProgramInfo(cl_program program)
    {
        System.out.println("Program info:");
        int numDevices[] = new int[1];
        CL.clGetProgramInfo(program, CL.CL_PROGRAM_NUM_DEVICES, Sizeof.cl_uint, Pointer.to(numDevices), null);
        System.out.println("numDevices "+numDevices[0]);
        
        cl_device_id devices[] = new cl_device_id[numDevices[0]];
        CL.clGetProgramInfo(program, CL.CL_PROGRAM_DEVICES, numDevices[0] * Sizeof.cl_device_id, Pointer.to(devices), null);
        System.out.println("devices "+Arrays.toString(devices));

        for (int i=0; i<devices.length; i++)
        {
            System.out.println("Build log info for device "+i);
            long logSize[] = new long[1];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_LOG, 0, null, logSize);
            System.out.println("logSize "+logSize[0]);
            
            byte logData[] = new byte[(int)logSize[0]];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_LOG, logSize[0], Pointer.to(logData), null);
            System.out.println("Obtained log data:");
            // Omit the trailing NUL byte; the Math.max guards against an empty log
            System.out.println(">"+new String(logData, 0, Math.max(0, logData.length-1))+"<");
        }
    }
}
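A side note on the last step of this test: the build log arrives as a NUL-terminated C string. The sketch below shows one defensive way to turn such a buffer into a Java string, assuming the log is plain ASCII/UTF-8 text; the class `BuildLogText` is made up for this illustration and is not part of JOCL.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts a NUL-terminated byte buffer into a String
final class BuildLogText
{
    static String decode(byte[] logData)
    {
        if (logData == null)
        {
            return "";
        }
        // Stop at the first NUL byte, or at the end of the buffer if there is none
        int end = 0;
        while (end < logData.length && logData[end] != 0)
        {
            end++;
        }
        // Assumption: the log text is ASCII or UTF-8
        return new String(logData, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        // Sample buffers for illustration only, not taken from a real driver
        System.out.println(">" + decode(new byte[0]) + "<");
        System.out.println(">" + decode(new byte[] { 'o', 'k', 0 }) + "<");
    }
}
```

Unlike a fixed `length-1`, this also copes with an empty buffer and with a buffer that is not terminated at all.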

#7

It creates the log file when I have this code:

```
...
clBuildProgram(program, 0, null, null, null, null);
```

It does however not create the log file when I have this code:

```
//CL.setExceptionsEnabled(true);
...
clBuildProgram(program, 0, null, null, null, null);
int errorCode = clBuildProgram(program, 0, null, null, null, null);
System.out.println("errorCode "+errorCode);
if (errorCode != CL.CL_SUCCESS) System.exit(0);
```

In the last case console is returning -11 (CL_BUILD_PROGRAM_FAILURE)

I'm running the NVIDIA implementation. When I try to run this program on another Ubuntu machine with AMD implementation: it works.

Now onto the code you provided. It runs perfectly with no modifications and returns:
```
Program info:
numDevices 1
devices [cl_device_id[0x69aba3f0]]
Build log info for device 0
logSize 2
Obtained log data:
>
<
Done
```

When I change the kernel, the error happens again and I get following output:

```
numDevices 1
devices [cl_device_id[0x699ba3f0]]
Build log info for device 0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb75c21a6, pid=5420, tid=3060075328
#
# JRE version: 7.0_25-b30
...
```


When I change the code to:
```
            logSize = new long[1];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_STATUS, 0, null, logSize);
            System.out.println("build status: "+logSize[0] +"");
            logData = new byte[(int)logSize[0]];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_STATUS, logSize[0], Pointer.to(logData), null);
            System.out.println("Obtained status data:");
            System.out.println(">"+new String(logData, 0, logData.length-1)+"<");

            
            logSize = new long[1];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_OPTIONS, 0, null, logSize);
            System.out.println("build options: "+logSize[0] +"");
            logData = new byte[(int)logSize[0]];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_OPTIONS, logSize[0], Pointer.to(logData), null);
            System.out.println("Obtained buiild options data:");
            System.out.println(">"+new String(logData, 0, logData.length-1)+"<");

            logSize = new long[1];
            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_LOG, 0, null, logSize);
//            System.out.println("logSize "+logSize[0]);
//            
//            byte logData[] = new byte[(int)logSize[0]];
//            CL.clGetProgramBuildInfo(program, devices[i], CL.CL_PROGRAM_BUILD_LOG, logSize[0], Pointer.to(logData), null);
//            System.out.println("Obtained log data:");
//            System.out.println(">"+new String(logData, 0, logData.length-1)+"<");
```

The program errors on the line where it asks for CL_PROGRAM_BUILD_LOG. The same dump as before, here is the output:

```
Program info:
numDevices 1
devices [cl_device_id[0x69ab2458]]
Build log info for device 0
build status: 4
Obtained status data:
>???<
build options: 1
Obtained buiild options data:
><
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb76401a6, pid=6892, tid=3060591424
#
# JRE version: 7.0_25-b30
# Java VM: OpenJDK Server VM (23.7-b01 mixed mode linux-x86 )
# Problematic frame:
# C  [libc.so.6+0x851a6]  envz_strip+0x1d6
#
```

The error log file is attached.
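A remark on the `>???<` line in the output above: CL_PROGRAM_BUILD_STATUS does not return text but a `cl_build_status` value, a 32-bit integer (which matches the reported size of 4). Printing those bytes as a string therefore yields garbage. The following sketch decodes such a 4-byte buffer into the integer and its name; the class `BuildStatusDecoder` and the sample bytes are made up for this illustration, and the byte order of the machine is assumed to be known. With JOCL, passing `Pointer.to(new int[1])` instead of a byte array should make the manual decoding unnecessary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: interprets 4 raw bytes as a cl_build_status value
final class BuildStatusDecoder
{
    static int toInt(byte[] raw, ByteOrder order)
    {
        if (raw.length != 4)
        {
            throw new IllegalArgumentException("Expected 4 bytes, got " + raw.length);
        }
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    // Names and values as defined in the OpenCL headers
    static String nameOf(int status)
    {
        switch (status)
        {
            case 0: return "CL_BUILD_SUCCESS";
            case -1: return "CL_BUILD_NONE";
            case -2: return "CL_BUILD_ERROR";
            case -3: return "CL_BUILD_IN_PROGRESS";
            default: return "unknown build status " + status;
        }
    }

    public static void main(String[] args)
    {
        // Made-up sample: the little-endian encoding of -2 (not data from this thread)
        byte[] raw = { (byte)0xFE, (byte)0xFF, (byte)0xFF, (byte)0xFF };
        int status = toInt(raw, ByteOrder.LITTLE_ENDIAN);
        System.out.println("build status " + status + " = " + nameOf(status));
    }
}
```

Three unprintable bytes such as 0xFE 0xFF 0xFF would be consistent with the three question marks shown above, but that is only a guess based on the printed output.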

#8

> geoffrey wrote:
> It creates the log file when I have this code:
>
> ...
> clBuildProgram(program, 0, null, null, null, null);

In all hs_err files so far, the crash occurred during `clGetProgramBuildInfo`. But this method should not be called in the above case, so it would be interesting to see what the hs_err file looks like there.

BTW: The relevant part (for me, until now) of the hs_err-file is the Stack, e.g.

Stack: [0xb667e000,0xb66cf000], sp=0xb66cde84, free space=319k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libc.so.6+0x851a6] envz_strip+0x1d6
C [libOpenCL.so+0x3c9a] clGetProgramBuildInfo+0x3a
j org.jocl.CL.clGetProgramBuildInfoNative(Lorg/jocl/cl_program;Lorg/jocl/cl_device_id;IJLorg/jocl/Pointer;[J)I+0
j org.jocl.CL.clGetProgramBuildInfo(Lorg/jocl/cl_program;Lorg/jocl/cl_device_id;IJLorg/jocl/Pointer;[J)I+8
j original.JOCLCompileTest.printProgramInfo(Lorg/jocl/cl_program;)V+430
j original.JOCLCompileTest.main([Ljava/lang/String;)V+174
v ~StubRoutines::call_stub
V [libjvm.so+0x42a9fc] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x39c
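Since only this part of the file is needed, it can be cut out before posting. The following is a small sketch under the assumption that the section starts with a line beginning with "Native frames:" and ends at the next blank line, as in the excerpt above; the class `HsErrFrames` is made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the native stack frames from an hs_err log
final class HsErrFrames
{
    // Returns the lines after the "Native frames:" header, up to the first blank line
    static List<String> nativeFrames(List<String> lines)
    {
        List<String> frames = new ArrayList<>();
        boolean inFrames = false;
        for (String line : lines)
        {
            if (line.startsWith("Native frames:"))
            {
                inFrames = true;
                continue;
            }
            if (inFrames)
            {
                if (line.trim().isEmpty())
                {
                    break;
                }
                frames.add(line.trim());
            }
        }
        return frames;
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 1)
        {
            System.out.println("Usage: java HsErrFrames <path to hs_err file>");
            return;
        }
        for (String frame : nativeFrames(Files.readAllLines(Paths.get(args[0]))))
        {
            System.out.println(frame);
        }
    }
}
```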








> 
> It does however not create the log file when I have this code:
> ....
> In the last case console is returning -11 (CL_BUILD_PROGRAM_FAILURE)


OK, that is what I expected, although I still don't know why it fails to compile.


> 
> I'm running the NVIDIA implementation. When I try to run this program on another Ubuntu machine with AMD implementation: it works.
> 
> Now onto the code you provided. It runs perfectly with no modifications and returns:
> ...
> When I change the code to:
> ...
> The program errors on the line where it asks for CL_PROGRAM_BUILD_LOG.



The fact that it works for a different OpenCL implementation is at least a slight indication that the actual error is not inside JOCL. The fact that it is able to obtain other clGetProgramBuildInfo elements, but fails ONLY when trying to obtain the CL_PROGRAM_BUILD_LOG, is at least an indication that it is not a "general" problem. Nevertheless, the fact that it *should* work does not help you much ;)

However, I'm running out of ideas of what might be the actual reason for the error, and more importantly, how to fix it. A web search did not bring any helpful results. The most similar problem description that I found was this one: https://devtalk.nvidia.com/default/topic/493742/clbuildprogram-always-returns-11/ The poster there also mentions the error code -11 when trying to build the (valid!) program, and, maybe even more importantly, states that

> 
> clGetProgramBuildInfo is deactivated since it just segfaults when calling.


However, this thread is 2 years old, and one should assume that something like this should be fixed by now.

So at the moment, I can only give the standard advice: First, you might want to check whether there is a newer driver version (you seem to use 304, the newest should be ~319). Then you might want to check whether the same program works in its native version. Particularly, whether you can compile a modified kernel in https://developer.nvidia.com/opencl#oclVectorAdd , and possibly, whether you can obtain the build logs in this sample.

Sorry that I can't give more specific hints right now :o