SVM Pointers inside Struct (alpha version) or some other workaround?

rbrodt · 25 April 2019, 19:37

I’ve been using the Struct (alpha) jar and it’s come in very handy! After looking at the code, I don’t see any way of embedding Pointer fields in a Struct. What I’m trying to do is something like this:

public class MyStruct extends Struct {
    public float f;
    public Pointer p;
}
...
MyStruct ms = new MyStruct();
String hello = "hello world";
long size = 12;
Pointer svm = clSVMAlloc(context, CL_MEM_READ_WRITE, size, 0);
ByteBuffer bb = svm.getByteBuffer();
bb.put( hello.getBytes(StandardCharsets.US_ASCII), 0, hello.length() );

ms.f = 3.14159f; // float literal; a bare 3.14159 is a double and does not compile here
ms.p = svm; // I guess what I want here is the native pointer as a "long"?

Is there a workaround?

Marco13 · 25 April 2019, 21:58

A small note: The structs JAR was created a looong time ago, and the disclaimer at the download site should be clear: It has not been tested extensively in “real world” applications (although I know that some people are using it). And another note: I recently moved the structs code to a GitHub repo, with the intention of making it a bit more stable and offering proper releases (maybe even on Maven Central), but it’s still a private repository (not least because of the caveats that structs have…)

There is no “direct” workaround for the goal that you mentioned. Pointers (and their size in a struct) are inherently platform-specific, so one has to be extra careful here. I’ll try to have a closer look at this. There might be workarounds right now that are a bit crude, but if this is a real, sensible use-case, I’ll try to find a solution (suggestions are always welcome).

rbrodt · 25 April 2019, 22:34

Thanks for the quick reply Marco!

I have another question about clSVMAlloc() which is driving me nuts. Here’s what I’m doing:

    long size = 1024;
    Pointer svm = clSVMAlloc(context, CL_MEM_READ_WRITE, size, 0);
    clEnqueueSVMMap(commandQueue, true, CL_MAP_WRITE, svm, size, 0, null, null);
    ByteBuffer bb = svm.getByteBuffer(0, size);
    byte[] bytes = new byte[(int) size]; // array sizes must be int, so the long needs a cast
    // fill byte array here (code not shown)...
    // and then write to SVM
    bb.put(bytes);

The ByteBuffer#put() throws an UnsupportedOperationException because there is no backing buffer (so, bb.hb == null). Isn’t that what clEnqueueSVMMap() is supposed to do?

Thanks again!

Marco13 · 26 April 2019, 16:25

The code that you wrote there should work, indeed.

I have inserted it into an MCVE here:

package org.jocl.test;

import static org.jocl.CL.CL_CONTEXT_PLATFORM;
import static org.jocl.CL.CL_DEVICE_NAME;
import static org.jocl.CL.CL_DEVICE_TYPE_ALL;
import static org.jocl.CL.CL_DEVICE_VERSION;
import static org.jocl.CL.CL_MAP_WRITE;
import static org.jocl.CL.CL_MEM_READ_WRITE;
import static org.jocl.CL.clCreateCommandQueueWithProperties;
import static org.jocl.CL.clCreateContext;
import static org.jocl.CL.clEnqueueSVMMap;
import static org.jocl.CL.clFinish;
import static org.jocl.CL.clGetDeviceIDs;
import static org.jocl.CL.clGetDeviceInfo;
import static org.jocl.CL.clGetPlatformIDs;
import static org.jocl.CL.clSVMAlloc;

import java.nio.ByteBuffer;

import org.jocl.CL;
import org.jocl.Pointer;
import org.jocl.cl_command_queue;
import org.jocl.cl_context;
import org.jocl.cl_context_properties;
import org.jocl.cl_device_id;
import org.jocl.cl_platform_id;
import org.jocl.cl_queue_properties;

public class JOCLSVMAllocTest
{
    private static cl_context context;
    private static cl_device_id device;
    private static cl_command_queue commandQueue;
    
    public static void main(String[] args)
    {
        initCL();
        
        long size = 1024;
        Pointer svm = clSVMAlloc(context, CL_MEM_READ_WRITE, size, 0);
        clEnqueueSVMMap(
            commandQueue, true, CL_MAP_WRITE, svm, size, 0, null, null);
        ByteBuffer bb = svm.getByteBuffer(0, size);

        byte[] bytes = new byte[(int)size];
        bb.put(bytes);
        
        
        clFinish(commandQueue);

        System.out.println("Done");
    }
    
    /**
     * Default OpenCL initialization of the device, context
     * and command queue.
     */
    private static void initCL()
    {
        // The platform and device type that will be used
        final int platformIndex = 0;
        final long deviceType = CL_DEVICE_TYPE_ALL;

        // Enable exceptions and subsequently omit error checks in this sample
        CL.setExceptionsEnabled(true);

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];
        
        // Obtain a platform ID
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        cl_platform_id platform = platforms[platformIndex];
        
        // Initialize the context properties
        cl_context_properties contextProperties = new cl_context_properties();
        contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform);
        
        // Obtain the number of devices for the platform
        int numDevicesArray[] = new int[1];
        clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray);
        int numDevices = numDevicesArray[0];
        
        // Obtain all device IDs
        cl_device_id allDevices[] = new cl_device_id[numDevices];
        clGetDeviceIDs(platform, deviceType, numDevices, allDevices, null);


        // Find the first device that supports OpenCL 2.0
        for (cl_device_id currentDevice : allDevices)
        {
            String deviceName = getString(currentDevice, CL_DEVICE_NAME);
            float version = getOpenCLVersion(currentDevice);
            if (version >= 2.0)
            {
                System.out.println("Using device "+
                    deviceName+", version "+version);
                device = currentDevice;
                break;
            }
            else
            {
                System.out.println("Skipping device "+
                    deviceName+", version "+version);
            }
        }
        if (device == null)
        {
            System.out.println("No OpenCL 2.0 capable device found");
            System.exit(1);
        }
        
        // Create a context 
        context = clCreateContext(
            contextProperties, 1, new cl_device_id[]{ device }, 
            null, null, null);
        
        // Create the command queue
        cl_queue_properties properties = new cl_queue_properties();
        commandQueue = clCreateCommandQueueWithProperties(
            context, device, properties, null);
    }
    
    /**
     * Returns the OpenCL version of the given device, as a float
     * value
     * 
     * @param device The device
     * @return The OpenCL version
     */
    private static float getOpenCLVersion(cl_device_id device)
    {
        String deviceVersion = getString(device, CL_DEVICE_VERSION);
        String versionString = deviceVersion.substring(7, 10);
        float version = Float.parseFloat(versionString);
        return version;
    }
    
    /**
     * Returns the value of the device info parameter with the given name
     *  
     * @param device The device
     * @param paramName The parameter name
     * @return The value
     */
    private static String getString(cl_device_id device, int paramName)
    {
        // Obtain the length of the string that will be queried
        long size[] = new long[1];
        clGetDeviceInfo(device, paramName, 0, null, size);

        // Create a buffer of the appropriate size and fill it with the info
        byte buffer[] = new byte[(int)size[0]];
        clGetDeviceInfo(device, paramName, buffer.length, Pointer.to(buffer), null);

        // Create a string from the buffer (excluding the trailing \0 byte)
        return new String(buffer, 0, buffer.length-1);
    }
    
    
}

(The initCL method is basically taken from the JOCLSample_2_0_SVM.java file at http://www.jocl.org/samples/samples.html - I still have to move the samples from this site into a dedicated repo, so that people can more quickly and easily get started here…)

Does this work for you?

Maybe you are accidentally re-assigning bb with a new byte buffer in the lines that you didn’t show. But even then, I cannot imagine what could cause an UnsupportedOperationException here. Can you provide the stack trace?

rbrodt · 26 April 2019, 23:04

Sorry for the confusion…the bb.put() actually does work; it’s the bb.array() call that fails in some other part of the code. I was hoping to be able to access the SVM memory on the host (Java) side as a byte array but apparently ByteBuffer#array() detects that the ByteBuffer.hb backing buffer is null and throws the exception.

Am I correct in assuming that I cannot get direct read access to SVM on the host side and that the only way to do that is by way of ByteBuffer#get(byte[] dst, int offset, int length)? Doesn’t that mean I have to allocate a byte array (“dst”) the size of the SVM and then COPY the bytes from SVM? That seems to defeat the whole purpose of SVM. What am I missing here?

Again, thanks for your help!

Marco13 · 27 April 2019, 02:26

If I understood this correctly, then I have to say: Sorry, there is no way of letting natively allocated memory (SVM in this case) simply “appear” as a plain Java array. Java arrays are solely managed by the Virtual Machine.

The fact that it is (at least) possible to create a ByteBuffer that is backed by native memory is the largest hole that can currently be drilled into this concept.
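The array() failure described above can be reproduced without any OpenCL involved. The following is a minimal sketch that uses ByteBuffer.allocateDirect as a stand-in for the SVM-backed buffer (an assumption: the buffer returned by getByteBuffer is taken to behave like any other direct buffer). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

final class DirectBufferArrayDemo {

    /** Returns true if array() on the given buffer throws UnsupportedOperationException. */
    static boolean arrayThrows(ByteBuffer bb) {
        try {
            bb.array();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the SVM buffer: native memory, no Java array behind it
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        // Heap buffer: backed by an ordinary byte[]
        ByteBuffer heap = ByteBuffer.allocate(16);

        System.out.println(direct.hasArray());   // false
        System.out.println(heap.hasArray());     // true
        System.out.println(arrayThrows(direct)); // true

        // put() still works on the direct buffer; only array() is unavailable
        direct.put(new byte[16]);
    }
}
```

Checking hasArray() before calling array() avoids the exception in code that has to deal with both kinds of buffers.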

However, the point that you “can not get direct read access to SVM on the host side” is not entirely correct: You can work on the ByteBuffer. After calling ByteBuffer bb = svm.getByteBuffer(0, size);, you can read and write this buffer directly on the host/Java side, by calling bb.get(i) and bb.put(i,b). You can also convert it into a FloatBuffer, for example, via FloatBuffer fb = bb.asFloatBuffer() and handle floats conveniently.
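As a rough sketch of the access pattern just described, using only java.nio: a direct buffer again stands in for the SVM buffer, and the byte order is set explicitly to the native order, which is an assumption about what the device side expects. Class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BufferViewDemo {

    /** Writes a float through a FloatBuffer view and reads it back via the ByteBuffer. */
    static float writeThenReadFloat(ByteBuffer bb, int index, float value) {
        // The view shares the memory of bb and inherits its byte order
        FloatBuffer fb = bb.asFloatBuffer();
        fb.put(index, value);
        // Float number i starts at byte offset 4 * i (bb's position is assumed to be 0)
        return bb.getFloat(index * Float.BYTES);
    }

    public static void main(String[] args) {
        // Stand-in for svm.getByteBuffer(0, size)
        ByteBuffer bb = ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());

        // Absolute single-byte access: no copy, no intermediate array
        bb.put(0, (byte) 42);
        byte b = bb.get(0);

        float f = writeThenReadFloat(bb, 3, 1.5f);
        System.out.println(b + " " + f); // 42 1.5
    }
}
```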

Of course, this finds its limitation when you have, for example, a third-party library with some method that expects the input data as a float[] array. Then copying the data is the only option, and then one should indeed do some benchmarking to see whether there is still an advantage of SVM over plain copies.
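For the array-only case, the copy can at least be done with bulk operations. This is a sketch under assumptions: scale is a hypothetical stand-in for the third-party method, and a direct buffer again takes the place of the SVM buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BulkCopyDemo {

    /** Hypothetical stand-in for a third-party method that only accepts arrays. */
    static float[] scale(float[] input, float factor) {
        float[] result = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            result[i] = input[i] * factor;
        }
        return result;
    }

    /** Copies the buffer into an array, processes it, and copies the result back. */
    static void processInPlace(ByteBuffer bb, float factor) {
        FloatBuffer fb = bb.asFloatBuffer();
        float[] tmp = new float[fb.remaining()];
        fb.get(tmp);     // bulk copy: buffer -> array
        float[] result = scale(tmp, factor);
        fb.rewind();     // back to the start of the view
        fb.put(result);  // bulk copy: array -> buffer
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4 * Float.BYTES)
            .order(ByteOrder.nativeOrder());
        for (int i = 0; i < 4; i++) {
            bb.putFloat(i * Float.BYTES, i + 1); // 1, 2, 3, 4
        }
        processInPlace(bb, 2.0f);
        System.out.println(bb.getFloat(2 * Float.BYTES)); // 6.0
    }
}
```

Whether the two extra copies cost more than SVM saves is exactly what a benchmark would have to show.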

Admittedly, I haven’t yet done extensive benchmarks for different use-cases here. This is somewhere on my TODO list in the context of further experiments with the JOCLStruct library. I also thought about things like letting Struct be an interface and maybe do some magic with Dynamic Proxy Classes, so that one could write code roughly along the lines of this:

ByteBuffer bb = svm.getByteBuffer(0, size);
List<VertexStruct> vertices = Magic.create(bb, VertexStruct.class);
vertices.get(i).setPosition(1,2,3); // Writes into SVM

but I cannot say for sure when I’ll be able to invest more time here.
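To make the proxy idea a bit more concrete, here is a minimal sketch with java.lang.reflect.Proxy. It is not the JOCLStruct API: VertexView, ProxyStructDemo and the hard-coded layout of three consecutive floats are assumptions made up for this illustration, and a real implementation would derive the offsets from the struct definition instead.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical struct interface: three floats laid out as x, y, z. */
interface VertexView {
    float getX();
    float getY();
    float getZ();
    void setPosition(float x, float y, float z);
}

final class ProxyStructDemo {
    static final int VERTEX_BYTES = 3 * Float.BYTES;

    /** Creates a view of the index-th vertex that reads and writes bb directly. */
    static VertexView viewAt(ByteBuffer bb, int index) {
        final int base = index * VERTEX_BYTES;
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "getX": return bb.getFloat(base);
                case "getY": return bb.getFloat(base + Float.BYTES);
                case "getZ": return bb.getFloat(base + 2 * Float.BYTES);
                case "setPosition":
                    bb.putFloat(base, (Float) args[0]);
                    bb.putFloat(base + Float.BYTES, (Float) args[1]);
                    bb.putFloat(base + 2 * Float.BYTES, (Float) args[2]);
                    return null;
                default:
                    // toString, equals, hashCode are not handled in this sketch
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (VertexView) Proxy.newProxyInstance(
            VertexView.class.getClassLoader(),
            new Class<?>[] { VertexView.class }, handler);
    }

    public static void main(String[] args) {
        // Stand-in for the SVM buffer, with room for two vertices
        ByteBuffer bb = ByteBuffer.allocateDirect(2 * VERTEX_BYTES)
            .order(ByteOrder.nativeOrder());
        VertexView v = viewAt(bb, 1);
        v.setPosition(1, 2, 3); // writes straight into the buffer
        System.out.println(v.getX() + " " + v.getY() + " " + v.getZ()); // 1.0 2.0 3.0
    }
}
```

Every call goes through reflection and boxing here, so this shows the shape of the API rather than anything about its performance.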

rbrodt · 27 April 2019, 13:45

Thanks for the detailed explanation Marco. This makes perfect sense and for the project I’m working on, it simply means replacing direct byte array accesses with getters and setters - probably a better idea to begin with, although this will most likely incur a performance hit.

Please let me know when/if you decide to create a public GitHub repo for the Struct API - I would be interested in contributing.

Marco13 · 27 April 2019, 16:29

Considering the general approach, from a high-level “API design” perspective, in some cases, it certainly makes sense to use a *Buffer instead of the native array. Even if one has a raw byte[] array at hand, and a method expects a ByteBuffer, one can trivially pass the array via method(ByteBuffer.wrap(array)). Beyond that, the buffers can bring a nice flexibility regarding the “slicing and dicing” that is sometimes necessary: When a method should only operate on a part of an array, one always has to pass some additional method(array, offset, length) parameters, whereas the slice method of the buffers allows the receiver to handle this generically and transparently.
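A small sketch of the wrap and slice points, using only java.nio. The sum method is a made-up example of a receiver that takes a ByteBuffer and never needs to know whether it was handed a whole array or only a part of one.

```java
import java.nio.ByteBuffer;

final class WrapSliceDemo {

    /** Sums the remaining bytes of the buffer without disturbing the caller's position. */
    static int sum(ByteBuffer data) {
        ByteBuffer view = data.duplicate(); // independent position and limit, same content
        int s = 0;
        while (view.hasRemaining()) {
            s += view.get();
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] array = { 1, 2, 3, 4, 5, 6 };

        // Whole array: wrap() creates a buffer on top of the array, without copying
        System.out.println(sum(ByteBuffer.wrap(array))); // 21

        // Part of the array: offset 2, length 3, so the elements 3, 4, 5
        ByteBuffer part = ByteBuffer.wrap(array, 2, 3).slice();
        System.out.println(sum(part)); // 12
    }
}
```

The receiver's signature stays sum(ByteBuffer) in both cases, and the offset and length travel inside the buffer.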

Things may become ugly in the case that I mentioned above: When one has a complex third-party method like double[] process(double input[]), then copying back and forth between a DoubleBuffer and the array may be tedious.

Bulk operations tend to be pretty fast, though. And the *Buffer implementations internally heavily rely on the Unsafe class, where most methods are implemented as intrinsics (basically meaning that the JVM replaces them with the rawest, lowest-level native machine instructions that are available). So even for frequent get/put calls, the overhead should not be too large.

But particularly for typical GPU programming tasks, the performance implications of the various usage patterns of ByteBuffers and arrays remain to be investigated. I’m sure there are some benchmarks and results available online, but I haven’t done extensive research here yet.

Regarding the structs library, one could imagine a more “transparent” handling of memory copies for things like SVM. But again, I’d have to allocate a larger chunk of time for that.

I just added you as a collaborator for the JOCLStructs project. Note that although you have the permission to do commits, this is mainly intended to share the current code and maybe discuss implementation options in the issue tracker. Once the repo goes public, I’ll convert it into a normal repo without collaborators and with the normal fork/pull workflow.

rbrodt · 27 April 2019, 20:53

I completely agree with your approach - it makes a lot more sense when dealing with very large data sets (e.g. database result sets) which need to be streamed in chunks from “point A” to “point B” because of their size.

Thanks for the invitation, and I’ll try to familiarize myself with the code in the coming weeks.