Detecting the Supported SVM Type


#1

Based on Intel's article on shared virtual memory in OpenCL 2.0, how do you query the SVM capabilities in JOCL, given the code below? As you will notice, it uses the type cl_device_svm_capabilities, which is not available in JOCL.


cl_int err = clGetDeviceInfo(
    deviceID,
    CL_DEVICE_SVM_CAPABILITIES,
    sizeof(cl_device_svm_capabilities),
    &caps,
    0
  );

The article is here: [https://software.intel.com/en-us/articles/opencl-20-shared-virtual-memory-overview](https://software.intel.com/en-us/articles/opencl-20-shared-virtual-memory-overview)

With the result, I could then apply the same bit checks shown in the table in the article's section that has the same title as this topic.

Thank you, Marco

#2

The `cl_device_svm_capabilities` is not an “object”, but only a typedef of a number (specifically, it is a `cl_bitfield`, which in turn is a `cl_ulong`: a 64-bit value with the same size as a `cl_long`, so it fits into a Java `long`).

So querying it is possible with

long svmCapabilities[] = { 0 };
clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
    Sizeof.cl_long, Pointer.to(svmCapabilities), null);

With this value, you can do the usual bitwise checks, like

if ((svmCapabilities[0] & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) != 0) 
{
    // Yep, coarse grain buffer is available...
}
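
To make the bitwise checks a bit more concrete, here is a minimal sketch that turns a capability bitmask into a list of names. It is deliberately independent of JOCL so it can run on its own: the four constants are redeclared locally with the bit values from the OpenCL 2.0 header (in real code they would come from the static import of `org.jocl.CL`), and the class name `SvmCapabilityDecoder` and its `decode` method are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JOCL.
final class SvmCapabilityDecoder {
    // Bit values as defined in the OpenCL 2.0 header (cl.h).
    // Redeclared here only so that the sketch compiles without JOCL.
    static final long CL_DEVICE_SVM_COARSE_GRAIN_BUFFER = 1L << 0;
    static final long CL_DEVICE_SVM_FINE_GRAIN_BUFFER = 1L << 1;
    static final long CL_DEVICE_SVM_FINE_GRAIN_SYSTEM = 1L << 2;
    static final long CL_DEVICE_SVM_ATOMICS = 1L << 3;

    // Returns the names of all capability bits that are set in 'caps'.
    static List<String> decode(long caps) {
        List<String> names = new ArrayList<>();
        if ((caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) != 0) {
            names.add("COARSE_GRAIN_BUFFER");
        }
        if ((caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) != 0) {
            names.add("FINE_GRAIN_BUFFER");
        }
        if ((caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) != 0) {
            names.add("FINE_GRAIN_SYSTEM");
        }
        if ((caps & CL_DEVICE_SVM_ATOMICS) != 0) {
            names.add("ATOMICS");
        }
        return names;
    }

    public static void main(String[] args) {
        // Example input: the two lowest bits set
        System.out.println(decode(0b0011L));
    }
}
```

In a JOCL program, the argument would be `svmCapabilities[0]` as filled in by the `clGetDeviceInfo` call above. A value of 0 yields an empty list, meaning the device reports no SVM support at all.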

Here is an example that prints the SVM capabilities of all devices on all platforms that support OpenCL 2.0 or later:

import static org.jocl.CL.*;

import org.jocl.*;

public class DetectSvmCapabilities
{
    public static void main(String[] args)
    {
        // Enable exceptions and subsequently omit error checks in this sample
        CL.setExceptionsEnabled(true);

        // Obtain the number of platforms
        int numPlatformsArray[] = new int[1];
        clGetPlatformIDs(0, null, numPlatformsArray);
        int numPlatforms = numPlatformsArray[0];
        
        // Obtain all platform IDs
        cl_platform_id platforms[] = new cl_platform_id[numPlatforms];
        clGetPlatformIDs(platforms.length, platforms, null);
        
        for (cl_platform_id platform : platforms)
        {
            String platformName = getString(platform, CL_PLATFORM_NAME);
            System.out.println("Platform: " + platformName);
            
            float clVersion = getOpenCLVersion(platform);
            System.out.println("  CL version: " + clVersion);            
            if (clVersion < 2.0)
            {
                System.out.println("  (no SVM support)");
                continue;
            }
            
            // Obtain the number of devices for the platform
            int numDevicesArray[] = new int[1];
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 
                0, null, numDevicesArray);
            int numDevices = numDevicesArray[0];
            
            // Obtain all device IDs
            cl_device_id allDevices[] = new cl_device_id[numDevices];
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 
                numDevices, allDevices, null);
    
            for (cl_device_id currentDevice : allDevices)
            {
                String deviceName = getString(currentDevice, CL_DEVICE_NAME);
                System.out.println("  Device: " + deviceName);
                
                long svmCapabilities[] = { 0 };
                clGetDeviceInfo(currentDevice, CL_DEVICE_SVM_CAPABILITIES,
                    Sizeof.cl_long, Pointer.to(svmCapabilities), null);
                String svmCapabilitiesString = 
                    CL.stringFor_cl_device_svm_capabilities(
                        svmCapabilities[0]);
                System.out.println("    SVM capabilities: " + 
                    svmCapabilitiesString);
            }
        }
    }
    
    private static float getOpenCLVersion(cl_platform_id platform)
    {
        // The version string has the form "OpenCL <major>.<minor> <info>",
        // so characters 7 to 9 hold a version like "2.0"
        String platformVersion = getString(platform, CL_PLATFORM_VERSION);
        String versionString = platformVersion.substring(7, 10);
        return Float.parseFloat(versionString);
    }
    
    private static String getString(cl_device_id device, int paramName)
    {
        long size[] = new long[1];
        clGetDeviceInfo(device, paramName, 0, null, size);
        byte buffer[] = new byte[(int)size[0]];
        clGetDeviceInfo(device, paramName, 
            buffer.length, Pointer.to(buffer), null);
        return new String(buffer, 0, buffer.length-1);
    }

    private static String getString(cl_platform_id platform, int paramName)
    {
        long size[] = new long[1];
        clGetPlatformInfo(platform, paramName, 0, null, size);
        byte buffer[] = new byte[(int)size[0]];
        clGetPlatformInfo(platform, paramName, 
            buffer.length, Pointer.to(buffer), null);
        return new String(buffer, 0, buffer.length-1);
    }
    
    
}

Output for me:


Platform: AMD Accelerated Parallel Processing
  CL version: 2.0
  Device: Spectre
    SVM capabilities: CL_DEVICE_SVM_COARSE_GRAIN_BUFFER 
  Device: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
    SVM capabilities: CL_DEVICE_SVM_COARSE_GRAIN_BUFFER CL_DEVICE_SVM_FINE_GRAIN_BUFFER CL_DEVICE_SVM_FINE_GRAIN_SYSTEM CL_DEVICE_SVM_ATOMICS 
Platform: NVIDIA CUDA
  CL version: 1.2
  (no SVM support)
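
One caveat about the `getOpenCLVersion` helper in the example: it cuts the version out with `substring(7, 10)` and compares it as a float, which only works while major and minor version are single digits. The OpenCL specification defines the string as "OpenCL", a space, `<major>.<minor>`, a space, and platform-specific information, so a regular expression is a possible alternative. The following is only a sketch of that idea; the class `OpenCLVersionParser` and its methods are hypothetical names, not part of JOCL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JOCL.
final class OpenCLVersionParser {
    // Matches the leading "OpenCL <major>.<minor>" part of the version string
    private static final Pattern VERSION =
        Pattern.compile("^OpenCL (\\d+)\\.(\\d+)");

    // Returns { major, minor } parsed from a CL_PLATFORM_VERSION string.
    static int[] parse(String versionString) {
        Matcher m = VERSION.matcher(versionString);
        if (!m.find()) {
            throw new IllegalArgumentException(
                "Unexpected version string: " + versionString);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2))
        };
    }

    // True if the parsed version is greater than or equal to major.minor.
    static boolean isAtLeast(String versionString, int major, int minor) {
        int[] v = parse(versionString);
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }

    public static void main(String[] args) {
        // The input string here is an illustrative example only
        System.out.println(isAtLeast("OpenCL 1.2 CUDA", 2, 0));
    }
}
```

With such a helper, the `clVersion < 2.0` check in the example could become `!isAtLeast(getString(platform, CL_PLATFORM_VERSION), 2, 0)`. Comparing integers also avoids relying on floating-point equality for version numbers.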


#3

Thanks a lot, it has worked well…

Platform: Intel(R) OpenCL
  CL version: 2.0
  Device: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
    SVM capabilities: CL_DEVICE_SVM_COARSE_GRAIN_BUFFER CL_DEVICE_SVM_FINE_GRAIN_BUFFER CL_DEVICE_SVM_FINE_GRAIN_SYSTEM CL_DEVICE_SVM_ATOMICS 
  Device: Intel(R) HD Graphics 5500
    SVM capabilities: CL_DEVICE_SVM_COARSE_GRAIN_BUFFER CL_DEVICE_SVM_FINE_GRAIN_BUFFER CL_DEVICE_SVM_ATOMICS 

On a side note, it has been a while since I did any OpenCL coding, but the recent advances in version 2 have rekindled my interest. Big projects such as Corona Renderer use Intel or AMD CPUs for fast parallel computation: https://corona-renderer.com/features/proudly-cpu-based/. The trick is to vectorize the ray-object intersection, which is where the high computational cost arises. Their results are pretty fast. The beauty is that you don’t have to port all your code to the GPU, just the intersection code (Corona doesn’t use GPU code at all, but codes directly against the Advanced Vector Extensions (AVX) in C or C++). Hence it will be important to investigate any latency impact of shared virtual memory between host and device in OpenCL.

My goal this year is to port this code to Java and OpenCL… smallpt: Global Illumination in 99 lines of C++. Once I start incorporating intersection acceleration later on, I might revisit your other, older library for struct support :lol:.


#4

Yes, a raytracer is a seemingly “low-hanging fruit” for a GPU implementation. I once started to write a few lines, and then the “seemingly” really struck me hard: it may be simple for the most basic cases, but there are reasons why Embree or NVIDIA OptiX aren’t just tiny open source projects :wink:

However, this “smallpt” demo looks like something that would be nice to implement on the GPU, if only as a sample/proof of concept…