Mapped memory

Hello,

Has anyone tried to use mapped memory with JCuda? I do not understand how to allocate a mapped array and how to access it from Java.

I am trying to create a mapped array, initialize it from Java, and finally run a CUDA kernel on it, without explicit memory transfers.

Can someone help me ?

Thank you.

Hello,

You may be looking for the getByteBuffer method of the Pointer class. Admittedly, I’m not sure whether this has been tested thoroughly yet, but if you encounter difficulties, I can try to set up an example at the beginning of next week.

bye
Marco

Hello,

I tried this, but I get a java.lang.NullPointerException at line 12 (the bb.putFloat call).
According to the JCuda doc, the getByteBuffer method should not return null for a pointer set by cudaHostAlloc.

Do you see where I went wrong?

        JCublas.initialize();

        Pointer host = new Pointer();
        Pointer device = new Pointer();

        JCuda.cudaHostAlloc(host,10*Sizeof.FLOAT,JCuda.cudaHostAllocMapped);

        JCuda.cudaHostGetDevicePointer(device, host, 0);

        ByteBuffer bb = host.getByteBuffer(0,10*Sizeof.FLOAT);

        bb.putFloat(0, 50f);

        JCublas.cublasSscal(1, 2.0f, device, 1);

        float result = bb.getFloat(0);

        System.out.println(result);

        JCuda.cudaFreeHost(host);

Hello,

Three issues:

  1. You need a device that supports mapped memory. (This should be no problem; it has been supported since Compute Capability 1.1 - even my “old” GeForce 8800 supports it)

  2. You have to specify that you’re going to use mapped memory, by calling


cudaSetDeviceFlags(cudaDeviceMapHost)

before calling any other CUDA function.

  3. You have to use the right byte order for the ByteBuffer, by calling

byteBuffer.order(ByteOrder.nativeOrder());
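To see why the byte order matters, here is a minimal sketch that uses only java.nio (no JCuda involved). It assumes the raw bytes are laid out in little-endian order, as is typical on x86 hosts; the class and method names are made up for this illustration. A ByteBuffer defaults to big-endian, so reading such bytes without setting the order gives a different value:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo
{
    // Hypothetical helper: returns the 4 bytes of a float in
    // little-endian layout (assumed here to mimic native memory)
    static byte[] littleEndianBytes(float value)
    {
        ByteBuffer buffer = ByteBuffer.allocate(Float.BYTES);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.putFloat(0, value);
        return buffer.array();
    }

    // Hypothetical helper: interprets the given bytes as a float,
    // using the given byte order
    static float readWithOrder(byte[] raw, ByteOrder order)
    {
        return ByteBuffer.wrap(raw).order(order).getFloat(0);
    }

    public static void main(String[] args)
    {
        byte[] raw = littleEndianBytes(50f);

        // Matching order: the original value is recovered
        System.out.println("Matching order  : "
            + readWithOrder(raw, ByteOrder.LITTLE_ENDIAN));

        // Default ByteBuffer order (big-endian): the bytes are
        // interpreted in reverse, giving a different value
        System.out.println("Big-endian order: "
            + readWithOrder(raw, ByteOrder.BIG_ENDIAN));
    }
}
```

With a mapped pointer, the device reads and writes the memory in the native layout of the host, which is why the ByteBuffer should be set to ByteOrder.nativeOrder() before accessing it.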

Here is an example showing how to use mapped memory with JCuda:

import java.nio.*;

import jcuda.*;
import static jcuda.runtime.JCuda.*;
import static jcuda.jcublas.JCublas.*;
import jcuda.runtime.*;

public class JCudaMappedMemoryTest
{
    public static void main(String args[])
    {
        // Enable exceptions to quickly be informed about errors in this test
        JCuda.setExceptionsEnabled(true);

        // Check if the device supports mapped host memory
        cudaDeviceProp deviceProperties = new cudaDeviceProp();
        cudaGetDeviceProperties(deviceProperties, 0);
        if (deviceProperties.canMapHostMemory == 0)
        {
            System.err.println("This device can not map host memory");
            System.err.println(deviceProperties.toFormattedString());
            return;
        }

        // Set the flag indicating that mapped memory will be used
        cudaSetDeviceFlags(cudaDeviceMapHost);

        // Allocate mappable host memory
        int n = 5;
        Pointer host = new Pointer();
        cudaHostAlloc(host, n * Sizeof.FLOAT, cudaHostAllocMapped);

        // Create a device pointer mapping the host memory
        Pointer device = new Pointer();
        cudaHostGetDevicePointer(device, host, 0);

        // Obtain a ByteBuffer for accessing the data in the host
        // pointer. Modifications in this ByteBuffer will be
        // visible in the device memory.
        ByteBuffer byteBuffer = host.getByteBuffer(0, n * Sizeof.FLOAT);

        // Set the byte order of the ByteBuffer
        byteBuffer.order(ByteOrder.nativeOrder());

        // For convenience, view the ByteBuffer as a FloatBuffer
        // and fill it with some sample data
        FloatBuffer floatBuffer = byteBuffer.asFloatBuffer();
        System.out.print("Input : ");
        for (int i = 0; i < n; i++)
        {
            floatBuffer.put(i, (float)i);
            System.out.print(floatBuffer.get(i) + ", ");
        }
        System.out.println();

        // Apply a CUBLAS routine to the device pointer. This will
        // modify the host data, which was mapped to the device.
        cublasInit();
        cublasSscal(n, 2.0f, device, 1);
        cudaDeviceSynchronize();

        // Print the contents of the host memory after the
        // modification via the mapped pointer.
        System.out.print("Output: ");
        for (int i = 0; i < n; i++)
        {
            System.out.print(floatBuffer.get(i) + ", ");
        }
        System.out.println();

        // Clean up
        cudaFreeHost(host);
    }
}

EDIT: Inserted the synchronization as suggested by korzen303. Although I have not verified (or experienced) that it IS necessary, there are probably cases where it is required. A lame excuse/explanation for the fact that it was missing could be that this example was created for the “old” CUBLAS API (i.e. not for CUBLAS v2), where streams were not an integral part of the API - but maybe there are cases where it was necessary even with the old API :o

That works !!

Many thanks,
Bye.

Add cudaDeviceSynchronize(); at line 60 (right after the cublasSscal call) to ensure proper output.