Allocation of large pinned memory chunk


#1

I’m implementing GPU calculation in a program already written in Java.

I need fast host-to-device transfers of arrays that are sometimes relatively large. To use streams, I have to use pinned memory. The problem is that when I try to allocate more than about 600 MB of pinned host memory, I get a “CUDA_ERROR_OUT_OF_MEMORY” exception.
This is the code I used to test the size of the available pinned memory:

```
// Init GPU
JCudaDriver.setExceptionsEnabled(true);

// Initialize the device and create the device context
cuInit(0);
CUdevice device = new CUdevice();
cuDeviceGet(device, 0);
CUcontext context = new CUcontext();
cuCtxCreate(context, 0, device);

Pointer p = new Pointer();

int Kb = 1024;
int Mb = 1024 * Kb;
int Gb = 1024 * Mb;
int sequenceSize = 172 * Mb; // number of floats; times 4 for the size in bytes
float[] expecteds = new float[sequenceSize];
float[] actuals = new float[sequenceSize];
Arrays.fill(expecteds, 3.33f);
boolean allocated = false;
try {
    // Allocate page-locked (pinned) host memory
    JCudaDriver.cuMemAllocHost(p, (long) sequenceSize * Sizeof.FLOAT);
    allocated = true;

    // View the pinned memory as a FloatBuffer in native byte order
    FloatBuffer fb = p.getByteBuffer(0, (long) sequenceSize * Sizeof.FLOAT).
            order(ByteOrder.nativeOrder()).
            asFloatBuffer();

    // Write the test data and read it back
    fb.position(0);
    fb.put(expecteds);
    fb.position(0);
    fb.get(actuals);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // Free only if the allocation succeeded; freeing an unallocated
    // pointer would throw a second exception
    if (allocated) {
        JCudaDriver.cuMemFreeHost(p);
    }
}
```

Now, I'm aware that the OS can prevent me from using too much pinned memory, since it's non-pageable. The thing is that I have 48 GB (45 GB free) of physical memory, and I need a way of forcing the OS to give me more of it. Is there a way to do this (elegantly, if possible)?

The OS is 64-bit Windows 7 Professional SP1.
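
For reference, the size arithmetic behind the test above can be checked without a GPU. This is only a minimal sketch: the class name `PinnedSizeCheck` and the helper `floatBytes` are hypothetical names made up for illustration and are not part of JCuda. It shows that 172*Mb floats correspond to a 688 MiB request, and that computing byte counts in `int` wraps around for larger element counts, so `long` arithmetic is the safer choice.

```java
public class PinnedSizeCheck {
    // Hypothetical helper constant: bytes per MiB, as a long
    static final long MIB = 1024L * 1024L;

    /** Size in bytes of a float array with the given number of elements. */
    static long floatBytes(long numElements) {
        return numElements * Float.BYTES;
    }

    public static void main(String[] args) {
        // Element count used in the test above
        int sequenceSize = 172 * 1024 * 1024;
        long bytes = floatBytes(sequenceSize);
        System.out.println("Pinned request: " + (bytes / MIB) + " MiB"); // 688 MiB

        // With a larger element count, int arithmetic silently wraps around
        int elements = 600 * 1024 * 1024;
        int wrapped = elements * Float.BYTES;   // overflows, becomes negative
        long exact = floatBytes(elements);      // correct value
        System.out.println("int result : " + wrapped);
        System.out.println("long result: " + exact + " (" + (exact / MIB) + " MiB)");
    }
}
```

The 172*Mb case is well below the `int` limit, so overflow does not explain the error reported here; the sketch only rules that out.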

#2

Hello

There was recently another thread about an operation involving large memory allocations ( http://forum.byte-welt.de/showthread.php?p=17931#post17931 ). I still think that there is a limit on the maximum size of a single allocation, but I could not find anything about it in the documentation.

But the situation may be different here anyway. I’ll have a closer look at this at the beginning of next week.

( I assume that the other thread ( http://forum.byte-welt.de/showthread.php?p=18239#post18239 ) was only an attempt to circumvent this problem…?)

bye
Marco


#3

[QUOTE=Marco13;18240]
( I assume that the other thread ( http://forum.byte-welt.de/showthread.php?p=18239#post18239 ) was only an attempt to circumvent this problem…?)
[/QUOTE]
You’re absolutely right :)


#4

I just ran a test on a machine with 24 GB of RAM (64-bit Windows 7), using a slightly modified program (see below). It was able to allocate 1 GB of native (host) memory before bailing out.

When I have the chance, I’ll try to run another test with a C program, and see whether it’s possible to allocate larger blocks there.

```
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Arrays;

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.JCudaDriver;
import static jcuda.driver.JCudaDriver.*;

public class LargeMemoryAllocTest 
{
    public static void main(String[] args) 
    {
        JCudaDriver.setExceptionsEnabled(true);
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        Pointer p = new Pointer();
        int M = 1024 * 1024;
        for (int numElements=50*M; numElements<=500*M; numElements+=50*M)
        {
            try 
            {
                System.out.println("Allocating "+numElements+" elements");
                System.out.println("    Native: "+((numElements * 4L)/M)+" MB");
                System.out.println("    Java  : "+((numElements * 4L * 2)/M)+" MB");
                float[] expecteds = new float[numElements];
                float[] actuals = new float[numElements];
                Arrays.fill(expecteds, 3.33f);

                JCudaDriver.cuMemAllocHost(p, numElements* Sizeof.FLOAT);
                FloatBuffer fb = p.getByteBuffer(0, numElements* Sizeof.FLOAT).
                        order(ByteOrder.nativeOrder()).
                        asFloatBuffer();

                fb.position(0);
                fb.put(expecteds, 0, numElements);
                fb.position(0);
                fb.get(actuals, 0, numElements);
                boolean equal = Arrays.equals(expecteds, actuals);
                System.out.println("Equal? "+equal);

                JCudaDriver.cuMemFreeHost(p);
            } 
            catch (Exception e) 
            {
                e.printStackTrace();
                return;
            }
        }
    }
}
```
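
The buffer round trip in the test program above (fill, `put`, rewind, `get`, compare) can also be exercised without CUDA, which may help to separate problems in the buffer handling from problems in the pinned allocation itself. The following is only a sketch under a loud assumption: `ByteBuffer.allocateDirect` is used as a stand-in for the buffer returned by `p.getByteBuffer(...)`. It allocates ordinary native memory, not page-locked memory, so it says nothing about the pinned memory limit. The class name `BufferRoundTrip` and the method `roundTrip` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Arrays;

public class BufferRoundTrip {
    /**
     * Writes numElements floats into a direct buffer and reads them back.
     * The direct buffer is a stand-in for pinned memory (assumption:
     * it is NOT page-locked, it only mimics the FloatBuffer view).
     */
    static boolean roundTrip(int numElements) {
        float[] expecteds = new float[numElements];
        float[] actuals = new float[numElements];
        Arrays.fill(expecteds, 3.33f);

        // Same view as in the test: native byte order, seen as floats
        FloatBuffer fb = ByteBuffer.allocateDirect(numElements * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();

        fb.position(0);
        fb.put(expecteds, 0, numElements);
        fb.position(0);
        fb.get(actuals, 0, numElements);
        return Arrays.equals(expecteds, actuals);
    }

    public static void main(String[] args) {
        // 1 Mi floats = 4 MiB, small enough for default JVM settings
        System.out.println("Equal? " + roundTrip(1024 * 1024));
    }
}
```

If this round trip succeeds at a given size while the JCuda version fails, the failure is more likely to come from `cuMemAllocHost` than from the buffer code.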