CUDA_ERROR_OUT_OF_MEMORY

In a program using JCuda I get the following error: CUDA_ERROR_OUT_OF_MEMORY.

What's the reason? I don't understand it, because I always free the memory.

I'm posting my code to make this clearer. Please correct me if I'm doing something wrong.

File Array.java

import jcuda.jcurand.JCurand;
import static jcuda.jcurand.JCurand.curandCreateGeneratorHost;
import static jcuda.jcurand.JCurand.curandSetPseudoRandomGeneratorSeed;
import static jcuda.jcurand.JCurand.curandGenerateNormalDouble;
import static jcuda.jcurand.JCurand.curandDestroyGenerator;
import static jcuda.jcurand.curandRngType.CURAND_RNG_PSEUDO_DEFAULT;
import jcuda.jcurand.curandGenerator;

////////////
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.CUdeviceptr;
import jcuda.driver.CUfunction;
import jcuda.driver.CUmodule;
import jcuda.driver.JCudaDriver;
import static jcuda.driver.JCudaDriver.cuDeviceGet;
import static jcuda.driver.JCudaDriver.cuCtxCreate;
import static jcuda.driver.JCudaDriver.cuModuleLoad;
import static jcuda.driver.JCudaDriver.cuModuleGetFunction;
import static jcuda.driver.JCudaDriver.cuMemAlloc;
import static jcuda.driver.JCudaDriver.cuMemcpyHtoD;
import static jcuda.driver.JCudaDriver.cuLaunchKernel;
import static jcuda.driver.JCudaDriver.cuCtxSynchronize;
import static jcuda.driver.JCudaDriver.cuMemFree;
import static jcuda.driver.JCudaDriver.cuMemcpyDtoH;

public class Array {
    public double array[];
    private int tam;

    public Array(int tam, int sem){
        
        JCurand.setExceptionsEnabled(true);

        ////////////////

        this.tam = tam;
        this.array = new double[tam];

        curandGenerator g = new curandGenerator();
        curandCreateGeneratorHost(g, CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(g, sem);
        curandGenerateNormalDouble(g, Pointer.to(this.array), tam, 0, 1);
        curandDestroyGenerator(g);
        
    }

    public final void print(){
        System.out.printf("\n");
        for(int i = 0; i < tam; i++){
            System.out.printf("%f\n", array[i]);
        }
    }

    public int getLength(){
        return tam;
    }

    public double getMean(){

        JCudaDriver.setExceptionsEnabled(true);
        JCudaDriver.cuInit(0);

        /////////////

        CUdevice d = new CUdevice();
        cuDeviceGet(d, 0);
        CUcontext c = new CUcontext();
        cuCtxCreate(c, 0, d);

        ///////////

        CUmodule m = new CUmodule();
        cuModuleLoad(m, "cudaVectorSumatoryKernel.cubin");
        CUfunction f = new CUfunction();
        cuModuleGetFunction(f, m, "sumatory");

        ///////////

        int n = getLength();
        int itemsPerThread = 10;
        int threads = n / itemsPerThread;

        ///////////

        CUdeviceptr devArray = new CUdeviceptr();
        cuMemAlloc(devArray, n * Sizeof.DOUBLE);
        cuMemcpyHtoD(devArray, Pointer.to(array),
                n * Sizeof.DOUBLE);

        CUdeviceptr devSum = new CUdeviceptr();
        cuMemAlloc(devSum, threads * Sizeof.DOUBLE);

        ///////////

        Pointer argsk = Pointer.to(
                Pointer.to(new int[]{n}),
                Pointer.to(new int[]{itemsPerThread}),
                Pointer.to(devArray),
                Pointer.to(devSum)
        );

        cuLaunchKernel(f, 1, 1, 1, threads, 1, 1, 0, null, argsk, null);
        cuCtxSynchronize();

        ///////////

        cuMemFree(devArray);

        double sum[] = new double[threads];
        cuMemcpyDtoH(Pointer.to(sum), devSum, threads * Sizeof.DOUBLE);

        cuMemFree(devSum);

        ///////////

        double sumatory = 0;
        for (int i = 0; i < threads; i++){
            sumatory += sum[i];
        }

        return sumatory / n;
    }
}

File Test.java


public class Test {

    public static void main(String args[]){

        int n = 10;

        // I want to create n arrays and get the average of their elements.
        // But when the variable i is equal to 26, I get the error.
        for(int i = 0; i < 100; i++){
                Array a = new Array(n, i);
                double meanA = a.getMean();
                System.out.printf("\nAverage %d: %f\n", i, meanA);
        }
    }
    
}

Hello

You should call
cuCtxDestroy(c);
at the end. Otherwise, every call to getMean() creates a new CUDA context that is never released, so you end up with hundreds of CUDA contexts.
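
One way to make the destroy call hard to forget is to tie the context's lifetime to a scope. The following is a minimal sketch in plain Java, not JCuda code: `ScopedContext`, `ContextLeakDemo` and the `live` counter are hypothetical stand-ins so that the example runs without a GPU, and the comments mark where `cuCtxCreate` and `cuCtxDestroy` would go in the real class.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a CUDA context. It only counts live instances,
// so the pattern can be tried without JCuda or a GPU.
final class ScopedContext implements AutoCloseable {
    static final AtomicInteger live = new AtomicInteger();

    ScopedContext() {
        // Real code would call: cuCtxCreate(c, 0, d);
        live.incrementAndGet();
    }

    @Override
    public void close() {
        // Real code would call: cuCtxDestroy(c);
        live.decrementAndGet();
    }
}

final class ContextLeakDemo {
    // Mirrors the original getMean(): a context is created and never destroyed.
    // Returns how many contexts are still alive after the loop.
    static int leaky(int calls) {
        int before = ScopedContext.live.get();
        for (int i = 0; i < calls; i++) {
            new ScopedContext();
        }
        return ScopedContext.live.get() - before;
    }

    // try-with-resources calls close() on every iteration,
    // even if the body throws an exception.
    static int scoped(int calls) {
        int before = ScopedContext.live.get();
        for (int i = 0; i < calls; i++) {
            try (ScopedContext c = new ScopedContext()) {
                // Real code: allocate, copy, launch the kernel, copy back, free.
            }
        }
        return ScopedContext.live.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("live contexts without close: " + leaky(100));
        System.out.println("live contexts with close:    " + scoped(100));
    }
}
```

A plain try/finally around the body of getMean() would have the same effect. The point is only that the destroy call runs on every path out of the method.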

But as a general hint: this will not bring any speedup. Maybe it was just a test, but you should not create a context and load a module each time you want to call this function (and probably should not create a new CURAND handle each time such an object is constructed either). These setup/shutdown procedures may take a lot of time, especially loading a module from a file: reading a single byte from the hard disk will take longer than doing the whole computation hundreds of times in plain Java ;)

Any recommendation will heavily depend on the intended application of this class, but in the simplest case(!) you might consider moving the initialization and shutdown into separate methods, to allow usage like:

// Create the CURAND handle and the context, module and function handle
// (Possibly (!) as static instances in this class)
Array.setup();

for (int i=0; i<n; i++)
{
    Array a = new Array(...);

    // Only do the minimal memory copies and the kernel launch in this function
    double d = a.getMean();
    ...
}

// Destroy the CURAND handle and context etc.
Array.shutdown();

More sophisticated solutions might be possible, for example, automatically checking whether there already is an active CUDA context, and either attaching to it or creating a new one lazily. But again: this depends on how the class should be used.
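
The lazy variant mentioned above could look roughly like the sketch below. It is plain Java with hypothetical names (`GpuResources`, `LazySetupDemo`, `initCount`) and only covers the "create on first use, reuse afterwards" part. Attaching to a context that some other code created is not shown. The comments mark where the JCuda calls from the original getMean() would go, and the kernel is replaced by a plain Java loop so the example runs without a GPU.

```java
// Hypothetical holder for the handles that are expensive to create.
final class GpuResources {
    private static GpuResources instance;  // null until first use
    static int initCount = 0;              // how often the setup actually ran

    private GpuResources() {
        // Real code: cuInit(0); cuDeviceGet(d, 0); cuCtxCreate(c, 0, d);
        //            cuModuleLoad(m, ...); cuModuleGetFunction(f, m, ...);
        initCount++;
    }

    // Create the resources on first use, then hand out the same instance.
    static synchronized GpuResources get() {
        if (instance == null) {
            instance = new GpuResources();
        }
        return instance;
    }

    // Release everything; a later get() would run the setup again.
    static synchronized void shutdown() {
        if (instance != null) {
            // Real code: cuCtxDestroy(c);
            instance = null;
        }
    }

    static synchronized boolean isActive() {
        return instance != null;
    }
}

final class LazySetupDemo {
    static double mean(double[] values) {
        GpuResources.get();  // cheap after the first call
        // Real code: copy to device, launch the kernel, copy back.
        // Plain Java stand-in for the computation:
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            mean(new double[]{1, 2, 3});
        }
        System.out.println("setup ran " + GpuResources.initCount + " time(s)");
        GpuResources.shutdown();
    }
}
```

Compared with explicit `setup()` and `shutdown()` calls, the caller cannot forget the setup, but still has to call `shutdown()` once when the work is finished.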

bye
Marco

I’m very grateful. Thanks a lot.

I’ve considered what you told me and the results are satisfactory.

// Create the CURAND handle and the context, module and function handle
// (Possibly (!) as static instances in this class)

Done.

You should call
cuCtxDestroy(c);
at the end.

Done.

Thanks!