Hi,
I run regular Java code and use jCUDA for the part of the calculation that computes the dot product of 2 vectors of size 10,000. Specifically:
I loop through the columns of a matrix of size (10000,10000):
for (int i = 0; i < 10000; i++) {
    for (int j = 0; j < 10000; j++) {
        myjCUDAdotProductObject DP = new myjCUDAdotProductObject(vector col(i), vector col(j));
    }
}
=> For the moment, the Java app crashes after 1% of the loop iterations have completed (see my previous post on this forum).
=> But my question here is: is this the best way to send data to the GPU? I suspect it would be better to send the data in a batch, do the calculations on the GPU, and send the results back to the main program. Could the way I do it now even be a reason for the crash?
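To make the batch idea concrete: all the pairwise column dot products together are the entries of the matrix M^T * M, so in principle the whole double loop is one matrix product. Below is a minimal plain-Java sketch of that equivalence on a tiny matrix. The class name GramSketch and the method gram are made up for illustration, and the column-major flat array is an assumption chosen because that is the layout cuBLAS works with. I have not tried the GPU version; as far as I understand, it would upload the matrix once and call a single routine such as cublasDgemm instead of 10000 x 10000 separate dot products.

```java
import java.util.Arrays;

class GramSketch {
    /**
     * Computes G = M^T * M for a column-major matrix m with `rows` rows and
     * `cols` columns. Entry G[i + j * cols] is the dot product of column i
     * and column j. (Hypothetical helper, CPU only, for illustration.)
     */
    public static double[] gram(double[] m, int rows, int cols) {
        double[] g = new double[cols * cols];
        for (int i = 0; i < cols; i++) {
            // G is symmetric, so only the pairs with j >= i are computed.
            for (int j = i; j < cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < rows; k++) {
                    sum += m[k + i * rows] * m[k + j * rows];
                }
                g[i + j * cols] = sum;
                g[j + i * cols] = sum;
            }
        }
        return g;
    }

    public static void main(String[] args) {
        // 3 x 2 matrix stored column by column: col0 = (1,2,3), col1 = (4,5,6)
        double[] m = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(gram(m, 3, 2)));
        // prints [14.0, 32.0, 32.0, 77.0]
    }
}
```

Since the result is symmetric, only half of the pairs actually need to be computed, which the sketch exploits by starting the inner loop at j = i.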
Any help or advice would be appreciated!
Thanks,
Clement
PS: Here is the class (called myjCUDAdotProductObject in the loop above, actual name JCublaDotProduct), in case it makes things clearer:
import cern.colt.matrix.DoubleMatrix1D;
import cern.colt.matrix.impl.SparseDoubleMatrix1D;
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas;
public class JCublaDotProduct
{
    static double dotProductCUDA;
    static double dotProductJava;
    static int n;

    public static double dotProduct(double A[], double B[])
    {
        double h_A[] = A;
        double h_B[] = B;
        n = A.length;
        // Clock javaClock = new Clock("Performing SDot with Java");
        SDotJava(h_A, h_B);
        // javaClock.closeAndPrintClock();
        // Clock JCublasClock = new Clock("Performing SDot with JCublas");
        SdotCuda(h_A, h_B);
        // JCublasClock.closeAndPrintClock();
        // System.out.println("CUDA: "+dotProductCUDA);
        // System.out.println("JAVA: "+dotProductJava);
        return dotProductCUDA;
        // System.out.println("dot Product with Java is "+ dotProductJava);
        // System.out.println("dot Product with CUDA is "+ dotProductCUDA);
    }

    private static void SdotCuda(double A[], double B[])
    {
        // Initialize JCublas (note: this runs on every call)
        JCublas.cublasInit();
        JCublas.setExceptionsEnabled(true);
        // Allocate memory on the device
        Pointer d_A = new Pointer();
        Pointer d_B = new Pointer();
        JCublas.cublasAlloc(n, Sizeof.DOUBLE, d_A);
        JCublas.cublasAlloc(n, Sizeof.DOUBLE, d_B);
        // Copy the vectors from the host to the device
        JCublas.cublasSetVector(n, Sizeof.DOUBLE, Pointer.to(A), 1, d_A, 1);
        JCublas.cublasSetVector(n, Sizeof.DOUBLE, Pointer.to(B), 1, d_B, 1);
        // Execute the double-precision dot product (Ddot, despite the method name)
        dotProductCUDA = JCublas.cublasDdot(n, d_A, 1, d_B, 1);
        // Clean up (device memory is freed and JCublas shut down on every call)
        JCublas.cublasFree(d_A);
        JCublas.cublasFree(d_B);
        JCublas.cublasShutdown();
    }

    // this function is useful when the vectors have a length of 1000 or less
    private static void SDotJava(double A[], double B[])
    {
        // double[] A2 = new double[A.length];
        // double[] B2 = new double[B.length];
        // for (int i = 0; i < A.length; i++)
        // {
        //     A2[i] = A[i];
        //     B2[i] = B[i];
        // }
        DoubleMatrix1D sourceDoc = new SparseDoubleMatrix1D(A);
        //sourceDoc.assign(A2);
        DoubleMatrix1D targetDoc = new SparseDoubleMatrix1D(B);
        //targetDoc.assign(B2);
        dotProductJava = sourceDoc.zDotProduct(targetDoc);
    }
}
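For scale, here is a rough count of what the double loop asks of this class, assuming every call copies both vectors to the device as SdotCuda does. The class TransferCost and its method names are made up for this estimate; it only does arithmetic and does not touch the GPU.

```java
class TransferCost {
    /** Bytes copied host-to-device by one call: two vectors of n doubles. */
    public static long bytesPerCall(int n) {
        return 2L * n * Double.BYTES;
    }

    /** Number of calls made by a full double loop over `cols` columns. */
    public static long calls(int cols) {
        return (long) cols * cols;
    }

    /** Total bytes copied if every (i, j) pair uploads its two vectors. */
    public static long totalBytesPerPair(int n, int cols) {
        return calls(cols) * bytesPerCall(n);
    }

    /** Bytes copied if the whole n x cols matrix is uploaded a single time. */
    public static long bytesUploadedOnce(int n, int cols) {
        return (long) n * cols * Double.BYTES;
    }

    public static void main(String[] args) {
        int n = 10_000;     // vector length (rows of the matrix)
        int cols = 10_000;  // number of columns
        System.out.println("calls (each with init, 2 allocs, 2 copies, shutdown): " + calls(cols));
        System.out.println("bytes per call:             " + bytesPerCall(n));
        System.out.println("bytes copied, per-pair:     " + totalBytesPerPair(n, cols));
        System.out.println("bytes copied, upload once:  " + bytesUploadedOnce(n, cols));
    }
}
```

With these numbers the loop makes 100,000,000 calls and copies 16,000,000,000,000 bytes (about 16 TB) in total, against 800,000,000 bytes (about 800 MB) if the matrix were uploaded once, and it also initializes and shuts down JCublas 100,000,000 times.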