Hi, here is my program that I'm trying to run to get the matrix row sums, but the resulting sum is always 0. I have tried the matrix row sum in Visual C, and the C program works fine.
The Java code is:
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.CUdeviceptr;
import jcuda.driver.CUfunction;
import jcuda.driver.CUmodule;
import static jcuda.driver.JCudaDriver.cuCtxCreate;
import static jcuda.driver.JCudaDriver.cuDeviceGet;
import static jcuda.driver.JCudaDriver.cuInit;
import static jcuda.driver.JCudaDriver.cuLaunchKernel;
import static jcuda.driver.JCudaDriver.cuMemAlloc;
import static jcuda.driver.JCudaDriver.cuMemFree;
import static jcuda.driver.JCudaDriver.cuMemcpyDtoH;
import static jcuda.driver.JCudaDriver.cuMemcpyHtoD;
import static jcuda.driver.JCudaDriver.cuModuleGetFunction;
import static jcuda.driver.JCudaDriver.cuModuleLoad;
import jcuda.runtime.JCuda;
/**
*
*
*/
public class MtrixRowSum {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
int M = 4, N = 4,P=16;
float scores_h[][] = new float[M][N];
float[] a = new float[] {(float)1.35};
int first[][] = new int[M][N];
int sum[] = new int[M*N*4];
int i, j;
//input in host array
for (i = 0; i<M; i++)
{
for (j = 0; j<N; j++)
{
scores_h[i][j] = 1;
}
}
//initialize the driver API and create a context
cuInit(0);
CUcontext pctx = new CUcontext();
CUdevice dev = new CUdevice();
cuDeviceGet(dev, 0);
cuCtxCreate(pctx, 0, dev);
//load the module
CUmodule module = new CUmodule();
cuModuleLoad(module, "matrixRowSum.ptx");
CUfunction function = new CUfunction();
cuModuleGetFunction(function, module, "rowSum");
CUdeviceptr a_dev1 = new CUdeviceptr();
// memory allocation
CUdeviceptr a_dev[] = new CUdeviceptr[P];
for(i=0;i<P;i++){
a_dev[i]=new CUdeviceptr();
// memory allocation
cuMemAlloc(a_dev[i], Sizeof.INT*4*4);
}
for(i=0;i<M;i++){
// copy the content from host to GPU
cuMemcpyHtoD(a_dev[i], Pointer.to(scores_h[i]), Sizeof.FLOAT*4*4);
}
CUdeviceptr b_dev[] = new CUdeviceptr[M];
for(i=0;i<M;i++){
b_dev[i]=new CUdeviceptr();
// memory allocation
cuMemAlloc(b_dev[i], Sizeof.INT*4*4);
}
//Pointer object that will hold all the parameters
Pointer kernelParameters = Pointer.to(
Pointer.to(a_dev),
Pointer.to(b_dev)
);
cuLaunchKernel(function, 1, 1, 1, P, 1, 1, 0, null, kernelParameters, null);
//copy back the result from the GPU to host
for(i=0;i<M;i++){
// copy the content from GPU to host
cuMemcpyDtoH(Pointer.to(sum),b_dev[i], Sizeof.FLOAT*4*4);
}
for(i=0;i<M;i++)
{
// print the result
System.out.println("sum: "+sum[i]);
}
//free the memory...
for(i = 0; i < P; i++)
{
cuMemFree(a_dev[i]);
}
for(i = 0; i < M; i++)
{
cuMemFree(b_dev[i]);
}
}
}
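The kernel shown further down reads the whole matrix through a single float* and indexes it as rowIdx*M + k, i.e. as one flat row-major array. A minimal sketch of how a float[][] such as scores_h could be flattened into that layout on the host, in plain Java without JCuda. The names FlattenSketch and flatten are hypothetical, and the matrix is assumed to be rectangular:

```java
public class FlattenSketch {
    // Copies a rectangular rows x cols matrix into one row-major array,
    // so that element (r, c) ends up at index r * cols + c.
    // Assumption: every row has the same length as the first row.
    static float[] flatten(float[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        float[] flat = new float[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(matrix[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    public static void main(String[] args) {
        // Same shape and contents as scores_h in the program above.
        float[][] scores = new float[4][4];
        for (float[] row : scores) {
            java.util.Arrays.fill(row, 1f);
        }
        float[] flat = flatten(scores);
        System.out.println(flat.length); // prints 16
    }
}
```

A flat array like this has exactly M*N elements, which is the amount of data a single float* argument with this indexing would be expected to cover.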
matrixRowSum.ptx is built from the C kernel function below, which works fine in Visual C 2013. The code is:
extern "C"
__global__ void RowSum(float* B, float* Sum, int N, int M)
{
int rowIdx = threadIdx.x + blockIdx.x * blockDim.x;
if (rowIdx < N) {
float sum = 0;
for (int k = 0; k < M; k++)
sum += B[rowIdx*M + k];
Sum[rowIdx] = sum;
}
}
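For comparison, the same computation can be reproduced on the CPU to see what the kernel should return for a given input. This is only a sketch in plain Java that mirrors the kernel's loop and indexing; the names RowSumReference and rowSum are hypothetical and not part of JCuda. Note that the kernel produces float results, one per row:

```java
public class RowSumReference {
    // Mirrors the kernel: b is a flat row-major matrix with n rows and m columns,
    // and the result holds one sum per row.
    static float[] rowSum(float[] b, int n, int m) {
        float[] sums = new float[n];
        for (int rowIdx = 0; rowIdx < n; rowIdx++) {
            float sum = 0f;
            for (int k = 0; k < m; k++) {
                sum += b[rowIdx * m + k];
            }
            sums[rowIdx] = sum;
        }
        return sums;
    }

    public static void main(String[] args) {
        int n = 4, m = 4;
        float[] b = new float[n * m];
        java.util.Arrays.fill(b, 1f); // same input as the Java program: all ones
        for (float s : rowSum(b, n, m)) {
            System.out.println("sum: " + s); // prints "sum: 4.0" four times
        }
    }
}
```

With the all-ones 4x4 input from the Java program, each row sum should therefore be 4.0, not 0.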
There is no error; the resulting sum is just:
run:
sum: 0
sum: 0
sum: 0
sum: 0
BUILD SUCCESSFUL (total time: 0 seconds)
Please guide me on what changes I should make to the program.
[edit SlaterB: blog entry by @richa moved to the forum]