First off, great library! However, I am currently facing issues when multiplying matrices that are quite big (>144x144).
Some information about my system:
- Windows 7 64 bit
- Java 7 64 Bit (1.7.0_03)
- CUDA v4.1 (64bit) with nvidia driver 295.73 on a GTX580
- JCUDA 0.4.1 (64bit)
Let me just explain it a bit to you.
I am currently developing a collaborative filtering algorithm based on the ml-class.org implementation. This involves heavy multiplication of large matrices, so I guess CUDA/CUBLAS would be a bit of an improvement.
On CPU everything works as expected, but with CUDA on larger matrices I get problems.
I have written a “test-case” in the main method where I pass my own class (DenseDoubleMatrix, row-major format, i.e. a plain double array) filled with random doubles. To make it a bit easier, I pass square matrices.
The whole CUDA stuff works like this: (how I think it should)
- transform my matrix into column-major format in a single double array
- allocate device memory for input matrix A
- use JCublas2.cublasSetMatrix(…) to “write” it to the device memory
- do the same with the other matrix
- allocate device memory for the output matrix
- call cublasDgemm with the parameters it needs and synchronize the device
- retrieve the result matrix with cublasGetMatrix and unfold it back into my own matrix class
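The first and last steps above (flattening to column-major and unfolding the result) can be sketched in plain Java as follows. This is only an illustration: it assumes the matrix is backed by a row-major `double[][]`, and the class and method names (`ColumnMajorSketch`, `toColumnMajor`, `fromColumnMajor`) are hypothetical, not part of DenseDoubleMatrix or JCublas.

```java
import java.util.Arrays;

// Hypothetical helper, assuming a row-major double[][] as the backing store.
class ColumnMajorSketch {

    // Flatten into one array in column-major order, the layout CUBLAS expects:
    // element (row, col) is stored at index col * rows + row.
    public static double[] toColumnMajor(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[] flat = new double[rows * cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                flat[c * rows + r] = m[r][c];
            }
        }
        return flat;
    }

    // Inverse operation: unfold a column-major array back into double[][].
    public static double[][] fromColumnMajor(double[] flat, int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                m[r][c] = flat[c * rows + r];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        // prints [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
        System.out.println(Arrays.toString(toColumnMajor(a)));
    }
}
```

With this layout the leading dimension of a non-transposed matrix is its number of rows, which is the value the leading-dimension arguments of the set/get/gemm calls refer to.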
This works fine for small matrices, so I don’t think there is a major problem. However, when I increase the size of the matrix I get sporadic NaN values in the output matrix.
The sad thing is that no error is thrown.
I run the multiplication on the CPU with the standard 3-loop method and then run it again with JCublas.
Some rounding differences between the CPU and the GPU result are normal, so I just take the difference of both matrices and sum the absolute element values.
Small differences are not a big deal for me, but NaNs are a serious problem.
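For reference, the comparison described above can be sketched like this. It is a minimal illustration, not my actual code: the names `DifferenceSketch`, `multiply` and `sumAbsDifference` are made up for this example, and the matrices are assumed to be plain `double[][]`.

```java
// Hypothetical sketch of the CPU reference and the difference measure.
class DifferenceSketch {

    // Standard 3-loop matrix multiplication: c = a * b.
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        int inner = b.length;
        int m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < inner; p++) {
                    sum += a[i][p] * b[p][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    // Sum of the absolute element-wise differences of two equally sized matrices.
    public static double sumAbsDifference(double[][] x, double[][] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[i].length; j++) {
                sum += Math.abs(x[i][j] - y[i][j]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] cpu = multiply(a, b);
        double[][] other = multiply(a, b);
        System.out.println(sumAbsDifference(cpu, other)); // prints 0.0
        other[0][1] = Double.NaN;                         // one bad element
        System.out.println(sumAbsDifference(cpu, other)); // prints NaN
    }
}
```

Note that a single NaN element is enough to turn the whole sum into NaN, so a NaN in this measure only says that at least one element is broken, not how many.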
I have prepared a sample output for you (lines starting with // are comments):
// caught the device an error? no error
// using 2x2 results in difference of 0.0
2 0.0 no error
3 3.3306690738754696E-16 no error
4 2.7755575615628914E-16 no error
// no problems so long..
143 2.63792543364616E-10 no error
144 2.6637181349542516E-10 no error
// BAM not working anymore
145 NaN no error
146 NaN no error
147 NaN no error
148 NaN no error
149 NaN no error
150 NaN no error
151 NaN no error
152 NaN no error
153 NaN no error
154 NaN no error
155 NaN
// rest NaN errors omitted...
The worst thing is that it does not always break at 144x144; sometimes it only breaks at 244x244 or even at 312x312.
And as you can see, there are no errors recorded.
Debugging through the matrices reveals that only a small, arbitrary number of elements are NaN.
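A small helper like the following could be used to list where the NaN elements sit in the result, instead of stepping through it in the debugger. It is a sketch under assumptions: the result is taken to be the column-major `double[]` copied back from the device, and `NaNLocatorSketch` and `findNaNs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for a column-major result array.
class NaNLocatorSketch {

    // Returns the {row, col} position of every NaN element in a
    // column-major array holding a rows x cols matrix.
    public static List<int[]> findNaNs(double[] flat, int rows, int cols) {
        List<int[]> hits = new ArrayList<int[]>();
        for (int i = 0; i < rows * cols; i++) {
            if (Double.isNaN(flat[i])) {
                // column-major: index = col * rows + row
                hits.add(new int[]{i % rows, i / rows});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] result = {1.0, Double.NaN, 3.0, 4.0, 5.0, Double.NaN};
        for (int[] pos : findNaNs(result, 2, 3)) {
            System.out.println("NaN at row " + pos[0] + ", col " + pos[1]);
        }
    }
}
```

Printing the positions for several runs would show whether the NaNs cluster in particular rows or columns or are scattered arbitrarily.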
Do I have a hardware failure or am I missing something obvious?
Thanks! If you need additional information, I’ll be glad to provide it.