Hi,
First off, great library! However, I am currently facing issues when multiplying matrices that are quite big (>144x144).
Some info about my system:
- Windows 7 64 bit
- Java 7 64 Bit (1.7.0_03)
- CUDA v4.1 (64bit) with nvidia driver 295.73 on a GTX580
- JCuda 0.4.1 (64 bit)
Here is my code: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/math/cuda/JCUDAMatrixUtils.java
Let me just explain it a bit to you.
I am currently developing a collaborative filtering algorithm based on the ml-class.org implementation. This involves some heavy multiplication of large matrices, so I figured CUDA/CUBLAS would be a nice improvement.
On the CPU everything works as expected, but with CUDA I run into problems on larger matrices.
I have written a "test case" in the main method where I pass my own class (DenseDoubleMatrix, row-major format, i.e. a plain double[][]) filled with random doubles. To keep it simple, I pass square matrices.
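To illustrate, the setup boils down to something like this (a sketch, not the exact code from the linked file; I am assuming the double[][] constructor of DenseDoubleMatrix here):

[code]
// Sketch of the test setup (not the exact code from the linked file):
// two square n x n matrices filled with random doubles.
int n = 145; // one of the sizes that breaks for me
java.util.Random rand = new java.util.Random();
double[][] a = new double[n][n];
double[][] b = new double[n][n];
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        a[i][j] = rand.nextDouble();
        b[i][j] = rand.nextDouble();
    }
}
// wrapped into my matrix class (assuming the double[][] constructor)
DenseDoubleMatrix matrixA = new DenseDoubleMatrix(a);
DenseDoubleMatrix matrixB = new DenseDoubleMatrix(b);
[/code]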
The whole CUDA part works like this (at least, how I think it should; a condensed sketch follows the list):
- transform my matrix into column-major format in a single double array
- allocate device memory for input matrix A
- use JCublas2.cublasSetMatrix(…) to "write" it to the device memory
- do the same with the other matrix
- allocate device memory for the output matrix
- call cublasDgemm with the parameters it needs and synchronize the device
- retrieve the result matrix with cublasGetMatrix and unfold it back into my own matrix class
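Condensed into a single method, that sequence looks roughly like this (a simplified sketch rather than a verbatim copy of the linked file; error handling and the unfold back into DenseDoubleMatrix are omitted, and the helper toColumnMajor is just for illustration):

[code]
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas2;
import jcuda.jcublas.cublasHandle;
import jcuda.runtime.JCuda;
import static jcuda.jcublas.cublasOperation.CUBLAS_OP_N;

// Multiplies two square n x n row-major matrices on the GPU,
// returning the result in column-major order.
public static double[] multiply(double[][] a, double[][] b, int n) {
    // pack the row-major double[][] into column-major 1D arrays
    double[] hostA = toColumnMajor(a, n);
    double[] hostB = toColumnMajor(b, n);
    double[] hostC = new double[n * n];

    cublasHandle handle = new cublasHandle();
    JCublas2.cublasCreate(handle);

    // allocate device memory for A, B and the result C
    Pointer dA = new Pointer();
    Pointer dB = new Pointer();
    Pointer dC = new Pointer();
    JCuda.cudaMalloc(dA, n * n * Sizeof.DOUBLE);
    JCuda.cudaMalloc(dB, n * n * Sizeof.DOUBLE);
    JCuda.cudaMalloc(dC, n * n * Sizeof.DOUBLE);

    // copy both input matrices to the device
    JCublas2.cublasSetMatrix(n, n, Sizeof.DOUBLE, Pointer.to(hostA), n, dA, n);
    JCublas2.cublasSetMatrix(n, n, Sizeof.DOUBLE, Pointer.to(hostB), n, dB, n);

    // C = 1.0 * A * B + 0.0 * C, then wait for the device to finish
    Pointer alpha = Pointer.to(new double[] { 1.0 });
    Pointer beta = Pointer.to(new double[] { 0.0 });
    JCublas2.cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
            alpha, dA, n, dB, n, beta, dC, n);
    JCuda.cudaDeviceSynchronize();

    // copy the result back to the host
    JCublas2.cublasGetMatrix(n, n, Sizeof.DOUBLE, dC, n, Pointer.to(hostC), n);

    JCuda.cudaFree(dA);
    JCuda.cudaFree(dB);
    JCuda.cudaFree(dC);
    JCublas2.cublasDestroy(handle);
    return hostC;
}

// Row-major double[][] -> column-major double[] for CUBLAS.
private static double[] toColumnMajor(double[][] m, int n) {
    double[] out = new double[n * n];
    for (int col = 0; col < n; col++) {
        for (int row = 0; row < n; row++) {
            out[col * n + row] = m[row][col];
        }
    }
    return out;
}
[/code]

The dgemm call computes C = alpha * A * B + beta * C with alpha = 1 and beta = 0, and all leading dimensions are n since the matrices are square.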
This works fine, so I don't think there is a major problem in general. However, when I increase the size of the matrices, I get sporadic NaN values in the output matrix. The sad thing is that no error is thrown.
To verify, I execute a multiplication on the CPU with the standard 3-loop method and then execute the same multiplication with JCublas. It is normal that there are some rounding differences between the CPU and GPU solutions, so I just take the difference of both matrices and sum the absolute element values. Small differences are not a big deal for me, but NaNs are a serious problem.
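For reference, the comparison is essentially this (again only a sketch; gpuResult stands for the matrix coming back from the JCublas path):

[code]
// Reference multiply on the CPU plus the comparison metric, as a sketch.
static double compareWithCpu(double[][] a, double[][] b, double[][] gpuResult, int n) {
    // naive 3-loop multiplication as the reference result
    double[][] ref = new double[n][n];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += a[i][k] * b[k][j];
            }
            ref[i][j] = sum;
        }
    }
    // sum of absolute element-wise differences against the GPU result
    double diff = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            diff += Math.abs(ref[i][j] - gpuResult[i][j]);
        }
    }
    return diff; // printed as the "<size> <diff>" lines below
}
[/code]

Note that a single NaN element makes the whole sum NaN, which is why the output below reports NaN for the entire size once it breaks.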
I have prepared a sample output for you (lines starting with // are my comments):
// did we catch an error from the device?
no error
// using 2x2 results in difference of 0.0
2 0.0
no error
3 3.3306690738754696E-16
no error
4 2.7755575615628914E-16
no error
// no problems so far...
143 2.63792543364616E-10
no error
144 2.6637181349542516E-10
no error
// BAM not working anymore
145 NaN
no error
146 NaN
no error
147 NaN
no error
148 NaN
no error
149 NaN
no error
150 NaN
no error
151 NaN
no error
152 NaN
no error
153 NaN
no error
154 NaN
no error
155 NaN
// remaining NaN results omitted...
The worst thing is that it does not always break at 144x144; sometimes it only breaks at 244x244 or even at 312x312.
And as you can see, there are no errors recorded.
Stepping through the matrices in the debugger reveals that only a small, seemingly arbitrary number of elements are NaN.
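A quick scan like the following (only a sketch) is what I used to count them:

[code]
// Counts and prints the positions of NaN entries in the result matrix.
static int countNaNs(double[][] result, int n) {
    int nanCount = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (Double.isNaN(result[i][j])) {
                nanCount++;
                System.out.println("NaN at (" + i + ", " + j + ")");
            }
        }
    }
    return nanCount;
}
[/code]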
Do I have a hardware failure or am I missing something obvious?
Thanks! If you need additional information, I'd be glad to provide it.