cublas(S/D)nrm2 and cublas(S/D)dot


Are there any plans to support the new functions of CUDA 4.0?
I’m referring to storing the results of the functions mentioned above in a GPU variable, and to using a single-value GPU variable as input to other CUBLAS functions (such as the scaling factor in cublas(S/D)axpy).

Greetings Felix


I assume that you’re not referring to the missing functions in JCublas that have been mentioned recently, but to the new CUBLAS, aka “CUBLAS v2”.

I’m already working on JCublas2. It’s a little bit complicated for several reasons:

  • I’m not sure whether I’ll create it as a “completely new” library, or whether I’ll use JCublas2 as a “backend” for the old JCublas (just passing all method calls through to the new ones, as is done in the native CUBLAS/CUBLAS2 implementation). Probably, during the transition phase, there will be two different libraries, but maybe they can be merged once the new version is tested and stable.
  • In the “old” CUBLAS, one could simply assume that all pointers of the BLAS functions are device pointers. Now some of them can be host OR device pointers (only indicated by a comment in the CUBLAS header file…)
  • Last but not least, and related to the previous point: The old CUBLAS functions were so “homogeneous” that I had written a small tool for auto-generating most of the source code for the BLAS functions. In the meantime, I have started work on a more powerful tool for this purpose, and I want to use it for JCublas2, so the two developments are running in parallel.

I’ll try to get a few free days next month so that I can put some more effort into this, and hopefully make some progress and upload an early version soon.