Are there any plans to support the new functions of CUDA 4.0?
I’m referring to using a GPU variable to store the results of the above-mentioned functions, and to using a single-value GPU variable as input to other CUBLAS functions (like the factor in cublas(S/D)axpy).
I assume that you’re not referring to the missing functions in JCublas that have been mentioned recently, but to the new CUBLAS, aka “CUBLAS v2”.
I’m already working on JCublas2. It’s a little bit complicated for several reasons:
I’m not sure whether I’ll create it as a “completely new” library, or whether I’ll use JCublas2 as a “backend” for the old JCublas (and just pass all method calls to the new ones, as is done in the native CUBLAS/CUBLAS2 implementation). During the transition phase there will probably be two different libraries, but maybe they can be merged once the new version is tested and stable.
In the “old” CUBLAS, one could simply assume that all pointer arguments of the BLAS functions were device pointers. Now some of them can be host OR device pointers (indicated only by a comment in the CUBLAS header file…).
Last but not least, and related to the previous point: The old CUBLAS functions were so “homogeneous” that I wrote a small tool for auto-generating most of the source code for the BLAS functions. In the meantime, I have started a more powerful tool for this purpose, and I want to use it for JCublas2, so the two developments are running in parallel.
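To give a rough idea of why the host-or-device pointers matter for such a generator, here is a minimal sketch. It is not the actual tool: the names BlasSignatureSketch, Param, axpyParams and generate are made up for this example, and the emitted declarations (with cublasHandle and Pointer as parameter types) are only an assumption of what a JCublas2 signature might look like. The parameter list of axpy is simplified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from JCublas or its generator.
final class BlasSignatureSketch {

    // One parameter of a BLAS function. The flag marks the new case where
    // a pointer may refer to host OR device memory.
    static final class Param {
        final String declaration;
        final boolean hostOrDevice;

        Param(String declaration, boolean hostOrDevice) {
            this.declaration = declaration;
            this.hostOrDevice = hostOrDevice;
        }
    }

    // Simplified parameter list of axpy (y = alpha * x + y).
    // Only the scalar factor alpha is flagged as host-or-device here.
    static List<Param> axpyParams() {
        return Arrays.asList(
            new Param("int n", false),
            new Param("Pointer alpha", true),
            new Param("Pointer x", false),
            new Param("int incx", false),
            new Param("Pointer y", false),
            new Param("int incy", false));
    }

    // Emits one declaration string per precision prefix (e.g. "S", "D").
    // The assumed output shape is "public static native int cublas<P><name>(...)".
    static List<String> generate(String baseName, String[] prefixes, List<Param> params) {
        List<String> declarations = new ArrayList<>();
        for (String prefix : prefixes) {
            StringBuilder sb = new StringBuilder();
            sb.append("public static native int cublas")
              .append(prefix)
              .append(baseName)
              .append("(cublasHandle handle");
            for (Param p : params) {
                sb.append(", ").append(p.declaration);
                if (p.hostOrDevice) {
                    // This is the information that only a header comment provides
                    sb.append(" /* host or device */");
                }
            }
            sb.append(");");
            declarations.add(sb.toString());
        }
        return declarations;
    }

    public static void main(String[] args) {
        // Print the generated single and double precision declarations
        for (String d : generate("axpy", new String[] {"S", "D"}, axpyParams())) {
            System.out.println(d);
        }
    }
}
```

With the old API, the flag would not be needed at all, because every pointer could be treated the same way. With the new one, the generator has to get that extra information from somewhere for each parameter.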
I’ll try to get a few free days next month so that I can put some more effort into this, and hopefully make some progress and upload an early version soon.