I have a question that should be easy to answer: does JCuda currently support streams?
I read that it does not support stream callbacks, but I am not sure whether the two are related, or how.
If JCuda does support streams, could you point me to how to use them?
I have worked extensively with CUDA and JCuda, but streams have only just been
brought to my attention. As I understand it, they should provide a significant speedup
in a use case where I am doing multiple small matrix multiplications that can run in parallel.
Note that stream callbacks are not yet included in the current release; they will be part of the next one. You could compile the binaries yourself to use them, or wait for the next release. (For Windows, I could provide “SNAPSHOT” binaries that already support them, but some manual setup steps would then be necessary.)
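
For the small matrix multiplications mentioned in the question, a minimal sketch of using plain streams (no callbacks) together with JCublas might look like the following. It assumes JCuda and JCublas2 are on the classpath and a CUDA device is available. The class name `StreamedSgemmSketch` and the method `multiplyAll` are made up for illustration, and the matrices are assumed to be square and stored in column-major order. To keep things simple, uploads and downloads use blocking `cudaMemcpy`, so only the `cublasSgemm` calls are issued into separate streams.

```java
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas2;
import jcuda.jcublas.cublasHandle;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaStream_t;

import static jcuda.jcublas.cublasOperation.CUBLAS_OP_N;
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyDeviceToHost;
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyHostToDevice;

// Hypothetical helper class, not part of JCuda.
public class StreamedSgemmSketch {

    /** Computes c[i] = a[i] * b[i] for n x n column-major matrices, one stream per product. */
    public static float[][] multiplyAll(float[][] a, float[][] b, int n) {
        JCuda.setExceptionsEnabled(true);
        JCublas2.setExceptionsEnabled(true);

        int count = a.length;
        long bytes = (long) n * n * Sizeof.FLOAT;
        float[][] c = new float[count][n * n];

        cublasHandle handle = new cublasHandle();
        JCublas2.cublasCreate(handle);

        cudaStream_t[] streams = new cudaStream_t[count];
        Pointer[] dA = new Pointer[count];
        Pointer[] dB = new Pointer[count];
        Pointer[] dC = new Pointer[count];

        // Allocate device memory and upload all inputs with blocking copies.
        for (int i = 0; i < count; i++) {
            streams[i] = new cudaStream_t();
            JCuda.cudaStreamCreate(streams[i]);
            dA[i] = new Pointer();
            dB[i] = new Pointer();
            dC[i] = new Pointer();
            JCuda.cudaMalloc(dA[i], bytes);
            JCuda.cudaMalloc(dB[i], bytes);
            JCuda.cudaMalloc(dC[i], bytes);
            JCuda.cudaMemcpy(dA[i], Pointer.to(a[i]), bytes, cudaMemcpyHostToDevice);
            JCuda.cudaMemcpy(dB[i], Pointer.to(b[i]), bytes, cudaMemcpyHostToDevice);
        }

        // Scalars live in host memory (the default cuBLAS pointer mode).
        Pointer alpha = Pointer.to(new float[] { 1.0f });
        Pointer beta = Pointer.to(new float[] { 0.0f });

        // Issue each multiplication into its own stream so the GPU may overlap them.
        for (int i = 0; i < count; i++) {
            JCublas2.cublasSetStream(handle, streams[i]);
            JCublas2.cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                alpha, dA[i], n, dB[i], n, beta, dC[i], n);
        }

        // Wait for each stream, download its result, and release its resources.
        for (int i = 0; i < count; i++) {
            JCuda.cudaStreamSynchronize(streams[i]);
            JCuda.cudaMemcpy(Pointer.to(c[i]), dC[i], bytes, cudaMemcpyDeviceToHost);
            JCuda.cudaFree(dA[i]);
            JCuda.cudaFree(dB[i]);
            JCuda.cudaFree(dC[i]);
            JCuda.cudaStreamDestroy(streams[i]);
        }
        JCublas2.cublasDestroy(handle);
        return c;
    }
}
```

Whether this actually gives a speedup depends on the matrix sizes and the device, so it is worth measuring against the single-stream version. Overlapping the copies as well would require `cudaMemcpyAsync`, which generally needs page-locked host memory rather than plain Java arrays.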