Background: the matrix operations of JCublas and cuLaunchKernel live in different libraries with slightly different APIs, and the stream handle type differs between the two. The two types (CUstream and cudaStream_t) are equivalent in C, but in Java I cannot find a way to convert from one to the other.
So I can do this to synchronize cublas calls:
```
cudaStream_t s = new cudaStream_t();
cudaStreamCreate(s);
cublasSetStream(s);
```
and I can create this stream to synchronize my own kernels:
```
CUstream s = new CUstream();
cuStreamCreate(s, 0);
```
But I can't synchronize across the two different types of computation.
So my question is: how (if it is possible at all) can I issue cublas calls and launch my own kernels in the same stream (other than the default stream, of course), so that the two are synchronized?
Any help would be much appreciated!
You’re right: A conversion between these types is currently not supported.
Although I have not yet used non-default streams extensively, I should probably have noticed this when someone pointed out that a similar conversion was necessary between CUarray and cudaArray. Appropriate constructors for these conversions were added in version 0.5.0, and the solution here will probably be the same, namely to allow the construction of a CUstream from a cudaStream_t and vice versa. I’ll have to run some more tests using streams anyhow, as soon as I find the time to integrate the stream callbacks…
So at the moment, there is no “official” solution for this. But depending on how urgent/important this issue is for you, it would of course be possible to write a preliminary method like the convert method shown further below.
Well, all the objects that are represented as a typedef’ed pointer in C are represented by classes extending the “NativePointerObject” class in JCuda. This base class does little more than store the native pointer in a long variable, and accordingly, this hack does the same as a simple assignment like
someCUstream.nativePointer = someCudaStream.nativePointer;
but uses reflection to circumvent the private visibility of the ‘nativePointer’ variable:
```
// XXX Remove this method as soon as the conversion cudaStream_t<->CUstream is supported!
private static CUstream convert(cudaStream_t s)
{
    try
    {
        CUstream stream = new CUstream();
        java.lang.reflect.Field field =
            jcuda.NativePointerObject.class.getDeclaredField(
                "nativePointer");
        field.setAccessible(true);
        long value = field.getLong(s);
        field.setLong(stream, value);
        field.setAccessible(false);
        return stream;
    }
    catch (NoSuchFieldException e)
    {
        throw new RuntimeException(e);
    }
    catch (SecurityException e)
    {
        throw new RuntimeException(e);
    }
    catch (IllegalArgumentException e)
    {
        throw new RuntimeException(e);
    }
    catch (IllegalAccessException e)
    {
        throw new RuntimeException(e);
    }
}
```
Note that this is really an ugly hack, and the ‘XXX’ should be taken seriously.
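To make the mechanism easier to try out without a GPU or the JCuda jars, here is a minimal, self-contained sketch of the same reflection trick. Note that HandleBase, RuntimeStream, DriverStream and PointerCopyDemo are hypothetical stand-ins invented for this illustration. They are not JCuda classes and only mimic the idea of a base class that keeps the native address in a private long field named nativePointer.

```java
import java.lang.reflect.Field;

// Hypothetical demo class (not part of JCuda): copies a private pointer value via reflection.
class PointerCopyDemo {
    // Reads the private 'nativePointer' field of any handle.
    static long readPointer(HandleBase handle) {
        try {
            Field field = HandleBase.class.getDeclaredField("nativePointer");
            field.setAccessible(true);
            return field.getLong(handle);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Creates a DriverStream that refers to the same native address as 'source'.
    static DriverStream convert(RuntimeStream source) {
        try {
            Field field = HandleBase.class.getDeclaredField("nativePointer");
            field.setAccessible(true);
            DriverStream target = new DriverStream();
            field.setLong(target, field.getLong(source));
            return target;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        RuntimeStream runtimeStream = new RuntimeStream(0x7f00beefL);
        DriverStream driverStream = convert(runtimeStream);
        // Both handles now carry the same address.
        System.out.println(readPointer(driverStream) == readPointer(runtimeStream)); // prints true
    }
}

// Stand-in for a base class that only stores a native address (assumption, not JCuda source).
class HandleBase {
    private long nativePointer;

    HandleBase() { }

    HandleBase(long nativePointer) { this.nativePointer = nativePointer; }
}

// Stand-in for a runtime API stream handle; the address is passed in for the demo only.
class RuntimeStream extends HandleBase {
    RuntimeStream(long address) { super(address); }
}

// Stand-in for a driver API stream handle; starts out with a zero address.
class DriverStream extends HandleBase { }
```

The sketch only copies the address. Both objects then refer to the same underlying native handle, so destroying the stream through one of them invalidates the other as well.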
It might have been easier to simply offer a
public long getNativePointer()
method and add constructors accepting a ‘long’ value to all these classes right from the beginning. And in fact, JCuda is such a thin layer around CUDA that the current restriction does not really bring much additional safety. But it may offer some type safety and clarity once the appropriate conversions are all supported: Section 5.26 of the reference manual lists some other legal conversions (CUevent<->cudaEvent etc.) that will also be introduced in the next version. Maybe I’ll publish a dedicated update for this. It should not be much effort, because it only affects the Java side and no recompilation of the native libs will be required.
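The type-safety argument can be sketched roughly as follows. This is only an illustration under assumptions: NativeHandle, RuntimeStreamHandle, DriverStreamHandle and EventHandle are hypothetical names made up for this example and do not reproduce the JCuda sources. The idea is that the raw address stays hidden from user code, and only those conversions that have an explicit constructor are possible.

```java
// Hypothetical demo (not JCuda code): conversions are offered as typed constructors.
class ConversionConstructorDemo {
    public static void main(String[] args) {
        RuntimeStreamHandle runtimeStream = new RuntimeStreamHandle(0x1000L);

        // Legal conversion: a matching constructor exists.
        DriverStreamHandle driverStream = new DriverStreamHandle(runtimeStream);
        System.out.println(driverStream.address() == runtimeStream.address()); // prints true

        // Illegal conversion: there is no DriverStreamHandle(EventHandle) constructor,
        // so the following line would be rejected by the compiler.
        // DriverStreamHandle wrong = new DriverStreamHandle(new EventHandle(0x2000L));
    }
}

// Stand-in base class that stores the native address (assumption for this sketch).
abstract class NativeHandle {
    private final long address;

    NativeHandle(long address) { this.address = address; }

    // Package-private: visible to the handle classes, but not to code outside the package.
    long address() { return address; }
}

final class RuntimeStreamHandle extends NativeHandle {
    // In this sketch the address is passed in directly; a real binding would obtain it natively.
    RuntimeStreamHandle(long address) { super(address); }

    // Conversion from the driver-side stream type.
    RuntimeStreamHandle(DriverStreamHandle other) { super(other.address()); }
}

final class DriverStreamHandle extends NativeHandle {
    DriverStreamHandle(long address) { super(address); }

    // Conversion from the runtime-side stream type.
    DriverStreamHandle(RuntimeStreamHandle other) { super(other.address()); }
}

// An unrelated handle type that deliberately has no conversion to a stream.
final class EventHandle extends NativeHandle {
    EventHandle(long address) { super(address); }
}
```

Compared with a public getNativePointer() plus a constructor taking a ‘long’, this variant lets the compiler reject a mix-up such as passing an event where a stream is expected.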
OK, fine. I’ll try to do the update soon, but will probably not be able to do it this week - maybe next week, but there are still some other tasks in the queue.
I have just uploaded version 0.5.0a, where the conversions between streams, events and graphics resources of the runtime and driver APIs are supported via constructors of the respective classes. So you may remove the hack described above.