Synchronizing own kernels with jCublas calls

Background: the matrix operations of JCublas and cuLaunchKernel live in different libraries with slightly different APIs, and the two use different stream handle types. The two types (CUstream and cudaStream_t) are equivalent in C, but in Java I cannot find a way to convert from one to the other.

So I can do this to synchronize cublas calls:

```
cudaStream_t s = new cudaStream_t();
cudaStreamCreate(s);
cublasSetStream(s);
```

and I can create this stream to synchronize my own kernels:
```
CUstream s = new CUstream();
cuStreamCreate(s, 0);
```

But I can't synchronize across the two different types of computation.

So my question is: how (if at all possible) can I synchronize the two by issuing cublas calls and launching my own kernels in the same stream (other than the default stream, of course)?

Any help would be much appreciated!

Hello Nikko,

You’re right: A conversion between these types is currently not supported.

Although I have not yet used non-default streams extensively, I should probably have noticed this when someone pointed out that a similar conversion was necessary between CUarray and cudaArray. Appropriate constructors for these conversions were added in version 0.5.0, and the solution here will probably be the same, namely to allow the construction of a CUstream from a cudaStream_t and vice versa. I’ll have to run some more tests using streams anyhow, as soon as I find the time to integrate the stream callbacks…

So at the moment, there is no “official” solution for this. But depending on how urgent/important this issue is for you, it would of course be possible to write a preliminary method like

private static CUstream convert(cudaStream_t s)
{
    CUstream stream = new CUstream();
    // Nasty reflection hacks....
    ...
    return stream;
}

whose call can later simply be replaced with the constructor that is most likely to be introduced:

//CUstream stream = convert(someCudaStream);
CUstream stream = new CUstream(someCudaStream);

Might this be a feasible workaround for now?

bye
Marco

Thanks for the reply Marco!

I guess if the “reflection hacks” work that would be a viable workaround.
However, I’m not clever enough to write that hack myself.

Btw, thanks for maintaining jcuda. I know it is hard to find the time and energy to keep it up.

Nikko

Well, all the objects that are represented as a typedef’ed pointer in C are represented by classes extending the “NativePointerObject” class in JCuda. This base class does little more than store the native pointer in a long variable, and accordingly, this hack does the same as a simple assignment like
someCUstream.nativePointer = someCudaStream.nativePointer;
but uses reflection to circumvent the private visibility of the ‘nativePointer’ variable:

    // XXX Remove this method as soon as the conversion cudaStream_t<->CUstream is supported!
    private static CUstream convert(cudaStream_t s)
    {
        try 
        {
            CUstream stream = new CUstream();
            java.lang.reflect.Field field = 
                jcuda.NativePointerObject.class.getDeclaredField(
                    "nativePointer");
            field.setAccessible(true);
            long value = field.getLong(s);
            field.setLong(stream, value);
            field.setAccessible(false);
            return stream;
        } 
        catch (NoSuchFieldException e) 
        {
            throw new RuntimeException(e);
        } 
        catch (SecurityException e) 
        {
            throw new RuntimeException(e);
        } 
        catch (IllegalArgumentException e) 
        {
            throw new RuntimeException(e);
        } 
        catch (IllegalAccessException e) 
        {
            throw new RuntimeException(e);
        }
    }

Note that this is really an ugly hack, and the ‘XXX’ should be taken seriously.
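For readers who want to try the reflection trick without a GPU or the JCuda jars, here is a minimal, self-contained sketch of the same idea. The classes `HandleBase`, `RuntimeStream` and `DriverStream` are hypothetical stand-ins invented for this illustration (they are not JCuda classes), and the `sameHandleAs` method exists only so that the result can be checked:

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a base class that hides a native pointer.
class HandleBase {
    private long nativePointer;

    HandleBase(long nativePointer) {
        this.nativePointer = nativePointer;
    }

    // Only for checking the result of the sketch.
    boolean sameHandleAs(HandleBase other) {
        return this.nativePointer == other.nativePointer;
    }
}

// Hypothetical stand-ins for the two stream handle types.
class RuntimeStream extends HandleBase {
    RuntimeStream(long nativePointer) { super(nativePointer); }
}

class DriverStream extends HandleBase {
    DriverStream() { super(0L); }
}

class PointerCopy {
    // Copies the private pointer value from one handle type to the other.
    static DriverStream convert(RuntimeStream source) {
        try {
            Field field = HandleBase.class.getDeclaredField("nativePointer");
            field.setAccessible(true); // bypass the private visibility
            DriverStream target = new DriverStream();
            field.setLong(target, field.getLong(source));
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RuntimeStream runtimeStream = new RuntimeStream(0x1234L);
        DriverStream driverStream = convert(runtimeStream);
        System.out.println(driverStream.sameHandleAs(runtimeStream)); // prints "true"
    }
}
```

The field is looked up on the base class rather than on the subclass, because `getDeclaredField` only sees fields declared by the class it is called on.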

It might have been easier to simply offer a
public long getNativePointer()
method and add constructors accepting a ‘long’ value to all these classes right from the beginning. And in fact, JCuda is such a thin layer around CUDA that the current restriction does not really bring much additional safety. It may still provide some type safety and clarity once the appropriate conversions are all supported: Section 5.26 of the reference manual lists some other legal conversions (CUevent<->cudaEvent_t etc…) that will also be introduced in the next version. Maybe I’ll publish a dedicated update for this - it should not be such a great effort, because it only affects the Java side and no recompilation of the native libs will be required.
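To make the trade-off between the two designs concrete, here is a small sketch with purely hypothetical classes (`NativeHandle`, `RuntimeEventHandle`, `DriverEventHandle`). None of these names come from JCuda. The sketch only illustrates how a public getter plus a raw ‘long’ constructor would allow any conversion, while typed conversion constructors restrict conversions to the pairs that are explicitly offered:

```java
// Hypothetical base class exposing the pointer value through a getter.
abstract class NativeHandle {
    private final long nativePointer;

    protected NativeHandle(long nativePointer) {
        this.nativePointer = nativePointer;
    }

    public long getNativePointer() {
        return nativePointer;
    }
}

// Hypothetical runtime-side handle.
final class RuntimeEventHandle extends NativeHandle {
    // Raw constructor: flexible, but accepts any long value.
    RuntimeEventHandle(long nativePointer) {
        super(nativePointer);
    }

    // Typed conversion: only a driver-side event is accepted.
    RuntimeEventHandle(DriverEventHandle other) {
        super(other.getNativePointer());
    }
}

// Hypothetical driver-side handle, offering only the typed conversion.
final class DriverEventHandle extends NativeHandle {
    DriverEventHandle(RuntimeEventHandle other) {
        super(other.getNativePointer());
    }
}

class HandleConversionDemo {
    public static void main(String[] args) {
        RuntimeEventHandle runtimeEvent = new RuntimeEventHandle(7L);
        DriverEventHandle driverEvent = new DriverEventHandle(runtimeEvent);
        System.out.println(
            driverEvent.getNativePointer() == runtimeEvent.getNativePointer()); // prints "true"
    }
}
```

With only the typed constructors, passing an unrelated handle type is a compile-time error. With the raw ‘long’ constructor, the compiler cannot catch such a mix-up.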

Thanks for pointing this out!

Marco

Thank you. That appears to be working.

OK, fine. I’ll try to do the update soon, but will probably not be able to do it this week - maybe next week, but there are still some other tasks in the queue.

I have just uploaded version 0.5.0a, where the conversions between streams, events and graphics resources of the runtime and driver APIs are supported via constructors of the respective classes. So you may remove the hack described above.

The start of a (possible) discussion about obtaining the native pointer value from a Pointer object has been moved to a new thread: http://forum.byte-welt.net/threads/11250-Obtaining-native-pointer-from-Pointer-object