Hello
You got me on this one. This is indeed a bug: no ByteBuffer is allocated for such a pointer.
As a (lame?) excuse, I can only say that I got my first card with Unified Memory support only a short while ago, and since then have not had the chance to test all the new features ("new" here means: features that had not been supported by my GeForce 8800 :rolleyes: and that's quite a lot…).
Although I had read the Unified Memory blog entry back when CUDA 6 was published, I probably did not (and still do not) understand all the details in full depth. For example, as far as I understood, you'd need cudaMemAttachHost here, because cudaMemAttachGlobal makes the memory available only to the devices (and not to the host). Additionally, once the memory is accessible to the host, it has to be made accessible to the device explicitly by calling cudaStreamAttachMemAsync.
I just added a fix and ran a test program:
import static jcuda.jcublas.JCublas2.*;
import static jcuda.runtime.JCuda.*;

import java.nio.*;

import jcuda.*;
import jcuda.jcublas.*;
import jcuda.runtime.JCuda;

public class JCudaUnifiedMemory
{
    public static void main(String[] args)
    {
        JCuda.setExceptionsEnabled(true);
        JCublas2.setExceptionsEnabled(true);

        // Allocate managed memory that is accessible to the host
        Pointer p = new Pointer();
        int n = 10;
        long size = n * Sizeof.FLOAT;
        cudaMallocManaged(p, size, cudaMemAttachHost);

        // Obtain the byte buffer from the pointer
        ByteBuffer bb = p.getByteBuffer(0, size);
        System.out.println("Buffer on host side: " + bb);

        // Fill the buffer with sample data
        FloatBuffer fb = bb.order(ByteOrder.nativeOrder()).asFloatBuffer();
        for (int i = 0; i < n; i++)
        {
            fb.put(i, i);
        }

        // Make the buffer accessible to all devices
        cudaStreamAttachMemAsync(null, p, 0, cudaMemAttachGlobal);
        cudaStreamSynchronize(null);

        // Use the buffer in a device operation
        // (here, a dot product with JCublas, for example)
        cublasHandle handle = new cublasHandle();
        cublasCreate(handle);
        float result[] = { -1.0f };
        cublasSdot(handle, n, p, 1, p, 1, Pointer.to(result));
        System.out.println(result[0]);
    }
}
and this seems to work, basically.
However, I'm not sure how to handle this. I could simply and quickly add this fix and create a new release. But I'm a bit hesitant here: I have not yet tested it thoroughly, and I'd prefer to try some application scenarios of Unified Memory before releasing it, if possible. Additionally, from what I have read so far, this Unified Memory feature opens the door for nasty bugs. For example, I'm not sure what should happen to the ByteBuffer after cudaStreamAttachMemAsync has been called. But this is only one of the aspects that made me raise an eyebrow. The API documentation, for instance, says things like:
Accessing memory on the device from streams that are not associated with it will produce undefined results.
No error checking is performed by the Unified Memory system to ensure that kernels launched into other streams do not access this region.
It is a program’s responsibility to order calls to ::cudaStreamAttachMemAsync via events, synchronization or other means to ensure legal access to memory at all times.
If \p stream is destroyed while data is associated with it, the association is removed and the association reverts to the default visibility of the allocation as specified at ::cudaMallocManaged
and I think that this can have subtle implications (and possibly not-so-subtle consequences - maybe even nasty crashes when accessing the ByteBuffer after attaching the memory to a stream or something like that…).
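To make this concern a bit more concrete: below is a rough, so far untested sketch of the pattern I have in mind (the class name and the flow are of course only made up for illustration, and the comments only reflect my current reading of the documentation, not something I have verified). The memory is first attached to the host, then attached exclusively to one stream, and the host may presumably only touch the ByteBuffer again after synchronizing with that stream:

import static jcuda.runtime.JCuda.*;

import java.nio.*;

import jcuda.*;
import jcuda.runtime.cudaStream_t;

public class JCudaUnifiedMemoryPitfall
{
    public static void main(String[] args)
    {
        setExceptionsEnabled(true);

        // Managed memory, initially attached to the host
        Pointer p = new Pointer();
        long size = 10 * Sizeof.FLOAT;
        cudaMallocManaged(p, size, cudaMemAttachHost);
        ByteBuffer bb = p.getByteBuffer(0, size);

        // While the memory is attached to the host, writing
        // through the buffer should be legal
        bb.order(ByteOrder.nativeOrder()).putFloat(0, 42.0f);

        // Attach the memory exclusively to one stream
        cudaStream_t stream = new cudaStream_t();
        cudaStreamCreate(stream);
        cudaStreamAttachMemAsync(stream, p, 0, cudaMemAttachSingle);

        // If a kernel using this memory was now launched into 'stream',
        // then reading the ByteBuffer here, without synchronizing first,
        // would (as far as I understand) be exactly the kind of access
        // that the documentation declares undefined - it might appear
        // to work, or it might crash the VM:
        // float f = bb.order(ByteOrder.nativeOrder()).getFloat(0);

        // Presumably, the host has to synchronize with the stream
        // before touching the buffer again
        cudaStreamSynchronize(stream);
        float f = bb.order(ByteOrder.nativeOrder()).getFloat(0);
        System.out.println("First element: " + f);

        cudaStreamDestroy(stream);
        cudaFree(p);
    }
}

Whether an "illegal" host access in such a situation really crashes the VM, or "only" produces stale data, is exactly one of the things that I'd like to test before making a release.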
I'll try to come to a conclusion here as quickly as reasonably possible. However, I cannot give a specific date for when the solution will be available in a release. If you'd like to do some basic tests (and tell me what does not work (which would be nice) and how to fix it (which would be great :D)), then I could either provide you with the "patched" Windows64 binary that I used for the test above, or (if you're on Linux/Mac) provide you with the updated source. (It's only a few added lines so far.) Please let me know which solution you'd prefer here.
bye, and thanks for making me aware of this,
Marco