Questions about: jcuda.Pointer.withByteOffset(byteOffset)

djmj · 7 September 2011 at 10:57

I have a few questions about the following method; the questions are in the code comments.

jcuda.Pointer.withByteOffset(byteOffset)

Example:

//device pointer holding integer array with 8 values

//0 1 2 3 4 5 6 7
Pointer ints_d;

//4 5 6 7
Pointer intsLastFour_d = ints_d.withByteOffset(Sizeof.INT * 4);

//results in freeing both pointers ?
//0 0 0 0 0 0 0 0
cudaFree(ints_d);

//what happens if I free the other one instead?
//0 1 2 3 0 0 0 0
cudaFree(intsLastFour_d);


//how do I get the first 4 numbers? That would be much more useful
//0 1 2 3

How do I get the first bytes, from 0 to x, rather than from x to the end?

djmj · 7 September 2011 at 11:35

Hmm, write operations on the returned pointer do not seem to work the way I thought.

Pseudo-code:


//0 1 2 3 4 5 6 7
//device pointer with array
Pointer ints_d;

Pointer intsFirstFour_d = ints_d.withByteOffset(Sizeof.INT * 4);
//reset offset pointer
JCuda.cudaMemset(intsFirstFour_d, 0, 4);


JCuda.cudaMemcpy(ints_d);

//wished output
//0 0 0 0 4 5 6 7

//output
//0 1 2 3 4 5 6 7

If the input and output pointers point to the same memory address, why don't I see any changes?

Marco13 · 7 September 2011 at 11:42

Hi

Freeing a device pointer with an offset should have the same effect as it has in CUDA-C, if you did something like
cudaFree(somePointer+4);
which means that it should cause a “cudaErrorInvalidDevicePointer” error.

Concerning the question about obtaining the first elements of a pointer… maybe this is a misunderstanding, but in C there is no such concept as an “array length”. Pointers do not know how large the (valid) memory region is that they are pointing to. So, according to your example: The pointer to the 8 elements is in C identical to a pointer to the first 4 elements. The pointer is pointing to the same memory location (namely, to the beginning of the “array”) and does not know whether there are 4, 8 or 1000 “valid” elements beyond this location.
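
To illustrate the point about start locations and element counts, here is a minimal host-side sketch in plain Java. It is only an analogy: it uses java.nio.IntBuffer instead of a device pointer, and the class name OffsetViewSketch and the method read are made up for this example. It shows that "the first four" and "the last four" elements differ only in the start offset and in the count that the caller supplies.

```java
import java.nio.IntBuffer;
import java.util.Arrays;

// Host-side analogy only (hypothetical names, no JCuda involved):
// a region is described by a start offset plus a count chosen by the caller.
class OffsetViewSketch
{
    // Reads 'count' ints starting at element 'offset' of the backing array
    static int[] read(int backing[], int offset, int count)
    {
        // wrap(array, offset, length) sets position=offset, limit=offset+length
        IntBuffer view = IntBuffer.wrap(backing, offset, count);
        int result[] = new int[count];
        view.get(result);
        return result;
    }

    public static void main(String args[])
    {
        int a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };

        // Same start as the whole array, just a smaller count
        System.out.println(Arrays.toString(read(a, 0, 4))); // [0, 1, 2, 3]

        // Start shifted by 4 elements, comparable to withByteOffset(4 * Sizeof.INT)
        System.out.println(Arrays.toString(read(a, 4, 4))); // [4, 5, 6, 7]
    }
}
```

With a device pointer the same idea applies: the first four elements are reached through the original pointer, and the number of bytes passed to the copy or memset call decides how many elements are touched.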

You are most likely asking these questions because you want to “reduce” the size of a CUDA array at runtime, step by step, right? This is not possible. The only solution would be to create a new, smaller array and copy the first elements from the large one into the small one. But of course, this would take time and might be unnecessary: If the memory consumption is not absolutely critical, and if you do not absolutely need the “unused” memory in order to allocate new arrays, you should probably only allocate the array with its maximum size once in the beginning, and then, in later steps, not “make the array smaller”, but only store the current length, i.e. maintain a variable that says how many elements of the large array are currently really used. I can hardly imagine how the algorithm should work without this information, so you most likely already have something like this. And this value may, by the way, also be the value that says how many elements may have to be copied back to the host in the end.
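
The "allocate once, store the current length" approach could be sketched roughly as follows. This is plain host-side Java so that it runs without a GPU; the class ShrinkingBuffer and its methods are hypothetical names for this example. On the device, the int array would be a Pointer from cudaMalloc, but the bookkeeping would be the same.

```java
import java.util.Arrays;

// Host-side sketch (hypothetical class): a buffer that is allocated once
// with its maximum size, plus a separate count of the elements in use.
class ShrinkingBuffer
{
    private final int data[]; // allocated once, never resized
    private int length;       // number of elements currently in use

    ShrinkingBuffer(int capacity)
    {
        data = new int[capacity];
        for (int i = 0; i < capacity; i++)
        {
            data[i] = i;
        }
        length = capacity;
    }

    // "Shrinks" the array by lowering the used length.
    // No memory is freed and nothing is copied.
    void shrinkTo(int newLength)
    {
        if (newLength < 0 || newLength > length)
        {
            throw new IllegalArgumentException("Invalid length: " + newLength);
        }
        length = newLength;
    }

    int length()
    {
        return length;
    }

    // Copies only the used part, like a final copy back to the host would
    int[] usedElements()
    {
        return Arrays.copyOf(data, length);
    }

    public static void main(String args[])
    {
        ShrinkingBuffer buffer = new ShrinkingBuffer(8);
        buffer.shrinkTo(4);
        System.out.println(Arrays.toString(buffer.usedElements())); // [0, 1, 2, 3]
    }
}
```

In a JCuda program, the stored length would then be multiplied by Sizeof.INT to get the byte count for the final cudaMemcpy to the host.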

Marco13 · 7 September 2011 at 11:44

Concerning the second post, I’ll test this ASAP.

djmj · 7 September 2011 at 12:12

Thanks for your quick answer. Yes, I am aware that a pointer points to the first element.

Oh, my bad. Of course, with the offset the pointer just changes its starting point.
Don't know what I was thinking.

And even my testing was wrong.

This thread is completely useless, just my own stupidity.
I have had conjunctivitis since today, and it is bothering me quite a bit right now, so it is not a good day to write code.

With some other arrays, I already keep them in device memory and use a length variable to track their sizes. Not my day.

Marco13 · 7 September 2011 at 12:19

Oh well, however, here’s what I wrote as a test:

import static jcuda.runtime.JCuda.*;

import java.util.Arrays;

import jcuda.*;
import jcuda.runtime.*;

public class ByteOffsetTest
{
    public static void main(String args[])
    {
        JCuda.setExceptionsEnabled(true);
        int size = 8;
        int a[] = new int[size];
        for (int i=0; i<size; i++)
        {
            a[i] = i;
        }
        
        Pointer dA = new Pointer();
        cudaMalloc(dA, size * Sizeof.INT);
        cudaMemcpy(dA, Pointer.to(a), size * Sizeof.INT, 
            cudaMemcpyKind.cudaMemcpyHostToDevice);
        
        System.out.println("After filling device memory");
        printArray(dA, size);

        cudaMemset(dA, 0, 2 * Sizeof.INT);
        
        System.out.println("After memset of first two elements");
        printArray(dA, size);

        Pointer withOffset = dA.withByteOffset(6 * Sizeof.INT);
        cudaMemset(withOffset, 0, 2 * Sizeof.INT);
        
        System.out.println("After memset of last two elements using offset pointer");
        printArray(dA, size);
        
        cudaFree(dA);
    }
    
    private static void printArray(Pointer dA, int size)
    {
        int a[] = new int[size];
        cudaMemcpy(Pointer.to(a), dA, size * Sizeof.INT, 
            cudaMemcpyKind.cudaMemcpyDeviceToHost);
        System.out.println(Arrays.toString(a));
    }
}