Binding pitched linear memory to 2D texture

Hi guys,

I want to bind a pitch linear memory to 2D texture so that I could get the memory data using tex2D in the kernel and then modify the memory data. I use the driver API to do this reading/writing on the same pitch memory:

public class TextureTest2{
    private static int sizeY = 2;
    private static int sizeX = 4;
    private static int fltLn = 1;
    public static void main(String[] args){
        // initializing the driver API
        // Load the CUBIN file containing the kernels
        KernelLauncher kl=KernelLauncher.create("d:/GPU/","TextureTest");
        CUdeviceptr p1=new CUdeviceptr();
        long[] pPitch=new long[1];
        // Copy the host input to the array
        float[] data=new float[sizeX*sizeY*fltLn];
        int ptr=0;
        for(int i=0;i<sizeX*sizeY;i++)
        for(int j=0;j<fltLn;j++) data[ptr++]=i;
        CUDA_MEMCPY2D copyHD = new CUDA_MEMCPY2D();
        copyHD.srcMemoryType = CUmemorytype.CU_MEMORYTYPE_HOST;
        copyHD.srcHost =;
        copyHD.srcPitch = sizeX * Sizeof.FLOAT * fltLn;
        copyHD.dstMemoryType = CUmemorytype.CU_MEMORYTYPE_DEVICE;
        copyHD.WidthInBytes = sizeX * Sizeof.FLOAT * fltLn;
        copyHD.Height = sizeY;
        ad.Width =sizeX;
        // Set up the texture reference
        CUtexref texref = new CUtexref();
        cuModuleGetTexRef(texref, kl.getModule(), "texture_float4_2D");
        cuTexRefSetFilterMode(texref, CU_TR_FILTER_MODE_POINT);
        cuTexRefSetAddressMode(texref, 0, CU_TR_ADDRESS_MODE_WRAP);
        cuTexRefSetAddressMode(texref, 1, CU_TR_ADDRESS_MODE_WRAP);
        cuTexRefSetFlags(texref, CU_TRSF_NORMALIZED_COORDINATES);
        cuTexRefSetFormat(texref, CU_AD_FORMAT_FLOAT, fltLn);
        // Prepare the output device memory
        CUdeviceptr dOutput = CUDAUtil.newDeviceData1D(new float[sizeY*sizeX*fltLn]);
        kl.setup(new dim3(1,1,1),new dim3(sizeX,sizeY,1));,p1);
        float[] hOutput=new float[sizeX*sizeY*fltLn];
        cuMemcpyDtoH(, dOutput, sizeX*sizeY*Sizeof.FLOAT * fltLn);
        // Print the results
        // Clean up

The following is the kernel code:

texture<float, 2, cudaReadModeElementType> texture_float4_2D;
extern "C"
__global__ void TextureTest(float* output,float* in){
 int WIDTH =4;
 int HEIGHT=2;
 int tx=threadIdx.x+blockIdx.x*blockDim.x;
 int ty=threadIdx.y+blockIdx.y*blockDim.y;
  output[ty*WIDTH+tx]=tex2D(texture_float4_2D, tx, ty)+1;

The output of the program is like:
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
not like expected:
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

It seems that the tex2D in the kernel do not return the value I want. Maybe the binding is not successful but I cannot find anything wrong in the code.

Please give me some help. Thanks very much.


I’ll try to extend this to a compileable example and test it in the next few days. But from quickly looking over it: You are using CU_TRSF_NORMALIZED_COORDINATES, which means that the texture coordinates are in the range [0.0, 1.0), but you are obviously accessing it with [0.0, dim) … together with the CLAMPing, this might explain the unexpected result.


Hi Marco13, thanks for your reply. I’ve done several tests with different parameters but none of them return the right result. This test program is a little ‘strange’ but why tex2D do not return anything but only zero is most strange to me. You can modify the kernel to:

__global__ void TextureTest(float* output,float* in){
output[ty*WIDTH+tx]=tex2D(texture_float4_2D, **[U]1[/U]**, **[U]1[/U]**)+1;

or any number that is valid so that tex2D could get something other than zero.

I’ve been searching the answer for quite a time. Example from Nvidia website uses runtime API which is different from the driver API. What ever, thanks again.


I did a quick test, I modified a few lines to replace the CUDAutil, but I think that the only things that were necessary to obtain the right result was

  • Remove
    (as I already mentioned)

  • Add
    copyHD.dstPitch = pPitch[0];
    to take the pitch into account

Hope that helps, otherwise please try to describe you current code, your intention, and the expected vs. observed behavior.