Cl.exe not found in PATH

I have added “C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin” to my PATH and added the .dll files as well, but I still get this exception. Can anyone help me sort it out?
Here is what I have done (the code is copied from a simple sample program):


import static jcuda.driver.JCudaDriver.*;

import java.io.*;

import jcuda.*;
import jcuda.driver.*;

public class JCudaVectorAdd
{
    public static void main(String args[]) throws IOException
    {
        // Enable exceptions and omit all subsequent error checks
        JCudaDriver.setExceptionsEnabled(true);

        // Create the PTX file by calling the NVCC
        String ptxFileName = preparePtxFile("C://Users//590943//workspace//Assignments//");

        // Initialize the driver and create a context for the first device.
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        // Load the ptx file.
        CUmodule module = new CUmodule();
        cuModuleLoad(module, ptxFileName);

        // Obtain a function pointer to the "add" function.
        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "add");

        int numElements = 100000;

        // Allocate and fill the host input data
        float hostInputA[] = new float[numElements];
        float hostInputB[] = new float[numElements];
        for(int i = 0; i < numElements; i++)
        {
            hostInputA[i] = (float)i;
            hostInputB[i] = (float)i;
        }

        // Allocate the device input data, and copy the
        // host input data to the device
        CUdeviceptr deviceInputA = new CUdeviceptr();
        cuMemAlloc(deviceInputA, numElements * Sizeof.FLOAT);
        cuMemcpyHtoD(deviceInputA, Pointer.to(hostInputA),
            numElements * Sizeof.FLOAT);
        CUdeviceptr deviceInputB = new CUdeviceptr();
        cuMemAlloc(deviceInputB, numElements * Sizeof.FLOAT);
        cuMemcpyHtoD(deviceInputB, Pointer.to(hostInputB),
            numElements * Sizeof.FLOAT);

        // Allocate device output memory
        CUdeviceptr deviceOutput = new CUdeviceptr();
        cuMemAlloc(deviceOutput, numElements * Sizeof.FLOAT);

        // Set up the kernel parameters: A pointer to an array
        // of pointers which point to the actual values.
        Pointer kernelParameters = Pointer.to(
            Pointer.to(new int[]{numElements}),
            Pointer.to(deviceInputA),
            Pointer.to(deviceInputB),
            Pointer.to(deviceOutput)
        );

        // Call the kernel function.
        int blockSizeX = 256;
        int gridSizeX = (int)Math.ceil((double)numElements / blockSizeX);
        cuLaunchKernel(function,
            gridSizeX,  1, 1,      // Grid dimension
            blockSizeX, 1, 1,      // Block dimension
            0, null,               // Shared memory size and stream
            kernelParameters, null // Kernel- and extra parameters
        );
        cuCtxSynchronize();

        // Allocate host output memory and copy the device output
        // to the host.
        float hostOutput[] = new float[numElements];
        cuMemcpyDtoH(Pointer.to(hostOutput), deviceOutput,
            numElements * Sizeof.FLOAT);

        // Verify the result
        boolean passed = true;
        for(int i = 0; i < numElements; i++)
        {
            float expected = i+i;
            if (Math.abs(hostOutput[i] - expected) > 1e-5)
            {
                System.out.println(
                    "At index "+i+ " found "+hostOutput[i]+
                    " but expected "+expected);
                passed = false;
                break;
            }
        }
        System.out.println("Test "+(passed?"PASSED":"FAILED"));

        // Clean up.
        cuMemFree(deviceInputA);
        cuMemFree(deviceInputB);
        cuMemFree(deviceOutput);
    }

    /**
     * The extension of the given file name is replaced with "ptx".
     * If the file with the resulting name does not exist, it is
     * compiled from the given file using NVCC. The name of the
     * PTX file is returned.
     *
     * @param cuFileName The name of the .CU file
     * @return The name of the PTX file
     * @throws IOException If an I/O error occurs
     */
    private static String preparePtxFile(String cuFileName) throws IOException
    {
        int endIndex = cuFileName.lastIndexOf('.');
        if (endIndex == -1)
        {
            endIndex = cuFileName.length()-1;
        }
        String ptxFileName = cuFileName.substring(0, endIndex+1)+"ptx";
        File ptxFile = new File(ptxFileName);
        if (ptxFile.exists())
        {
            return ptxFileName;
        }

        File cuFile = new File(cuFileName);
        if (!cuFile.exists())
        {
            throw new IOException("Input file not found: "+cuFileName);
        }
        // "-m32" or "-m64", depending on the data model of the running JVM
        String modelString = "-m"+System.getProperty("sun.arch.data.model");
        String command =
            "nvcc " + modelString + " -ptx "+
            cuFile.getPath()+" -o "+ptxFileName;

        System.out.println(command);
        Process process = Runtime.getRuntime().exec(command);

        String errorMessage =
            new String(toByteArray(process.getErrorStream()));
        String outputMessage =
            new String(toByteArray(process.getInputStream()));
        int exitValue = 0;
        try
        {
            exitValue = process.waitFor();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IOException(
                "Interrupted while waiting for nvcc output", e);
        }

        if (exitValue != 0)
        {
            System.out.println("nvcc process exitValue "+exitValue);
            throw new IOException(
                "Could not create .ptx file: "+errorMessage);
        }

        System.out.println("Finished creating PTX file");
        return ptxFileName;
    }

    /**
     * Fully reads the given InputStream and returns it as a byte array
     *
     * @param inputStream The input stream to read
     * @return The byte array containing the data from the input stream
     * @throws IOException If an I/O error occurs
     */
    private static byte[] toByteArray(InputStream inputStream)
        throws IOException
    {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte buffer[] = new byte[8192];
        while (true)
        {
            int read = inputStream.read(buffer);
            if (read == -1)
            {
                break;
            }
            baos.write(buffer, 0, read);
        }
        return baos.toByteArray();
    }
}


And I am getting this exception:

nvcc -m32 -ptx C:\Users\590943\workspace\Assignments\ -o C://Users//590943//workspace//Assignments//JCudaVectorAddKernel.ptx
nvcc process exitValue 1

nvcc fatal : Cannot find compiler ‘cl.exe’ in PATH

Exception in thread “main” Could not create .ptx file:
at JCudaVectorAdd.preparePtxFile(
at JCudaVectorAdd.main(

Some of the environment variables required for finding cl.exe should be set during the Visual Studio installation.

It might be (for whatever reason) that they are not set properly in your case.

By the way: This error is not uncommon - you probably already found these links:
nvidia - Error compiling CUDA from Command Prompt - Stack Overflow
cuda - nvcc fatal : Cannot find compiler ‘cl.exe’ in PATH although Visual Studio 12.0 is added to PATH - Stack Overflow
windows - nvcc fatal : Compiler ‘cl.exe’ in PATH different than the one specified with -ccbin - Stack Overflow

I’m not exactly sure which settings are required here, but there are at least two possible approaches to get it running:

Approach 1:

You mentioned that you already added
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
to your PATH.

You could try to add
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64
instead, because this directory contains the 64-bit version of cl.exe.
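
To check which of these directories the Java process can actually see, a small PATH scan can help. This is only a sketch: the class `PathSearch` and its methods are made-up helper names, not part of JCuda. It also assumes that the PATH was changed before the JVM (or Eclipse, when launching from there) was started, because a running process does not pick up later changes to the environment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PathSearch {

    /** Splits a PATH-style string into its non-empty entries. */
    public static List<String> splitPath(String pathValue, String separator) {
        List<String> entries = new ArrayList<>();
        if (pathValue == null) {
            return entries;
        }
        for (String entry : pathValue.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /** Returns every file with the given name found in a PATH directory. */
    public static List<File> findOnPath(String pathValue, String fileName) {
        List<File> hits = new ArrayList<>();
        for (String dir : splitPath(pathValue, File.pathSeparator)) {
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // This is the PATH that the JVM (and therefore Runtime.exec) sees.
        String path = System.getenv("PATH");
        List<File> hits = findOnPath(path, "cl.exe");
        if (hits.isEmpty()) {
            System.out.println("cl.exe is not on the PATH of this Java process");
        }
        for (File hit : hits) {
            System.out.println("Found: " + hit.getAbsolutePath());
        }
    }
}
```

If more than one cl.exe is listed, the first entry is the one that is picked up, which is worth knowing when several Visual Studio directories are on the PATH at once.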

Approach 2:

There are several .BAT files in Visual Studio that are supposed to set the required environment variables. You could try to execute this BAT file:
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\vcvarsx86_amd64.bat

(There are other vcvars-BAT files as well, but this should be the right one)
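
A related option, hinted at by the title of the third link above, is to tell NVCC directly where the host compiler is by using its `-ccbin` option, instead of relying on PATH. The following is a rough sketch of how the command could be assembled as an argument list. The class name `NvccCommand`, the method `build` and the file names are placeholders, and which compiler directory is the right one for a given setup is an assumption that has to be checked locally.

```java
import java.util.ArrayList;
import java.util.List;

public class NvccCommand {

    /**
     * Builds the nvcc argument list. ccbinDir may be null, in which case
     * no -ccbin option is added and nvcc searches PATH for cl.exe.
     */
    public static List<String> build(
            String ccbinDir, String bits, String cuFile, String ptxFile) {
        List<String> command = new ArrayList<>();
        command.add("nvcc");
        if (ccbinDir != null) {
            command.add("-ccbin");
            command.add(ccbinDir);
        }
        command.add("-m" + bits);
        command.add("-ptx");
        command.add(cuFile);
        command.add("-o");
        command.add(ptxFile);
        return command;
    }

    public static void main(String[] args) {
        // The directory is the one from this thread; the file names are placeholders.
        List<String> command = build(
            "C:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\bin",
            "64", "kernel.cu", "kernel.ptx");
        System.out.println(String.join(" ", command));
        // To actually run it:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list to `ProcessBuilder` also avoids a pitfall of `Runtime.exec(String)`, which splits the command at whitespace and therefore breaks paths such as "Program Files (x86)".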

Certainly not related to the error that you’re facing, but to spare you another one: The path string


does not seem right. Forward slashes need not, cannot and must not be escaped by doubling them up. Use the single ones.
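
To illustrate the difference: in Java source code only the backslash is an escape character, so only backslashes are doubled. The sketch below uses a made-up user directory, and `collapseSlashes` is a hypothetical helper that is only meant for local drive paths (it would also damage URLs or UNC paths that legitimately contain "//").

```java
public class PathStrings {

    /** Collapses runs of forward slashes ("//") into a single "/". */
    public static String collapseSlashes(String path) {
        return path.replaceAll("/+", "/");
    }

    public static void main(String[] args) {
        // A backslash is the escape character in Java source, so it is doubled.
        String withBackslashes = "C:\\Users\\someone\\workspace";
        // A forward slash has no special meaning and is written once.
        String withSlashes = "C:/Users/someone/workspace";

        System.out.println(withBackslashes);
        System.out.println(withSlashes);
        System.out.println(collapseSlashes("C://Users//someone//workspace//"));
    }
}
```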

There were some conflicts between multiple PATH entries. With only “C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin” it worked fine. Thank you.

After solving the .ptx problem, I am still not able to run this code. It fails with the exception CUDA_ERROR_NO_BINARY_FOR_GPU.
As I am using Eclipse for JCuda, I cannot find the compute capability (which I saw mentioned in a conversation somewhere).
Here is my console output:

nvcc -m32 -ptx C:\Users\590943\workspace\Assignments\ -o C://Users//590943//workspace//Assignments//JCudaVectorAddKernel.ptx
Finished creating PTX file
Exception in thread „main“ jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
at jcuda.driver.JCudaDriver.checkResult(
at jcuda.driver.JCudaDriver.cuModuleLoad(
at JCudaVectorAdd.main(

If you are the same person as the one who wrote the other posts:
then please try to focus on one topic. Otherwise I don’t know what I should write where, what your actual question is, or what exactly works and what does not.

Please run this example:
And post its output here. It will contain the version number of the driver, and show the compute capability of your graphics card.
This could help to figure out why the binary cannot be loaded.

(@Marco: someone else is also browsing this thread.)
I executed that example. Here is the output:

Found 1 devices
Device 0: GeForce 210 with Compute Capability 1.2
Maximum number of threads per block : 512
Maximum x-dimension of a block : 512
Maximum y-dimension of a block : 512
Maximum z-dimension of a block : 64
Maximum x-dimension of a grid : 65535
Maximum y-dimension of a grid : 65535
Maximum z-dimension of a grid : 1
Maximum shared memory per thread block in bytes : 16384
Total constant memory on the device in bytes : 65536
Warp size in threads : 32
Maximum pitch in bytes allowed for memory copies : 2147483647
Maximum number of 32-bit registers per thread block : 16384
Clock frequency in kilohertz : 1238000
Alignment requirement : 256
Number of multiprocessors on the device : 2
Whether there is a run time limit on kernels : 1
Device is integrated with host memory : 0
Device can map host memory into CUDA address space : 1
Compute mode : 0
Maximum 1D texture width : 8192
Maximum 2D texture width : 65536
Maximum 2D texture height : 32768
Maximum 3D texture width : 2048
Maximum 3D texture height : 2048
Maximum 3D texture depth : 2048
Maximum 2D layered texture width : 8192
Maximum 2D layered texture height : 8192
Maximum layers in a 2D layered texture : 512
Alignment requirement for surfaces : 256
Device can execute multiple kernels concurrently : 0
Device has ECC support enabled : 0
PCI bus ID of the device : 1
PCI device ID of the device : 0
Device is using TCC driver model : 0
Peak memory clock frequency in kilohertz : 533000
Global memory bus width in bits : 64
Size of L2 cache in bytes : 0
Maximum resident threads per multiprocessor : 1024
Number of asynchronous engines : 1
Device shares a unified address space with the host : 0
Maximum 1D layered texture width : 8192
Maximum layers in a 1D layered texture : 512
PCI domain ID of the device : 0

Well, that’s a bit difficult then. The Compute Capability is 1.2. I’m not even sure whether this is still supported by the NVIDIA toolchain (I’d have to try this out - at least, the default value for NVCC is 2.0).

You can try to specify the desired target architecture using a command line parameter for NVCC. So you can either pass
-arch=sm_12
to NVCC when you’re compiling at the command line, or (when you want to use the “preparePtxFile” method), you can change

        String command =
            "nvcc " + modelString + " -ptx "+
            cuFile.getPath()+" -o "+ptxFileName;

to

        String command =
            "nvcc -arch=sm_12 " + modelString + " -ptx "+
            cuFile.getPath()+" -o "+ptxFileName;

(it was line 137 in the code of the first post).

If this does not work, you might have to use an older CUDA version for your card - but let’s see whether we get it running like that…
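
Instead of hard-coding the flag, it could be derived from the compute capability that the device query prints ("Compute Capability 1.2" gives major 1 and minor 2). This is a minimal sketch with a hypothetical helper class `NvccArch`. It only builds the string; whether the installed NVCC still accepts that architecture depends on the CUDA version.

```java
public class NvccArch {

    /** Builds the -arch option for a compute capability, e.g. 1.2 -> sm_12. */
    public static String archFlag(int major, int minor) {
        if (major < 1 || minor < 0 || minor > 9) {
            throw new IllegalArgumentException(
                "Unexpected compute capability " + major + "." + minor);
        }
        return "-arch=sm_" + major + minor;
    }

    public static void main(String[] args) {
        // Values taken from the device query output of the GeForce 210
        String flag = archFlag(1, 2);
        System.out.println("nvcc " + flag + " -m64 -ptx kernel.cu -o kernel.ptx");
    }
}
```

The file names in the printed command are placeholders.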

Sorry Marco, it’s been a lengthy conversation.
I tried your suggestion, but it shows the same exception, i.e. jcuda.CudaException: CUDA_ERROR_NO_BINARY_FOR_GPU
I checked the PTX file to see which sm target is generated:
.version 4.1
.target sm_20
.address_size 32
(It may be silly, but I also tried " -arch=sm_20", and that did not work either.)

OK, there are at least two issues:

  1. According to the release notes of CUDA 6.5 (!), the 1.2 target architecture is deprecated:

The sm_10 architecture is deprecated within the CUDA Driver, and the sm_11, sm_12, and sm_13 architectures are deprecated within the CUDA Toolkit and the CUDA Driver

So if you want to compile CUDA programs for your graphics card, you may have to install an old CUDA version (<6.5). At least, I think that you have to do that - I don’t know whether there is any other way to compile for CC 1.2…

  2. Equally important:
    .address_size 32
    This should not be the case. (And sorry: I should have noticed this in your first post already, but I overlooked it.) This would actually mean that you have a 32-bit JVM installed, but this can hardly be the case, because you should not even be able to use the current DLLs then…

You could try to compile it manually for 64bit, with
nvcc -m64 -ptx C:\Users\590943\workspace\Assignments\ -o C://Users//590943//workspace//Assignments//JCudaVectorAddKernel.ptx
but I still wonder where the „m32“ came from. What does
java -version
print for you?
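
The two values in question can also be compared from within Java. The following sketch reads the directives from a PTX header like the one posted above and prints them next to the data model of the running JVM. `PtxHeader` is a made-up helper, and the "sun.arch.data.model" property is an assumption insofar as it is specific to HotSpot-based JVMs and may be null elsewhere.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PtxHeader {

    private static final String[] KEYS = {".version", ".target", ".address_size"};

    /** Collects the .version, .target and .address_size directives from PTX text. */
    public static Map<String, String> parse(String ptxText) {
        Map<String, String> header = new LinkedHashMap<>();
        for (String line : ptxText.split("\\R")) {
            String trimmed = line.trim();
            for (String key : KEYS) {
                if (trimmed.startsWith(key + " ")) {
                    // Keep only the first value after the directive name
                    String rest = trimmed.substring(key.length()).trim();
                    header.put(key, rest.split("[\\s,]+")[0]);
                }
            }
        }
        return header;
    }

    public static void main(String[] args) {
        // Header as posted in this thread
        String ptx = ".version 4.1\n.target sm_20\n.address_size 32\n";
        Map<String, String> header = parse(ptx);
        System.out.println("PTX target:       " + header.get(".target"));
        System.out.println("PTX address size: " + header.get(".address_size"));
        System.out.println("JVM data model:   "
            + System.getProperty("sun.arch.data.model"));
    }
}
```

If the PTX address size and the JVM data model differ, the PTX file was compiled with a different `-m` value than the one that matches the JVM.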

As you suggested, I will try an older version (<6.5). I also tried compiling manually with -m64 and -m32; the results are attached here (the .ptx file is generated successfully).

But can you please tell me what I should do next to run the code manually? I tried javac -cp “.;jcuda-0.3.2a.jar”,

but it gives errors like: error: package jcuda.driver does not exist
import static jcuda.driver.JCudaDriver.*; and many others.

One last question: as you suggested, can these problems be solved with older versions (< CUDA Toolkit 6.5)?

Thank you.

The attachment did not work (it’s a bit odd here in the forum). You may just copy and paste the PTX file (in code tags).

But can you please tell me what next I should do to run the code manually ,I tried javac -cp ".;jcuda-0.3.2a.jar",

Oh dear. Use the name of the right JAR file. The "0.3.2a" part is a version number, and it has to match the JAR that you actually have. (Likely jcuda-0.7.5b.jar in your case)
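
The "package jcuda.driver does not exist" message fits a classpath entry that does not point to an existing JAR, because javac skips such entries without stopping. A quick way to see what a running Java program has on its classpath is sketched below; `ClasspathCheck` is a hypothetical helper, and the class is looked up without initializing it, so no native library is loaded by the check.

```java
public class ClasspathCheck {

    /** Returns whether a class with the given name can be found by the class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // 'false': only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Classpath: " + System.getProperty("java.class.path"));
        System.out.println("JCudaDriver found: "
            + isOnClasspath("jcuda.driver.JCudaDriver"));
    }
}
```

Running it with the same -cp value that was passed to javac shows whether the JAR name in that value is the right one.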