I have CUDA 3.2 on a 64-bit Windows Server 2008 R2 machine. I get the following error when running the JCudaDriverCubinSample. I do have VS2008 installed, but for some reason nvcc is not finding cl.exe. How could I fix this?
It’s difficult to guess the possible reasons and solutions for this problem. The CL.EXE should already have been added to the PATH environment variable during the VS installation, and if this did not happen automatically, chances are high that something went wrong during the installation and it’s also missing other Environment Variables…
So some possible approaches, ordered roughly by effort/probability of solving the problem:
You might first want to try running ‘vsvars32.bat’ (I’m not sure whether it’s called vsvars64.bat on a 64-bit system…): Info about vsvars.
If this does not help, you may add the path to CL to the PATH environment variable.
A websearch on the error message brings many results, maybe there are some further hints or ideas.
If all this does not help, a re-installation of VS might be worth a try, if this is not too much effort…
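Before changing the PATH or re-installing, it may help to check whether cl.exe is visible from the environment that the Java process (and thus the nvcc call started from it) actually sees. The following is only a minimal sketch: the class and method names are made up for illustration and are not part of JCuda.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper (not part of JCuda): looks for a file in the
// directories listed in a PATH-style string.
final class ClOnPathCheck {

    /** Returns the first match for fileName in the given PATH-style string, if any. */
    static Optional<File> findOnPath(String pathValue, String fileName) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as ";;"
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<File> cl = findOnPath(System.getenv("PATH"), "cl.exe");
        System.out.println(cl
            .map(f -> "cl.exe found: " + f.getAbsolutePath())
            .orElse("cl.exe is not on the PATH seen by this JVM"));
    }
}
```

If this reports that cl.exe is missing although it is found in a VS command prompt, the IDE or service that starts Java probably uses a different environment than that prompt.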
I am updating this thread hoping that all my efforts could be useful for someone else too… ^^
Since I am migrating my machine from XP 32-bit to Windows 7 64-bit (hoping to solve some problems allocating large buffers), I tried to set up a fully working development platform by installing just:
Netbeans 7.1
VS express 2010
Unfortunately, as soon as I tried to compile in 64 bits (in 32 bits there were no problems), I was stopped by this error:
nvcc fatal : Visual Studio configuration file '(null)' could not be found for installation at 'C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/…/…'
Then I looked online, but most of the results were for VS Professional, and not for the Express edition (which is free).
But in the end I found this page, where the right answer is at the bottom…
Hm yes, I think I remember a similar problem, that I had to copy around some files between the VS directories… But maybe the StackOverflow answer is a little bit more focussed and profound…
I’m not sure what you mean. In the latest version (0.0.4) of the “Utilities”, the KernelLauncher is no longer using CUBIN files, but instead, it is using PTX files - there it should not be necessary to add any “sm_…” arguments. Did you encounter problems with this latest version?
Sorry, I’m still not sure if I understood your question right. It should be possible to specify things like the “-prec-sqrt=true” parameter even when PTX files are used - at least according to, for example, this thread on the NVIDIA forum: http://forums.nvidia.com/index.php?showtopic=187739
But admittedly, I did not yet play around with these compiler flags very much.
These flags can be passed to the KernelLauncher when calling the “create”/“compile” methods: They have an optional array of additional “nvccArguments”, and when “-prec-sqrt=true” is inserted there, the PTX file should be created accordingly. If this does not work, I’ll have to check what might be wrong there…
Then, if I am going to use the KernelLauncher, I should add them to the nvccArguments:
But what if I want to inject these options into an already existing example like the JCudaVectorAdd?
Doing something like this in the preparePtxFile:
I think that
modelString += " --ftz=false --prec-div=true --prec-sqrt=true";
(without the ‘arch’) should work. I have not tested it, but can do so if you encounter any problems.
(BTW: These samples should be considered as such: Samples - the ‘preparePtxFile’ method is contained in many of them, for convenience and simplicity for the user, but of course, in a “real” application, one would either pull this method into an own “Utility” class, or maybe use the KernelLauncher or own utility methods…)
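To make the idea of pulling this into an own utility a bit more concrete, here is a minimal, untested-on-GPU sketch of how the nvcc command line could be assembled with additional flags. The class name, the method name and the file names are assumptions for illustration; this is not the code of the samples or of the KernelLauncher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical utility: builds an nvcc command that creates a PTX file,
// with optional extra arguments placed right after the -m32/-m64 flag.
final class NvccCommandSketch {

    static List<String> buildPtxCommand(
            String cuFile, String ptxFile, String dataModel, String... extraArgs) {
        List<String> command = new ArrayList<>();
        command.add("nvcc");
        command.add("-m" + dataModel);              // "-m32" or "-m64"
        command.addAll(Arrays.asList(extraArgs));   // e.g. precision flags
        command.add("-ptx");
        command.add(cuFile);
        command.add("-o");
        command.add(ptxFile);
        return command;
    }

    public static void main(String[] args) {
        // "sun.arch.data.model" may be missing on some JVMs, so 64 is assumed as a fallback
        String dataModel = System.getProperty("sun.arch.data.model", "64");
        List<String> command = buildPtxCommand(
            "JCudaVectorAddKernel.cu", "JCudaVectorAddKernel.ptx", dataModel,
            "--ftz=false", "--prec-div=true", "--prec-sqrt=true");
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be passed to a `ProcessBuilder`, which avoids problems with spaces in file names that a single concatenated command string may have.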
It looks strange to me, because right now I have a 9400 GT (1.1), and if I compile with -arch=sm_10 or -arch=sm_11 it works, but if I choose a higher value like -arch=sm_12 or -arch=sm_20 I get the error:
Exception in thread "main" jcuda.CudaException: CUDA_ERROR_INVALID_SOURCE
So it looks like these parameters really are taken into account by nvcc, but I still get different values if I execute the same calculations (many adds/subs/divisions) on the GPU and on the CPU (max error = 0.99749755859375; it is not that high, but also not that small…).
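Whether an absolute error of about 1 is large depends on the magnitude of the values involved, so it may be worth looking at the relative error as well. The following is just a sketch of such a comparison, assuming the GPU results are available as a float array and a CPU reference as a double array; the class and method names are made up.

```java
// Hypothetical helper for comparing GPU results with a CPU reference.
final class ErrorMetricsSketch {

    /** Largest absolute difference between result and reference. */
    static double maxAbsError(float[] result, double[] reference) {
        double max = 0.0;
        for (int i = 0; i < result.length; i++) {
            max = Math.max(max, Math.abs(result[i] - reference[i]));
        }
        return max;
    }

    /** Largest difference relative to the magnitude of the reference value. */
    static double maxRelError(float[] result, double[] reference) {
        double max = 0.0;
        for (int i = 0; i < result.length; i++) {
            // Guard against division by zero for reference values of 0
            double scale = Math.max(Math.abs(reference[i]), Double.MIN_NORMAL);
            max = Math.max(max, Math.abs(result[i] - reference[i]) / scale);
        }
        return max;
    }

    public static void main(String[] args) {
        float[] gpu = { 1.0f, 2.5f, 100.0f };
        double[] cpu = { 1.0, 2.0, 101.0 };
        System.out.println("max abs error: " + maxAbsError(gpu, cpu));
        System.out.println("max rel error: " + maxRelError(gpu, cpu));
    }
}
```

A relative error in the range of single-precision rounding would point to ordinary float accumulation effects rather than to the compiler flags.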
Therefore, I guess the problem must come from somewhere else… I found this:
On page 15 they say that 1.1 devices have approximate division, so maybe it is something related to the low Compute Capability. I’m going to receive a 560 Ti (2.1) in (hopefully) a few days. (I took the card with the largest number of cores among the 2.1 devices. There was also the EVGA 2Win 560, a dual 560 Ti, but it costs like hell: not less than 500€, while for mine I spent only 144 shipped!)
I will run some tests on this new small monster and let you know.
Concerning the general discrepancy between CPU and GPU results (concerning the precision), there are several threads at the NVIDIA forums, and papers and presentations like the one you mentioned. Admittedly, it was not yet too relevant for me (I’m not writing banking software or so).
But concerning the PTX and -arch version: I just had another look at the NVCC manual, and it is more complicated than I remembered: There are “virtual” version numbers and “code” version numbers… I guess I’ll have to read the NVCC manual and the PTX manual again to get a clearer view on this…
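As far as the NVCC manual describes it, the `compute_XY` names are the virtual (PTX) architectures and the `sm_XY` names are the real ones, and PTX generated for a virtual architecture can only be used on devices with the same or a higher compute capability. That would be consistent with the error on the 1.1 card when a higher -arch value is used. A small sketch of how the flag could be derived from the device's compute capability (the names are made up, and how the major/minor numbers are queried is left out here):

```java
// Hypothetical helper: derives a virtual-architecture flag for PTX
// generation from a compute capability given as major/minor numbers.
final class ArchFlagSketch {

    /** For example, (1, 1) gives "-arch=compute_11". */
    static String virtualArchFlag(int major, int minor) {
        return "-arch=compute_" + major + minor;
    }

    /** True if PTX for the given virtual architecture can be used on the device. */
    static boolean ptxUsableOnDevice(
            int ptxMajor, int ptxMinor, int deviceMajor, int deviceMinor) {
        return ptxMajor < deviceMajor
            || (ptxMajor == deviceMajor && ptxMinor <= deviceMinor);
    }

    public static void main(String[] args) {
        // A 9400 GT has compute capability 1.1
        System.out.println(virtualArchFlag(1, 1));
        System.out.println("compute_20 PTX usable on 1.1 device: "
            + ptxUsableOnDevice(2, 0, 1, 1));
    }
}
```

Such a flag could then be passed along with the other additional nvcc arguments when the PTX file is created.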