Is there any way to monitor the usage of GPU memory when running kernels?
For example, the constant memory, the local memory, or whatever else there is?
I’m asking this because when we try to run a set of kernels on older cards, they crash a lot with CL_OUT_OF_RESOURCES. I know the normal reaction to this error is to check for out-of-bounds reads or writes into memory, but that could hardly be the case here.
The kernels never crash on the CPU, and they run perfectly on newer GPUs like the GTX 660 Ti, the GT 640, and an AMD Radeon HD 7770. At least they haven’t crashed there yet, but I’m planning to do a full stress test to verify that the newer GPUs really don’t crash.
Is it possible to get overflows by using local, private, constant, or any other memory available on the GPU? Could that be the reason why the older cards crash so often, because they have less of this memory?
I would like to know if there is a way to track the usage of these specific memory types, or if there are guidelines for using these special types of memory, so I can try them out and rule out possible sources of the crash.
There are debuggers and profilers for OpenCL, like gDEBugger from http://www.gremedy.com/ which now seems to be continued by AMD as http://developer.amd.com/tools/hc/gDEBugger/Pages/default.aspx, or the NVIDIA Visual Profiler. Using these with JOCL is possible to some extent (I intended to write a small "How To" about that … when I have the time). However, these tools cannot analyze all possible reasons for a card being "out of resources". I think I already mentioned in another thread that CL_OUT_OF_RESOURCES seems to me (!) like a "standard message" that appears when anything goes wrong that is not explicitly covered by another error code -_-
So there are many possible reasons: unspecified errors (like writing out of memory bounds), or attempts to allocate memory blocks that are too large.
Referring to your use of images: One reason could be the attempt to allocate/create too many image samplers (or to exceed any other limit implied by the values reported by clGetDeviceInfo - see http://jocl.org/samples/JOCLDeviceQuery.java , although this sample does not query all properties).
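As a minimal sketch of the kind of check this implies: in real JOCL code the limit would come from CL.clGetDeviceInfo with CL_DEVICE_MAX_SAMPLERS; here the queried value is just an assumed number so the example runs standalone, and all names are hypothetical:

```java
// Hypothetical sketch: compare the number of image samplers a program
// wants to create against the limit that clGetDeviceInfo would report
// for CL_DEVICE_MAX_SAMPLERS (queried via JOCL in real code).
public class SamplerLimitCheck {

    // Returns true if the requested sampler count stays within the limit.
    static boolean withinSamplerLimit(int requestedSamplers, int maxSamplers) {
        return requestedSamplers <= maxSamplers;
    }

    public static void main(String[] args) {
        // Assumed value: an older card reporting a limit of 16 samplers.
        int maxSamplers = 16; // would come from clGetDeviceInfo
        System.out.println(withinSamplerLimit(8, maxSamplers));  // true
        System.out.println(withinSamplerLimit(32, maxSamplers)); // false
    }
}
```

The same pattern applies to CL_DEVICE_MAX_READ_IMAGE_ARGS or any other limit from the device query: compare what you are about to create against what the device reports, before the runtime answers with CL_OUT_OF_RESOURCES.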
Referring to your problems with building some kernels: An attempt to use too much local memory (or maybe even a kernel that runs out of registers) could also cause this error - although your kernels did not seem to use local memory or many registers, so this may be unlikely here.
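A small sketch of such a local-memory check, assuming the two sizes were already queried (in JOCL: CL.clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE, and CL.clGetKernelWorkGroupInfo with CL_KERNEL_LOCAL_MEM_SIZE); all numbers are made up:

```java
// Hypothetical sketch: check whether a kernel's static local memory plus
// the local memory passed per work-item as a kernel argument fits into
// the device's local memory.
public class LocalMemCheck {

    static boolean fitsLocalMem(long deviceLocalMem,  // CL_DEVICE_LOCAL_MEM_SIZE
                                long kernelLocalMem,  // CL_KERNEL_LOCAL_MEM_SIZE
                                long perItemLocal,    // bytes of local memory per work-item
                                int workGroupSize) {
        return kernelLocalMem + perItemLocal * workGroupSize <= deviceLocalMem;
    }

    public static void main(String[] args) {
        long deviceLocalMem = 16 * 1024; // 16 KB, typical for older GPUs
        // 1 KB static usage plus 32 bytes per work-item, 256 work-items:
        System.out.println(fitsLocalMem(deviceLocalMem, 1024, 32, 256));  // true  (9216 <= 16384)
        // The same kernel with 128 bytes per work-item no longer fits:
        System.out.println(fitsLocalMem(deviceLocalMem, 1024, 128, 256)); // false (33792 > 16384)
    }
}
```

This also illustrates why a kernel can run fine on a newer card but fail on an older one: the same launch configuration simply exceeds the smaller local-memory budget.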
Some of these reasons could possibly be detected beforehand and offline, using the NVIDIA Occupancy Calculator or the AMD Kernel Analyzer.
BTW: You seem to be working towards some generic image manipulation/analysis library, right? Admittedly, I’m also not sure about some practical aspects of using OpenCL: On the one hand, it’s intended to be device-independent; on the other hand, you still always have to query limits and, in the worst case, use different execution paths depending on the results. (Not to mention the question of how to properly handle different OpenCL versions supported by different platforms that may be installed simultaneously, but that’s another topic.) In that sense, in order to be "perfectly portable", one probably has to do many device queries and cover all cases, or try to choose or assume the lowest common denominator…
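One small, hypothetical example of such a device-dependent decision: clamping the work-group size to what the queried limits allow, instead of hard-coding one size everywhere. In real code the two limits would come from clGetDeviceInfo (CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_LOCAL_MEM_SIZE); the values below are invented:

```java
// Hypothetical sketch: pick a work-group size from queried device limits
// rather than hard-coding one - one example of a "different execution path".
public class WorkGroupChooser {

    // Largest power-of-two work-group size that respects both the device's
    // maximum work-group size and its local memory budget.
    static int chooseWorkGroupSize(int maxWorkGroupSize, // CL_DEVICE_MAX_WORK_GROUP_SIZE
                                   long localMemBudget,  // CL_DEVICE_LOCAL_MEM_SIZE
                                   long perItemLocal) {  // bytes of local memory per work-item
        int size = Integer.highestOneBit(maxWorkGroupSize);
        while (size > 1 && perItemLocal * size > localMemBudget) {
            size /= 2;
        }
        return size;
    }

    public static void main(String[] args) {
        // A newer card: 1024 work-items max, 48 KB local memory, 64 B/item:
        System.out.println(chooseWorkGroupSize(1024, 48 * 1024, 64)); // 512
        // An older card: 256 work-items max, 16 KB local memory, 64 B/item:
        System.out.println(chooseWorkGroupSize(256, 16 * 1024, 64));  // 256
    }
}
```

The point is not this particular heuristic, but that the decision is driven by queried values, so the same host code adapts to both the old and the new cards.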