Random CL_OUT_OF_RESOURCES crash

Xor · 18 April 2012 at 00:51

The OpenCL documentation says the following:

**CL_OUT_OF_RESOURCES if there is a failure to queue the execution instance of kernel on the command-queue because of insufficient resources needed to execute the kernel. For example, the explicitly specified local_work_size causes a failure to execute the kernel because of insufficient resources such as registers or local memory. Another example would be the number of read-only image args used in kernel exceed the CL_DEVICE_MAX_READ_IMAGE_ARGS value for device or the number of write-only image args used in kernel exceed the CL_DEVICE_MAX_WRITE_IMAGE_ARGS value for device or the number of samplers used in kernel exceed CL_DEVICE_MAX_SAMPLERS for device.

CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.**

My problem with this error is that it occurs at random, on a different kernel each time, all throughout my code.

Sometimes it does crash and sometimes it doesn’t and it’s really driving me mad.

Here are some system specs:

Nvidia Geforce GT 130M
Version: OpenCL 1.0 CUDA
Driver 296.10
(I got this information using GEEKS 3D GPU Caps viewer)

I do not run out of memory. I always release my memory as I should.
It seems like it also crashes more often when running multiple test cases in a row. But sometimes it also just crashes on the first case.

The error is a CL_OUT_OF_RESOURCES, as stated earlier, and the line that the error points to is the call to clEnqueueNDRangeKernel.

In what direction should I look to solve this error?
Could it be hardware related? Driver related?

Thanks in advance

Xor

Marco13 · 18 April 2012 at 11:51

Hi

Unfortunately, this is one of the most unspecific errors, and really hard to debug, not only because of the second paragraph of the documentation (nobody knows exactly what it means, or what it could mean). I’ve also experienced this error in weird situations. For example, when accessing an array outside of its bounds, one (or even several) kernel calls seemed to work, but then the system hung completely for 20 seconds or so (meaning that even the mouse cursor did not move!), and then it spat out a final “CL_OUT_OF_RESOURCES” before exiting the application.

However, at least I can say that until now I have only received this error due to programming errors (that seemed to be unrelated to the things mentioned in the documentation). So it’s most likely not related to the driver or hardware.

I can only recommend carefully reviewing the kernel code, especially for writes to global memory that might be out of bounds. One “trick” that I have occasionally used: I commented out ALL writes to global memory in the kernel, and then re-inserted them step by step… It may seem like a desperate measure, but debugging GPU programs can be a hassle. (I did some basic tests of gDEBugger, http://www.gremedy.com/ , and it at least seems to be possible to use it with Java and JOCL, but I have not used it extensively, so I cannot give any further or more specific advice.)
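To illustrate what I mean by out-of-bounds writes: one common cause (only a guess here, since I have not seen your kernels) is that the global work size is rounded up to a multiple of the local work size, so some work-items have a global id past the end of the buffer. The following is a plain Java sketch that emulates this on the CPU. The class name, the "doubling" kernel and the sizes are made up for the example. In a real kernel the guard would be the `if (gid >= n) return;` line at the top.

```java
import java.util.Arrays;

class BoundsGuardSketch {
    /** Rounds n up to the next multiple of localSize (a common way to pick the global work size). */
    static int roundUp(int n, int localSize) {
        int r = n % localSize;
        return r == 0 ? n : n + localSize - r;
    }

    /** Number of work-items whose global id lies past the end of a buffer with n elements. */
    static int countOutOfBounds(int n, int localSize) {
        return roundUp(n, localSize) - n;
    }

    /** CPU stand-in for one work-item of a hypothetical kernel that doubles each element. */
    static void guardedWorkItem(int gid, float[] in, float[] out, int n) {
        if (gid >= n) {
            return; // the guard: padded work-items must not touch the buffers
        }
        out[gid] = 2.0f * in[gid];
    }

    /** Emulates the whole NDRange on the CPU, including the padded work-items. */
    static float[] runGuarded(float[] in, int localSize) {
        int n = in.length;
        float[] out = new float[n];
        int globalSize = roundUp(n, localSize);
        for (int gid = 0; gid < globalSize; gid++) {
            guardedWorkItem(gid, in, out, n);
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 1000, localSize = 64;
        System.out.println("global work size: " + roundUp(n, localSize));
        System.out.println("work-items past the end: " + countOutOfBounds(n, localSize));
        System.out.println(Arrays.toString(runGuarded(new float[] {1f, 2f, 3f}, 4)));
    }
}
```

With 1000 elements and a local size of 64, the global size becomes 1024, so 24 work-items would write past the end if the guard were missing. In Java that would throw an ArrayIndexOutOfBoundsException immediately. On the GPU nothing tells you, and the error may only show up several kernel calls later.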

If your program is not too large or complex, you may consider posting it here or mailing it to me, although I cannot say exactly when I will have the time to have a look at it.

bye
Marco

Xor · 20 April 2012 at 03:19

I’m trying to narrow the problem down to the kernels that may cause it. (It will probably take a while because it is a hell of a job…)
Once I’m done I will post (or mail) the OpenCL kernels and the Java host code.

Also, how is gDEBugger used? I can’t make it work on my system.

Marco13 · 20 April 2012 at 07:27

It’s probably best to have some sort of “unit tests” for the kernels: not necessarily with JUnit or the like, but just a small “stub” that runs a single kernel on predefined input data, if possible.
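Such a stub could look roughly like the following sketch. It is plain Java without any JOCL calls, and everything in it is hypothetical: the “brighten” operation, the class name and the `runKernel` parameter are only placeholders. In a real stub, the function passed as `runKernel` would upload the input, enqueue the one kernel under test, wait for it to finish and read the result back. The stub then compares that result with a CPU reference and reports the first element that differs.

```java
import java.util.function.UnaryOperator;

class KernelStubSketch {
    /** CPU reference for a hypothetical "brighten" kernel: adds offset, clamps to [0, 255]. */
    static int[] referenceBrighten(int[] pixels, int offset) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = Math.max(0, Math.min(255, pixels[i] + offset));
        }
        return out;
    }

    /** Returns the index of the first differing element, or -1 if both arrays match. */
    static int firstMismatch(int[] expected, int[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        return expected.length == actual.length ? -1 : common;
    }

    /** Runs one kernel (hidden behind runKernel) on fixed input and checks it against the reference. */
    static int check(UnaryOperator<int[]> runKernel, int[] input, int offset) {
        int[] expected = referenceBrighten(input, offset);
        int[] actual = runKernel.apply(input.clone());
        return firstMismatch(expected, actual);
    }

    public static void main(String[] args) {
        int[] input = {0, 100, 250, 255};
        // Stand-in for the GPU path; a real stub would call the OpenCL kernel here.
        UnaryOperator<int[]> fakeGpu = in -> referenceBrighten(in, 10);
        int mismatch = check(fakeGpu, input, 10);
        System.out.println(mismatch < 0 ? "kernel output matches" : "first mismatch at index " + mismatch);
    }
}
```

Running each kernel through such a stub many times, with the same fixed input, should make it easier to see which kernel produces wrong or unprocessed output, independently of the rest of the program.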

Concerning gDEBugger: Again, I only did a basic test. In order to run a Java application, you have to create a .BAT file that starts the Java application via the command line. Hopefully I’ll have more time one day to write a short tutorial about that, but at the moment there are other things at the top of my ‘todo’ list…

Xor · 23 April 2012 at 00:50

I’ve been pretty busy with the issue lately, and I think it may be a synchronisation issue.

When I ran the code on a colleague’s much better notebook, with a much more powerful GPU and CPU, the code did not crash, but parts of the images were apparently just not processed.

So I have these strange observations and I don’t know exactly what to make of them:

My notebook --> code runs and produces correct output, but takes much longer and crashes WAAAY more often.
Colleague’s notebook --> code runs and produces wrong output, but is way faster and almost never crashes; sometimes it throws a mem object allocation failure.

Could it be that my CPU and GPU create bottlenecks in the data processing, so that I don’t get these gaps in my images, but that it crashes because too much gets queued up or something like that? And that it accesses memory that is not its to access because the GPU or CPU can’t keep up?
And could that also be why my colleague’s notebook can run the whole code without crashing, because it processes the code fast enough that the wrong memory access can’t happen?

Also, if you have a very good sample of how to do synchronisation in JOCL, could you tell me the name of the file?
I imagine it should be possible to make the next kernel wait until the one that is still running completes its task.

Anyway, thanks again in advance

Xor

Marco13 · 23 April 2012 at 01:24

Well, … the number of crashes and the speed of the GPU should hardly be related. And more importantly: If it crashes even once, something is wrong.

In general, you can wait for a command to be executed, either fine-grained, using events, or by just writing
clFinish(commandQueue);
It will block until the command queue has finished its work.

Again, to me it sounds like a problem in the kernel, but of course, until now, this is just a guess…

Xor · 24 April 2012 at 00:00

I ran the code 200 times in a loop on my PC at home (it has quite decent hardware), which means that I ran 3000 images through all of the kernels one after the other. And no crash. The temptation to blame my laptop’s hardware is getting bigger and bigger.

Anyway, I made sure the code had no synchronisation issues by making the command queue an in-order queue, and I ran all the kernels individually for a very long time whilst monitoring the memory usage; there is no memory leak anywhere in the code.

I also don’t think it’s possible that the kernels are bad, because if they were, the code shouldn’t run in the first place, right?

As a last resort, maybe I could copy my .cl kernels here to let you have a quick look at them; they aren’t that difficult.

Let me know and thanks in advance

Xor

Marco13 · 26 April 2012 at 08:34

Hello

Well, I mentioned above that I sometimes experienced “OUT OF RESOURCES” errors after running a kernel with an out-of-bounds array access, and that it had to run several times and did not crash during the first run. But I also mentioned that the documentation of “OUT OF RESOURCES” is fuzzy enough that nearly anything could be the reason.
If you send me your kernels (with some “testing stub” wrapped around them) I can try to run them and see if I get the same error or find something suspicious in the code, but of course, I cannot promise anything.

bye
Marco