Hi.
I have been reading about the different memory types such as global, constant, local and private.
Is there a way with JOCL to get the local and private memory available?
This will make it easier to optimize the kernel by using the correct memory.
Thanks
//Fredrik
In all its details, this is a rather broad topic. But in general, you can use the same memory types (and in the same way) as in plain OpenCL. The most important distinction is between “global” and “local”, I guess (“private” is just the default, and “constant” is something I’ve rarely seen until now - though it certainly has its use cases).
I just looked over the Samples again, and was a bit surprised that there indeed is only one “simple” sample that uses local memory, namely
http://jocl.org/samples/JOCLReduction.java
with this kernel:
http://jocl.org/samples/reduction.cl
As you can see, the __local memory is just declared as a kernel parameter, and it is “allocated” by passing null as a kernel argument, but with a certain size - e.g.
clSetKernelArg(kernel, argumentIndex, sizeOfLocalMemory, null);
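To make the size argument concrete, here is a minimal sketch of how sizeOfLocalMemory could be computed. It assumes the kernel wants one float of __local memory per work-item in the work-group; the class and method names are hypothetical helpers and not part of JOCL.

```java
// Hypothetical helper (not part of JOCL): computes the byte size to pass
// to clSetKernelArg for a __local float buffer.
final class LocalMemoryArgs {

    // Size of an OpenCL float in bytes (same value as org.jocl.Sizeof.cl_float).
    static final int SIZEOF_CL_FLOAT = 4;

    // Assumption: one float of local memory per work-item in the work-group.
    static long localFloatBufferBytes(int localWorkSize) {
        if (localWorkSize <= 0) {
            throw new IllegalArgumentException(
                "localWorkSize must be positive, but is " + localWorkSize);
        }
        return (long) localWorkSize * SIZEOF_CL_FLOAT;
    }
}
```

With JOCL on the classpath, the result would be passed as the size, with null as the value: clSetKernelArg(kernel, argumentIndex, LocalMemoryArgs.localFloatBufferBytes(localWorkSize), null);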
I should probably add samples for other memory types as well. (E.g. I think there currently is no sample of using constant memory).
Hi.
My question was more about how to get the free memory (like local and private) available on the device.
But thanks for the other info
//Fredrik
Ah, I see, sorry. The device query sample at http://jocl.org/samples/JOCLDeviceQuery.java shows some information here. For example, it prints the values of
CL_DEVICE_GLOBAL_MEM_SIZE
CL_DEVICE_LOCAL_MEM_SIZE
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE
I’m not aware of a way to determine the “private” memory size, but this should not be necessary. I would have to refresh and re-read a few things here, but I think that private memory eventually refers to “registers”, and (in the worst case) spills out into global memory - so there should be no relevant limit for that, but of course, using too much of it will cause the GPU to run out of registers and become horribly slow. For NVIDIA cards, it might even be possible to determine the register usage (via PTX), but I don’t think that there is a vendor-independent way to do that.
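Once the CL_DEVICE_LOCAL_MEM_SIZE value is known, it can be used to pick a work-group size whose local memory fits on the device. The following is only a sketch under some assumptions: the class and method names are hypothetical, the kernel is assumed to need a fixed number of local-memory bytes per work-item, and a power-of-two work-group size is assumed to be wanted (as is common for reductions). The JOCL query itself is only shown in a comment, because it needs a device.

```java
// Hypothetical helper (not part of JOCL): picks a work-group size that
// fits into the local memory reported by the device.
final class LocalMemoryBudget {

    // The inputs would come from device queries, roughly like this:
    //   long[] value = new long[1];
    //   clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
    //       Sizeof.cl_long, Pointer.to(value), null);
    //   long localMemBytes = value[0];
    //
    // Returns the largest power-of-two work-group size that respects both
    // the local memory size and the maximum work-group size, or 0 if not
    // even a single work-item fits.
    static int largestPowerOfTwoWorkGroup(
            long localMemBytes, int bytesPerWorkItem, int maxWorkGroupSize) {
        if (localMemBytes <= 0 || bytesPerWorkItem <= 0 || maxWorkGroupSize <= 0) {
            throw new IllegalArgumentException("All arguments must be positive");
        }
        // Number of work-items whose local data fits, capped by the device limit.
        long limit = Math.min(localMemBytes / bytesPerWorkItem, (long) maxWorkGroupSize);
        if (limit < 1) {
            return 0;
        }
        // Round down to the nearest power of two.
        return (int) Long.highestOneBit(limit);
    }
}
```

Note that this ignores any __local memory that the kernel declares statically, so the real budget may be somewhat smaller.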
Perfect, thanks
Seems like the private memory should be handled with care (not overused).