JOCLSample max size of N

Hi

I was experimenting with setting the size of N in the sample JOCLSample.

My device details from JOCLDeviceQuery are pasted below and show that my device supports 1024x1024x64 thread blocks. Thus, I would assume a maximum id index of 1024x1024x64 = 67,108,864.

However, I find that if I set N to this number (or even to less than this number), it throws a CLException at runtime stating that there are insufficient resources.

Now, the indices up to 67,108,864 should be valid; otherwise, how would we know what the upper limit is? And JOCLSample pushes 3 integer arrays onto the GPU, which presumably is 3 x 67,108,864 x 4 bytes = 805,306,368 bytes in total (268,435,456 bytes per array), with each array within the specified maximum memory allocation of 512 MByte.
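The arithmetic in this paragraph can be checked with a few lines of plain Java. This is only a sketch: the class and method names are made up for illustration, and it assumes that "512 MByte" means 512 x 1024 x 1024 bytes and that CL_DEVICE_MAX_MEM_ALLOC_SIZE applies to each buffer separately. Three arrays of 67,108,864 four-byte elements come to 805,306,368 bytes overall, or 268,435,456 bytes per array. The total device memory is not part of the query output below, so the sketch cannot say whether all three buffers fit at once.

```java
public class BufferSizeCheck {
    // Bytes needed for one array of n elements (hypothetical helper).
    static long bytesPerArray(long n, int bytesPerElement) {
        return n * bytesPerElement;
    }

    public static void main(String[] args) {
        long n = 1024L * 1024L * 64L;               // 67,108,864 elements
        long perArray = bytesPerArray(n, 4);        // 4 bytes per int or float
        long total = 3 * perArray;                  // two inputs and one output
        long maxAlloc = 512L * 1024L * 1024L;       // assumed meaning of "512 MByte"

        System.out.println("elements per array: " + n);
        System.out.println("bytes per array:    " + perArray);
        System.out.println("bytes in total:     " + total);
        System.out.println("each array within max alloc: " + (perArray <= maxAlloc));
    }
}
```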

Thus, why does the program throw an exception?

Thanks

Graham

Number of devices in platform NVIDIA CUDA: 1
— Info for device Quadro K1000M: —
CL_DEVICE_NAME: Quadro K1000M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 297.03
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 1
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 405 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte

Hello Graham,

It sounds like you’re interpreting the MAX_WORK_ITEM_SIZES as a strict limitation for the problem size. But these refer to problems where you specify the work size in multiple dimensions. And even there they don’t refer to the problem size itself, but only to the size of the work-groups.
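To make the distinction concrete, here is a small plain-Java sketch (no JOCL needed; the class and method names are hypothetical) of how the two device limits constrain a work-group: each local dimension must not exceed the matching entry of CL_DEVICE_MAX_WORK_ITEM_SIZES, and the product of all local dimensions must not exceed CL_DEVICE_MAX_WORK_GROUP_SIZE. The global problem size does not appear in this check at all.

```java
public class LocalSizeCheck {
    // Returns true if a work-group of the given shape respects both device limits.
    static boolean isValidLocalSize(long[] local, long[] maxItemSizes, long maxGroupSize) {
        long product = 1;
        for (int i = 0; i < local.length; i++) {
            // Per-dimension limit (CL_DEVICE_MAX_WORK_ITEM_SIZES)
            if (local[i] < 1 || local[i] > maxItemSizes[i]) {
                return false;
            }
            product *= local[i];
        }
        // Limit on the total number of work-items in one group
        // (CL_DEVICE_MAX_WORK_GROUP_SIZE)
        return product <= maxGroupSize;
    }

    public static void main(String[] args) {
        long[] maxItemSizes = {1024, 1024, 64};   // values from the device query
        long maxGroupSize = 1024;

        System.out.println(isValidLocalSize(new long[] {1024, 1, 1}, maxItemSizes, maxGroupSize));
        System.out.println(isValidLocalSize(new long[] {32, 32, 1}, maxItemSizes, maxGroupSize));
        System.out.println(isValidLocalSize(new long[] {1024, 1024, 64}, maxItemSizes, maxGroupSize));
    }
}
```

With the values from the device query, a 1024x1x1 or 32x32x1 group passes, while a 1024x1024x64 group fails because it would contain far more than 1024 work-items.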

In fact, there should be no limit for the problem size, except the limits imposed by memory consumption etc.

Which work group size did you specify? Note that the JOCLSample specifies a local_work_size of 1, which may not be appropriate here. You can just set the local_work_size to ‘null’, in order to leave the decision about the local work size to the OpenCL implementation. (This is probably the most appropriate choice in cases where you are not going to use local memory). (Admittedly, I think that setting the work group size to 1 should theoretically NOT impose any limitations, but it might do so, due to limitations on the number of work-groups on NVIDIA cards).
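The effect of the local work size on the number of work-groups can be sketched in plain Java (hypothetical class and method names, no JOCL required). With a local size of 1, every work-item becomes its own work-group; a larger local size divides the group count accordingly. When an explicit local size is given, the global size has to be a multiple of it, so the sketch also rounds the global size up. Whether a particular group count actually exceeds a limit on a given NVIDIA card is only a guess here, not something the sketch can confirm.

```java
public class WorkGroupCount {
    // Smallest multiple of localSize that is >= globalSize.
    static long roundUp(long globalSize, long localSize) {
        long remainder = globalSize % localSize;
        return remainder == 0 ? globalSize : globalSize + localSize - remainder;
    }

    // Number of work-groups launched for a 1D range.
    static long numWorkGroups(long globalSize, long localSize) {
        return roundUp(globalSize, localSize) / localSize;
    }

    public static void main(String[] args) {
        long n = 67_108_864L;
        System.out.println("local size 1:    " + numWorkGroups(n, 1) + " work-groups");
        System.out.println("local size 1024: " + numWorkGroups(n, 1024) + " work-groups");

        long m = 100_000_000L;
        System.out.println("padded global size for local size 1024: " + roundUp(m, 1024));
    }
}
```

If the global size is padded like this, the kernel would need a guard such as `if (gid < n)` so that the extra work-items do not touch memory past the end of the arrays.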

I can’t run tests here ATM (currently in front of a really old PC, without OpenCL), but if your question referred to finding out the maximum size of vectors that can be added in the JOCLSample, I can try this out on Monday.

bye
Marco

Thanks for your prompt reply.

I did as you suggested and changed:

    clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
            global_work_size, local_work_size, 0, null, null);

to:

    clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
            global_work_size, null, 0, null, null);

and then set N to 100 million, and it didn’t throw the exception.

Graham