AMD APP + NVIDIA + CPU + DOUBLE PRECISION problem

kacperpl1 · 17 May 2011, 14:28

Hi again! I found a new, interesting problem (maybe a bug).

NVIDIA finally released drivers that work alongside AMD APP with CPU support. However, I found an interesting bug when trying to force the calculations onto the CPU (for GPU vs. CPU testing purposes).

As you know, I have a lot of code in my CL kernels. I create the kernel as a string and put some Java variables into the kernel code as constant data, and this turned out to be problematic on my CPU.

So I have a line like this in the kernel string. The OpenCL program builder reports that double constants are not supported, so I added the "f" suffix to the constants:

```
"float Alfa1=(("+Fw/2000+"f*pow(h1, 2))/("+E+"f*J1))+(("+Fw/1000+"f*(h2+h3))*h1/("+E+"f*J1));
"+
```
The problem now is that the program builder throws a linker error and gives no further information. It only happens when compiling for the CPU.
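Appending "f" by hand to a concatenated Java value is easy to get wrong. A minimal sketch of one way to generate such constants follows, assuming a hypothetical helper `toFloatLiteral` (it is not part of JOCL) and a placeholder value for `Fw`, whose real type and value are not shown in this thread. If `Fw` were an `int`, `Fw/2000` would be an integer division and the result would look like `0f`, which is not a valid float literal in OpenCL C.

```java
class FloatLiteral {
    // Hypothetical helper: renders a Java number as an OpenCL C float literal.
    static String toFloatLiteral(double value) {
        float f = (float) value;
        if (Float.isNaN(f) || Float.isInfinite(f)) {
            // "NaNf" or "Infinityf" would not compile in the kernel.
            throw new IllegalArgumentException("not representable as a float literal: " + value);
        }
        // Float.toString always emits a decimal point (e.g. "2.0" or "1.0E-4"),
        // so appending "f" gives a well-formed single-precision constant.
        return Float.toString(f) + "f";
    }

    public static void main(String[] args) {
        double Fw = 1500.0; // placeholder value, not taken from the thread
        String line = "float scaled = " + toFloatLiteral(Fw / 2000) + " * h1;";
        System.out.println(line); // prints: float scaled = 0.75f * h1;
    }
}
```

In the kernel string above, each `"+Fw/2000+"f` fragment could then be replaced by `"+toFloatLiteral(Fw/2000)+"`, which keeps the suffix handling in one place.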

Searching Google, I found that there is a switch that forces single-precision constants:

-cl-single-precision-constant


But how do I use it with JOCL?

Some info about my platform. Note that CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> reports CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, **DOUBLE 0** for the CPU and **DOUBLE 1** for the GPU. Does that mean double precision isn't supported (yet) on my CPU?

Number of platforms: 2
Number of devices in platform NVIDIA CUDA: 1
Number of devices in platform AMD Accelerated Parallel Processing: 1
— Info for device GeForce GT 420M: —
CL_DEVICE_NAME: GeForce GT 420M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 275.27
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 0 / 0 / 0
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1000 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 240 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 961 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
CL_DEVICE_2D_MAX_WIDTH 0
CL_DEVICE_2D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_WIDTH 0
CL_DEVICE_3D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_DEPTH 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_ CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1

— Info for device Intel® Core™ i3 CPU M 380 @ 2.53GHz : —
CL_DEVICE_NAME: Intel® Core™ i3 CPU M 380 @ 2.53GHz
CL_DEVICE_VENDOR: GenuineIntel
CL_DRIVER_VERSION: 2.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 0 / 0 / 0
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2527 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2048 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 3958 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: global
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF
CL_DEVICE_2D_MAX_WIDTH 0
CL_DEVICE_2D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_WIDTH 0
CL_DEVICE_3D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_DEPTH 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_ CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 0



EDIT:
My bad: the linker problem was resolved after a system reboot, but I don't get why the code scanner and the device info worked while the linker wouldn't...

The question about the single-precision parameter still stands.

Marco13 · 17 May 2011, 15:31

Hello

Yes, the AMD compiler is a little stricter than the NVIDIA compiler (also, for example, concerning these 'quick comments'

```
//*/
x=y+u;
//*/
```

where you switch a block on or off by removing the first '/').

Regarding the preferred vector width: that's what this flag means, as far as I know. When the flag is 0, double is not supported. Or does it work now? (That was not clear from the 'edit'. If it works: does it still print the '0' for CPU double support?)
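As a rough sketch of another way to check this: the OpenCL 1.1 specification says the preferred double vector width must be 0 when `cl_khr_fp64` is not supported, so the device's extension string can be inspected as well. The helper name `supportsDouble` and the sample strings below are made up for illustration. A real extension string would come from `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`. Treating `cl_amd_fp64` as sufficient is an assumption about what the AMD platform reports.

```java
import java.util.Arrays;

class Fp64Check {
    // Returns true if the space-separated extension list names a
    // double-precision extension (Khronos or AMD-specific).
    static boolean supportsDouble(String extensions) {
        return Arrays.stream(extensions.trim().split("\\s+"))
                .anyMatch(e -> e.equals("cl_khr_fp64") || e.equals("cl_amd_fp64"));
    }

    public static void main(String[] args) {
        // Made-up sample strings, not taken from the devices in this thread.
        String withFp64 = "cl_khr_global_int32_base_atomics cl_khr_fp64";
        String withoutFp64 = "cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store";
        System.out.println(supportsDouble(withFp64));    // prints: true
        System.out.println(supportsDouble(withoutFp64)); // prints: false
    }
}
```

The names are compared as whole tokens rather than with `contains`, so a longer extension name that merely starts with `cl_khr_fp64` would not be mistaken for it.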

Concerning the switch: It should be possible to pass this one to 'clBuildProgram', like

```
clBuildProgram(program, 0, null, "-cl-single-precision-constant", null, null);
```

bye

kacperpl1 · 17 May 2011, 23:21

[QUOTE=Marco13]Hello
That’s what this flag means, as far as I know: When the flag is 0, double is not supported - or does it work now? (That was not clear due to the ‘edit’ - if it works: Does it still print the ‘0’ for CPU double support?)
bye[/QUOTE]
Yeah, it does not work. I mean, I fixed the linker errors after I changed the double constants into single-precision floating-point constants by adding the "f" suffix.