I’m doing a little research on how OpenCL performs on CPUs, and I just noticed the new release of the Intel OpenCL SDK, so I had to check how it works. Anyway, my results:
First of all, it works out of the box (on Windows at least), so the note about JOCL not being tested on Intel OpenCL can be removed.
Second, it runs faster than the AMD CPU runtime because it uses SSE4.1, while AMD’s runtime uses SSE2 only. It also supports doubles (which the AMD runtime doesn’t support on my CPU). Device query results:
Number of platforms: 3
Number of devices in platform NVIDIA CUDA: 1
Number of devices in platform AMD Accelerated Parallel Processing: 1
Number of devices in platform Intel(R) OpenCL: 1
--- Info for device GeForce GT 420M: ---
CL_DEVICE_NAME: GeForce GT 420M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 275.27
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 0 / 0 / 0
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1000 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 240 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 961 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
CL_DEVICE_2D_MAX_WIDTH 0
CL_DEVICE_2D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_WIDTH 0
CL_DEVICE_3D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_DEPTH 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1
--- Info for device Intel(R) Core(TM) i3 CPU M 380 @ 2.53GHz : ---
CL_DEVICE_NAME: Intel(R) Core(TM) i3 CPU M 380 @ 2.53GHz
CL_DEVICE_VENDOR: GenuineIntel
CL_DRIVER_VERSION: 2.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 0 / 0 / 0
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2527 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2048 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 3958 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: global
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF
CL_DEVICE_2D_MAX_WIDTH 0
CL_DEVICE_2D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_WIDTH 0
CL_DEVICE_3D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_DEPTH 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 0
--- Info for device Intel(R) Core(TM) i3 CPU M 380 @ 2.53GHz : ---
CL_DEVICE_NAME: Intel(R) Core(TM) i3 CPU M 380 @ 2.53GHz
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 0 / 0 / 0
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2530 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 989 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 3958 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: global
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 128 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST
CL_DEVICE_2D_MAX_WIDTH 0
CL_DEVICE_2D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_WIDTH 0
CL_DEVICE_3D_MAX_HEIGHT 0
CL_DEVICE_3D_MAX_DEPTH 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 2
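The double support difference shows up in the last line of each device block: the first CPU entry (driver 2.0, presumably the AMD runtime judging by the platform order) reports DOUBLE 0, while the second (driver 1.1, vendor Intel(R) Corporation) reports DOUBLE 2. As far as I know, the OpenCL 1.1 spec requires the preferred double vector width to be 0 when cl_khr_fp64 is not supported. Here is a rough Java sketch that pulls that value out of the printed line; the class and method names (VectorWidths, parse, supportsDouble) are made up for illustration, and it only parses the text output above, it does not call OpenCL or JOCL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorWidths {
    // Parses a line like
    // "CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 16, SHORT 8, ..."
    // into a map of type name -> preferred vector width.
    public static Map<String, Integer> parse(String line) {
        Map<String, Integer> widths = new LinkedHashMap<>();
        // Drop the key (everything up to the first space), keep the "TYPE n" pairs.
        String values = line.substring(line.indexOf(' ') + 1);
        for (String pair : values.split(",")) {
            String[] parts = pair.trim().split("\\s+");
            widths.put(parts[0], Integer.parseInt(parts[1]));
        }
        return widths;
    }

    // Assumption: a preferred DOUBLE width of 0 means the runtime has no fp64 support.
    public static boolean supportsDouble(String line) {
        return parse(line).getOrDefault("DOUBLE", 0) > 0;
    }

    public static void main(String[] args) {
        String driver20 = "CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 0";
        String driver11 = "CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 2";
        System.out.println("CPU entry, driver 2.0, doubles: " + supportsDouble(driver20));
        System.out.println("CPU entry, driver 1.1, doubles: " + supportsDouble(driver11));
    }
}
```

With the two CPU lines from the dump, this reports no double support for the driver 2.0 entry and double support for the driver 1.1 entry.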
To test its speed I ran about 10 million combinations through a function that uses a lot of pow and sqrt calls and a few integrals of those. The test results:
Intel OpenCL - i3 380M - 104 seconds
AMD OpenCL - i3 380M - 132 seconds
Nvidia OpenCL - GT420M - 24 seconds
The Intel runtime is a little faster here, but I think that will depend on the task, and it’s always better to have double and SSE4.1 support than not.
I’ll try to find a way to install the runtime on my Ubuntu machine and check how it works there. If you find anything interesting in this device query output and want me to check something, let me know.
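To put numbers on the timings above, here is a small Java sketch that turns the three measured times into speedup ratios. The class and method names (Speedup, ratio, timeSaved) are just illustrative, and the only inputs are the 104, 132 and 24 second figures from this one test run.

```java
import java.util.Locale;

public class Speedup {
    // How many times faster the quicker run is: slower time divided by faster time.
    public static double ratio(double slowerSeconds, double fasterSeconds) {
        return slowerSeconds / fasterSeconds;
    }

    // Fraction of the slower run's time saved by the quicker run.
    public static double timeSaved(double slowerSeconds, double fasterSeconds) {
        return 1.0 - fasterSeconds / slowerSeconds;
    }

    public static void main(String[] args) {
        double intel = 104, amd = 132, nvidia = 24; // seconds, from the results above
        System.out.println(String.format(Locale.ROOT,
                "Intel CPU vs AMD CPU: %.2fx (%.0f%% less time)",
                ratio(amd, intel), 100 * timeSaved(amd, intel)));
        System.out.println(String.format(Locale.ROOT,
                "GPU vs Intel CPU:     %.2fx", ratio(intel, nvidia)));
        System.out.println(String.format(Locale.ROOT,
                "GPU vs AMD CPU:       %.2fx", ratio(amd, nvidia)));
    }
}
```

So on this workload the Intel runtime is roughly 1.27 times as fast as the AMD one (about 21% less time), and the GT 420M is roughly 4.3 times as fast as the Intel CPU runtime.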