-> In which form does it crash on AMD platforms?
Windows XP Professional 32-bit, OpenCL.dll (OpenCL 1.0 Runtime), Radeon HD 4670 (Catalyst 10.9)
Start sorting
= = = workgroup size: 128 = = =
program failure
org.jocl.CLException: CL_INVALID_WORK_GROUP_SIZE
at org.jocl.CL.checkResult(CL.java:523)
at org.jocl.CL.clEnqueueNDRangeKernel(CL.java:6868)
at org.jocl.samples.JOCLRadixSort.radixSortBlocksKeysOnlyOCL(JOCLRadixSort.java:318)
at org.jocl.samples.JOCLRadixSort.radixSortStepKeysOnly(JOCLRadixSort.java:289)
at org.jocl.samples.JOCLRadixSort.sort(JOCLRadixSort.java:274)
at org.jocl.samples.JOCLRadixSort.doRadixSort(JOCLRadixSort.java:248)
at org.jocl.samples.JOCLRadixSort.(JOCLRadixSort.java:192)
at org.jocl.samples.JOCLRadixSort.main(JOCLRadixSort.java:620)
Exception in thread “main” java.lang.RuntimeException
at org.jocl.samples.JOCLRadixSort.(JOCLRadixSort.java:207)
at org.jocl.samples.JOCLRadixSort.main(JOCLRadixSort.java:620)
-> Did you compare the speed of this sort implementation to an “Arrays.sort”?
Java 1.6.0_16
java.util.Arrays.sort(intArray);
CPU: Intel Core 2 Quad 9550S (2,8 GHz)
chipset: Intel X48
RAM: 1333 MHz DDR3 CAS Latency (CL) 7
SiSoftware Sandra 2010 Pro Raw Performance (GFLOPS) = 37.70
Start sorting
time: 7.500814 ms
array size: 131072 Byte
array length: 32768 elements
validating…
values are sorted
time: 7.695027 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted
time: 16.253822 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted
time: 34.560425 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted
time: 72.745705 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted
time: 153.41054 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted
time: 318.7354 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted
time: 661.08856 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted
End sorting
Java 1.6.0_16
CPU: NVIDIA GeForce 9400M G
RAM: 128 MB shared memory DDR2 800 CL5
raw performance (GFLOPS) over stream processors (GFlops) = 54
Start sorting
= = = workgroup size: 128 = = =
time: 10.569779 ms
array size: 131072 Byte
array length: 32768 elements
validating…
values are sorted
time: 14.977043 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted
time: 23.03952 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted
time: 39.957592 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted
time: 76.55637 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted
time: 138.79166 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted
time: 273.41605 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted
time: 540.7433 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted
= = = workgroup size: 256 = = =
time: 16.51858 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted
time: 24.967978 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted
time: 45.04008 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted
time: 75.6733 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted
time: 145.25087 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted
time: 274.57568 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted
time: 541.9929 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted
End sorting
-> Could this be added as an example on the jocl.org website?
Yes, you can add it as an example on the jocl.org website. I implement it as a proof of concept. There will be no warranty or maintenances from my side.