JOCLRadixSort

Hello

I have implemented a proof of concept of NVIDIAs OpenCL RadixSort algorithm with JOCL libaries. It works perfect on NVIDIA ION. But on AMD platforms it fails. You can use it for free experimental issues.

Here is the output:

Start sorting

= = = workgroup size: 128 = = =

time: 10.569779 ms
array size: 131072 Byte
array length: 32768 elements
validating…
values are sorted

time: 14.977043 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted

time: 23.03952 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted

time: 39.957592 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted

time: 76.55637 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted

time: 138.79166 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted

time: 273.41605 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted

time: 540.7433 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted

= = = workgroup size: 256 = = =

time: 16.51858 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted

time: 24.967978 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted

time: 45.04008 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted

time: 75.6733 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted

time: 145.25087 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted

time: 274.57568 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted

time: 541.9929 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted

End sorting

Hello

Thank you, I will have a look at this as soon as possible. Some small questions:
In which form does it crash on AMD platforms?
Did you compare the speed of this sort implementation to an “Arrays.sort”?
Could this be added as an example on the jocl.org website?

Maybe it can be adjusted to offer some utility methods, so that it may either be used as a drop-in replacement of Arrays.sort, or as a simple and convenient sorting routine that can be used in larger programs…

bye
Marco

-> In which form does it crash on AMD platforms?

Windows XP Professional 32-bit, OpenCL.dll (OpenCL 1.0 Runtime), Radeon HD 4670 (Catalyst 10.9)

Start sorting

= = = workgroup size: 128 = = =

program failure

org.jocl.CLException: CL_INVALID_WORK_GROUP_SIZE
at org.jocl.CL.checkResult(CL.java:523)
at org.jocl.CL.clEnqueueNDRangeKernel(CL.java:6868)
at org.jocl.samples.JOCLRadixSort.radixSortBlocksKeysOnlyOCL(JOCLRadixSort.java:318)
at org.jocl.samples.JOCLRadixSort.radixSortStepKeysOnly(JOCLRadixSort.java:289)
at org.jocl.samples.JOCLRadixSort.sort(JOCLRadixSort.java:274)
at org.jocl.samples.JOCLRadixSort.doRadixSort(JOCLRadixSort.java:248)
at org.jocl.samples.JOCLRadixSort.(JOCLRadixSort.java:192)
at org.jocl.samples.JOCLRadixSort.main(JOCLRadixSort.java:620)
Exception in thread “main” java.lang.RuntimeException
at org.jocl.samples.JOCLRadixSort.(JOCLRadixSort.java:207)
at org.jocl.samples.JOCLRadixSort.main(JOCLRadixSort.java:620)

-> Did you compare the speed of this sort implementation to an “Arrays.sort”?

Java 1.6.0_16
java.util.Arrays.sort(intArray);
CPU: Intel Core 2 Quad 9550S (2,8 GHz)
chipset: Intel X48
RAM: 1333 MHz DDR3 CAS Latency (CL) 7
SiSoftware Sandra 2010 Pro Raw Performance (GFLOPS) = 37.70

Start sorting

time: 7.500814 ms
array size: 131072 Byte
array length: 32768 elements
validating…
values are sorted

time: 7.695027 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted

time: 16.253822 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted

time: 34.560425 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted

time: 72.745705 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted

time: 153.41054 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted

time: 318.7354 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted

time: 661.08856 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted

End sorting

Java 1.6.0_16
CPU: NVIDIA GeForce 9400M G
RAM: 128 MB shared memory DDR2 800 CL5
raw performance (GFLOPS) over stream processors (GFlops) = 54

Start sorting

= = = workgroup size: 128 = = =

time: 10.569779 ms
array size: 131072 Byte
array length: 32768 elements
validating…
values are sorted

time: 14.977043 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted

time: 23.03952 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted

time: 39.957592 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted

time: 76.55637 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted

time: 138.79166 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted

time: 273.41605 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted

time: 540.7433 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted

= = = workgroup size: 256 = = =

time: 16.51858 ms
array size: 262144 Byte
array length: 65536 elements
validating…
values are sorted

time: 24.967978 ms
array size: 524288 Byte
array length: 131072 elements
validating…
values are sorted

time: 45.04008 ms
array size: 1048576 Byte
array length: 262144 elements
validating…
values are sorted

time: 75.6733 ms
array size: 2097152 Byte
array length: 524288 elements
validating…
values are sorted

time: 145.25087 ms
array size: 4194304 Byte
array length: 1048576 elements
validating…
values are sorted

time: 274.57568 ms
array size: 8388608 Byte
array length: 2097152 elements
validating…
values are sorted

time: 541.9929 ms
array size: 16777216 Byte
array length: 4194304 elements
validating…
values are sorted

End sorting

-> Could this be added as an example on the jocl.org website?

Yes, you can add it as an example on the jocl.org website. I implement it as a proof of concept. There will be no warranty or maintenances from my side.