Hi all.
The documentation for clFinish says:
**clFinish does not return until all queued commands in command_queue have been processed and completed. clFinish is also a synchronization point.**
Please see the following code. For 125’000’000 calculations, the kernel takes **582 ms** to execute with clFinish().
When I **don't** use clFinish(), the kernel executes in 14 ms, and when comparing the results from both runs, they are still the same.
Am I using clFinish() wrong?
long startTime = System.currentTimeMillis();
// Execute the kernel
clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);
clFinish(commandQueue);
long endTime = System.currentTimeMillis();
System.out.println("Time to execute kernel " + n + " calculations; " + (endTime-startTime) + " ms");
long startReadBuffer = System.currentTimeMillis();
// Read the memory on device and store it into host mem dst variable
clEnqueueReadBuffer(commandQueue, memResult, CL_TRUE, 0, n * Sizeof.cl_double, dst, 0, null, null);
long endReadBuffer = System.currentTimeMillis();
System.out.println("Read data from device; " + (endReadBuffer-startReadBuffer) + " ms");
Result with clFinish:
Allocated global memory for input data: 500.0 MB
Allocated global memory for result data: 1000.0 MB
GPU init + buffer create was 1562 ms
Time to execute kernel 125000000 calculations; 618 ms
Read data from device; 489 ms
Data in pos: [0,0,0]: 2.381773290676036
Data in pos: [9,0,0]: 1.7791850202025574
Data in pos: [0,9,0]: 67.25440184324135
Data in pos: [0,0,1]: 500.1776124768833
Data in pos: [9,1,9]: 1499.4575887415901
Data in pos: [214,127,210]: 7251.412959082049
Result without clFinish:
Allocated global memory for input data: 500.0 MB
Allocated global memory for result data: 1000.0 MB
GPU init + buffer create was 1549 ms
Time to execute kernel 125000000 calculations; 14 ms
Read data from device; 1019 ms
Data in pos: [0,0,0]: 2.381773290676036
Data in pos: [9,0,0]: 1.7791850202025574
Data in pos: [0,9,0]: 67.25440184324135
Data in pos: [0,0,1]: 500.1776124768833
Data in pos: [9,1,9]: 1499.4575887415901
Data in pos: [214,127,210]: 7251.412959082049
Thanks
//Fredrik
*** Edit ***
Hi.
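I just found out:
Without clFinish(), the kernel's execution time gets added to the clEnqueueReadBuffer() time instead. clEnqueueNDRangeKernel() only enqueues the kernel and returns right away, so the 14 ms is just the enqueue time, and the blocking clEnqueueReadBuffer() (CL_TRUE) then has to wait until the kernel is complete before it can read the result.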
Correct?
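To illustrate the effect described above, here is a minimal sketch in plain Java that does not use OpenCL at all. It is only an analogy: a single-threaded ExecutorService stands in for an in-order command queue, a sleeping task stands in for the kernel, and Future.get() plays the role of clFinish(). The class name AsyncTimingSketch, the method measure and the 200 ms "kernel" duration are all made up for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncTimingSketch {
    // Hypothetical duration of the simulated "kernel"
    static final long KERNEL_MS = 200;

    /** Returns {time measured around the enqueue, time measured around the blocking read} in ms. */
    static long[] measure(boolean waitAfterEnqueue) throws Exception {
        // A single worker thread runs tasks in submission order, like an in-order queue
        ExecutorService queue = Executors.newSingleThreadExecutor();
        try {
            long t0 = System.nanoTime();
            // Non-blocking "enqueue": submit() returns immediately
            Future<?> kernel = queue.submit(() -> {
                try {
                    Thread.sleep(KERNEL_MS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            if (waitAfterEnqueue) {
                kernel.get(); // plays the role of clFinish(): wait for the queued work
            }
            long t1 = System.nanoTime();

            // "Blocking read": queued behind the kernel, and we wait for it to finish
            queue.submit(() -> { }).get();
            long t2 = System.nanoTime();

            return new long[] { (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000 };
        } finally {
            queue.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] withWait = measure(true);
        long[] withoutWait = measure(false);
        System.out.println("With wait:    enqueue " + withWait[0] + " ms, read " + withWait[1] + " ms");
        System.out.println("Without wait: enqueue " + withoutWait[0] + " ms, read " + withoutWait[1] + " ms");
    }
}
```

In both variants the total should be roughly the same; only the place where the waiting shows up changes. To time the kernel itself in real OpenCL code, the timer has to stop after a synchronization point such as clFinish().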
//Fredrik