Marco, you sent me an example of how events can be used. My question: I want to compare the host side against the device side. I use CUDA events for the device part, as you showed.
In C CUDA programs, I measure the time of serial code on the host using QueryPerformanceFrequency.
Is it available in Java? If not, are there other ways to measure the time of a serial function on the host with JCuda? Thanks for your great efforts.
long startNs = System.nanoTime();
sumMatrixOnHost(...);
long endNs = System.nanoTime();
double durationMs = (endNs - startNs) / 1e6;
System.out.println("That took " + durationMs + " milliseconds");
But of course, measuring the execution time of a Java program in this way hardly makes sense: the Just-In-Time compiler will distort these results. There are some ways to alleviate that problem. Usually, you should at least perform the task (sumMatrixOnHost here) multiple times with different input sizes, make sure that the result of the computation is used (to prevent it from being optimized away), compute the average computation time over multiple runs, and start the JVM with -verbose:gc to see whether garbage collection might distort the results. It’s complicated.
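A minimal sketch of those steps could look like the following. The class name, the array-based signature of sumMatrixOnHost, the warm-up and run counts, and the input sizes are all assumptions made for illustration; the real function from this thread will look different. It is not a substitute for a proper benchmark harness, only a somewhat less naive version of the single nanoTime measurement.

```java
import java.util.Random;

public class HostTimingSketch {

    // Hypothetical stand-in for the serial host function discussed in the thread
    public static void sumMatrixOnHost(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Runs some untimed warm-up iterations so the JIT can compile the method,
    // then returns the average time in milliseconds over the timed runs
    public static double averageMillis(int n, int warmupRuns, int timedRuns) {
        Random random = new Random(0);
        float[] a = new float[n];
        float[] b = new float[n];
        float[] c = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = random.nextFloat();
            b[i] = random.nextFloat();
        }

        double checksum = 0;
        for (int r = 0; r < warmupRuns; r++) {
            sumMatrixOnHost(a, b, c);
            checksum += c[r % n];
        }

        long totalNs = 0;
        for (int r = 0; r < timedRuns; r++) {
            long startNs = System.nanoTime();
            sumMatrixOnHost(a, b, c);
            long endNs = System.nanoTime();
            totalNs += endNs - startNs;
            checksum += c[r % n];
        }

        // Print a value derived from the results so the computation is observably used
        System.out.println("checksum " + checksum);
        return totalNs / 1e6 / timedRuns;
    }

    public static void main(String[] args) {
        // Assumed input sizes: 1024, 32768 and 1048576 elements
        for (int n = 1 << 10; n <= 1 << 20; n <<= 5) {
            double ms = averageMillis(n, 20, 50);
            System.out.printf("n=%d: %.4f ms on average%n", n, ms);
        }
    }
}
```

Running it with java -verbose:gc HostTimingSketch shows whether garbage collections fall into the timed section. The numbers will still vary between runs and machines, so they should be read as rough estimates.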
So how can I accurately compare the two times: the serial code and the same code in parallel?
All the books I have read on CUDA give times measured the way I sent you.
For JCuda, how do I estimate the time accurately? I have finished implementing the kernels for a set of tokens (part of the ontology) and their serial counterparts. I will apply them to the whole ontology, which contains thousands or hundreds of thousands of elements. I need the difference in timing, please.
For the CUDA/JCuda part, you can use the events to compute the execution time of the kernel.
For the Host/Java part… I could point you to the MicroBenchmarks page in the OpenJDK Wiki. You could spend time learning JMH (https://openjdk.java.net/projects/code-tools/jmh/) and setting up a proper benchmark. You could read about the JIT and garbage collection. You could analyze your data, and generate different data sets for the comparison. You could do research about that topic, in order to make a performance claim that is profound and useful for others.
Or you could just use the function that I showed you. It’s probably „good enough“.