When I execute this on the GPU in a .cu file:

    int a = threadIdx.x + blockDim.x * blockIdx.x;
    output[a] = cosf(a/N);

I get about a 3× speedup compared with the CPU.
But if I use a for loop in my .cu file:

    int a = threadIdx.x + blockDim.x * blockIdx.x;
    for (a = 0; a < N; a++) { output[a] = cosf(a/N); __syncthreads(); }

it runs about 10× slower on the GPU than on the CPU.
This is just an example. My real task is more complicated, and I cannot do it without the for loop. How can I execute a for loop quickly on the GPU?
To my understanding, the idea of “Data Parallel Processing” could (in a slightly oversimplified way) be summarized as “avoiding for-loops”. The simplest approach is to replace a loop like

    for (int i=0; i<N; i++) doSomething(i);

with

    doSomething(threadIndex);
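As a minimal sketch of this idea (the kernel name, block size, and the assumption that `output` is a device array of `N` floats are all made up for illustration; note also that `a/N` with two ints is integer division, so a cast is needed to get meaningful arguments to `cosf`):

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define N 1048576

// One thread per element: the loop index i becomes the thread index.
__global__ void cosKernel(float *output)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < N)                           // guard: the grid may be larger than N
        output[i] = cosf((float)i / N);  // cast avoids integer division
}

int main(void)
{
    float *d_output;
    cudaMalloc(&d_output, N * sizeof(float));

    int blockSize = 256;
    int gridSize = (N + blockSize - 1) / blockSize;  // enough blocks to cover N
    cosKernel<<<gridSize, blockSize>>>(d_output);
    cudaDeviceSynchronize();

    cudaFree(d_output);
    return 0;
}
```

Each thread computes exactly one element, so no thread contains a loop over the data at all.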
Maybe a little bit more context information for the example would be helpful. It does not become clear what the roles of ‘a’ and ‘N’ are (and, e.g., how large ‘N’ is). I also don’t see a reason to call __syncthreads() there. At first glance, it looks like the roles of ‘a’ and ‘N’ are just swapped - this might indicate that the problem would have to be reformulated in order to gain the most from the data-parallel approach…
Hm … I think I don’t get it: you are computing the thread index ‘a’, then ignoring it and using ‘a’ as the loop variable in the for-loop instead - there’s no parallelism at all, except that all threads are executing this for-loop, which IS slow, of course.
Can you explain what you want to compute? (Because I think what you are computing does not make sense in this form - but maybe I’m wrong…)
It looks like I solved my problem: I wrote a function named LOOP that runs in parallel and is called from the function ADD, and in its body I put what I wanted - the more complicated code:
I think the execution speed should not differ so much if both code samples computed the same thing in the same way - but at least in the first version, the parallelism is destroyed by the wrong usage of the thread index and the outer for loop…
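To make the contrast concrete, here is a sketch of the two patterns side by side (kernel names are made up; `output` is assumed to be a device array of `N` floats, and the float cast fixes the integer division from the original snippet):

```cuda
// Anti-pattern: every thread overwrites its thread index and then executes
// the full loop, so N threads redundantly compute all N elements - the total
// work is multiplied by the thread count instead of divided by it.
__global__ void slowKernel(float *output, int N)
{
    int a = threadIdx.x + blockDim.x * blockIdx.x;  // computed, then overwritten
    for (a = 0; a < N; a++) {
        output[a] = cosf((float)a / N);
        __syncthreads();  // a barrier on every iteration slows it down further
    }
}

// Data-parallel version: each thread handles exactly one element,
// and no barrier is needed because the writes are independent.
__global__ void fastKernel(float *output, int N)
{
    int a = threadIdx.x + blockDim.x * blockIdx.x;
    if (a < N)
        output[a] = cosf((float)a / N);
}
```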
However, if it now does what it should do, and does it fast, then everything is fine.