Hi.

I'm trying to create an example that calculates an average value. The kernel calculates a result and adds it to a total sum.

I'm using the latest OpenCL 2.0 with JOCL.

```
__kernel void averageKernel(__global const int *in, __global int *sum)
{
    int gid = get_global_id(0);
    if (get_local_id(0) == 0) {
        sum[0] = 0;
    }
    float res = sin((float)gid) + cos((float)gid) + sin((float)gid) + in[gid];
    atomic_add(&sum[0], res);
}
```

The kernel works with integers but not with doubles. With doubles I get the error:

**no instance of overloaded function “atomic_add” matches the argument list; argument types are: (__global double *, float)**

Thanks

//Fredrik

*** Edit ***

I was reading the API documentation: https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/atomic_add.html

It seems that atomic_add() only supports integers. I don't know why.

Any suggestions for a workaround so floats or doubles can be used?

I haven’t used atomics extensively yet. A quick web search reveals some options (Advanced application development: OpenCL 1.1: Atomic operations on floating point values), but for `double` values, this will likely be trickier. The kernel that you posted seems to be a “dummy” kernel (at least, I’m not sure what the sin/cos computations are meant to accomplish). If you intend to compute the average of a given array, I think a reduction with “+” and a subsequent division by the length of the array may be an option…
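To illustrate the reduction idea, here is a minimal host-side sketch in Java. It is only an illustration under assumptions: the names `PairwiseAverage`, `pairwiseSum` and `average` are made up for this example, and a real OpenCL reduction would perform the pairwise additions in parallel inside work-groups instead of recursively on the CPU.

```java
public class PairwiseAverage
{
    // Sums values[from, to) by splitting the range in half. This is the
    // same pairing pattern that a tree-shaped parallel reduction uses.
    static float pairwiseSum(float[] values, int from, int to)
    {
        int n = to - from;
        if (n == 0)
        {
            return 0.0f;
        }
        if (n == 1)
        {
            return values[from];
        }
        int mid = from + n / 2;
        return pairwiseSum(values, from, mid) + pairwiseSum(values, mid, to);
    }

    // Reduction with "+", followed by a division by the array length.
    // Note: an empty array yields NaN (0.0f / 0).
    static float average(float[] values)
    {
        return pairwiseSum(values, 0, values.length) / values.length;
    }

    public static void main(String[] args)
    {
        float[] values = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
        System.out.println("Average: " + average(values));
    }
}
```

Because the pairing is fixed by the array layout, the additions always happen in the same order, so repeated runs on the same input give the same result.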

I found an example that works with floats.

```
void atomicAdd(volatile __global float *addr, float val)
{
    union {
        unsigned int u64;
        float f64;
    } next, expected, current;
    current.f64 = *addr;
    do {
        expected.f64 = current.f64;
        next.f64 = expected.f64 + val;
        current.u64 = atomic_cmpxchg((volatile __global unsigned int *)addr,
                                     expected.u64, next.u64);
    } while (current.u64 != expected.u64);
}
```

But it should be used with care, since it has a big performance impact.
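For readers more familiar with Java, the same compare-and-swap retry pattern can be sketched on the host side. This is not part of the OpenCL solution, only an analogy: the class `AtomicFloatAdd` and its methods are hypothetical names, and the float is kept as its raw 32-bit pattern inside a `java.util.concurrent.atomic.AtomicInteger`.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicFloatAdd
{
    // The float value is stored as its raw 32-bit pattern
    private final AtomicInteger bits =
        new AtomicInteger(Float.floatToRawIntBits(0.0f));

    // Reads the current bits, computes the new sum, and retries
    // if another thread changed the value in the meantime
    void add(float val)
    {
        int expected;
        int next;
        do
        {
            expected = bits.get();
            next = Float.floatToRawIntBits(
                Float.intBitsToFloat(expected) + val);
        }
        while (!bits.compareAndSet(expected, next));
    }

    float get()
    {
        return Float.intBitsToFloat(bits.get());
    }

    // Adds 1.0f from several threads concurrently and returns the total
    static float demo(int threadCount, int addsPerThread)
    {
        AtomicFloatAdd sum = new AtomicFloatAdd();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() ->
            {
                for (int j = 0; j < addsPerThread; j++)
                {
                    sum.add(1.0f);
                }
            });
            threads[i].start();
        }
        try
        {
            for (Thread t : threads)
            {
                t.join();
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum.get();
    }

    public static void main(String[] args)
    {
        System.out.println(demo(4, 1000));
    }
}
```

The retry loop is where the cost comes from: whenever many threads hit the same address, most of them lose the race and have to recompute and try again.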

//Fredrik

This looks like it was equivalent to the sample in the article that I linked to (although I wonder why it’s “64” in the variable names - a float value only has 32 bits…).

How large is the performance impact actually? Sorry, no idea, but … the built-in atomics (for int) won’t come for free either…

Yes, it is almost the same example; only the 64-bit part is different.

For me, the version with atomics was several times slower.

I tried replacing the 64 with 32, and the results differed in one decimal place. Not much, but there should not be any difference at all?

Thanks

//Fredrik

Well, I would have to “align” (format) the code a bit better, but having now looked over both twice, I’d say that they are equal. Regardless of that: the “64” parts are only **variable names** and technically don’t make a difference. Where did you see the “one decimal” difference that you mentioned?

Hi.

I'll post the result from the program when I get home again at the weekend.

Thanks

//Fredrik

Hi.

Just arrived home. I ran an example, and with the 32-named vs. the 64-named version I get these results:

Average value calculated: (2.6229537/500) = 0.005245907 (32 named version)

Average value calculated: (2.6229506/500) = 0.005245901 (64 named version)

But as you said, there should be no difference. I need to do some more investigation.

Thanks

//Fredrik

When the **naming of variables** seems to make a difference, it’s more likely that there is some other sort of race condition involved. Did you try each version several times? When **one** version produces different results in each run, then there is another problem.

(Maybe I can try it out as well in the next days)

Hi.

You are correct, it seems to be some sort of race condition, since I'm getting a different result each time.

//Fredrik

Note, however, that a race condition may not be the **only** reason why you receive different results each time. The order in which the threads are run (i.e. “the order in which the `gid`s are processed”) is not necessarily deterministic. And since you are adding floating point values, the order will affect the results. Floating point arithmetic is not associative:

```
public class FloatingPointArithmeticIsNotAssociative
{
    public static void main(String[] args)
    {
        // Same three values, different grouping of the additions
        double x = 0.1 + (0.2 + 0.3);
        double y = (0.1 + 0.2) + 0.3;
        System.out.printf("%30.20f\n", x);
        System.out.printf("%30.20f\n", y);
        System.out.println("equal? " + (x == y));
    }
}
```

I think that even numerical tricks like https://en.wikipedia.org/wiki/Kahan_summation_algorithm won’t help here, as long as the order of additions is not predictable.
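To make that last point concrete, here is a small sketch that applies Kahan summation to the same three `float` values in two different orders. The class name `KahanOrderDemo` and the input values are chosen only for this illustration; they are not taken from the kernel above.

```java
public class KahanOrderDemo
{
    // Kahan (compensated) summation in single precision
    static float kahanSum(float[] values)
    {
        float sum = 0.0f;
        float c = 0.0f; // running compensation for lost low-order bits
        for (float x : values)
        {
            float y = x - c;   // apply the compensation to the next value
            float t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y; // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args)
    {
        // The same three values, in two different orders
        float[] orderA = { 1e8f, 1.0f, -1e8f };
        float[] orderB = { 1e8f, -1e8f, 1.0f };
        System.out.println("order A: " + kahanSum(orderA));
        System.out.println("order B: " + kahanSum(orderB));
    }
}
```

With these inputs, order A yields 0.0 and order B yields 1.0: in order A the compensation term is absorbed again when the large negative value is added, so the small value is still lost. Compensated summation reduces the error for a given order, but it does not make the result independent of the order.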