Hi, I'm new to the forum and have just started playing with JOCL. I've read quite a few samples, and there's plenty out there on summing arrays in parallel and on algorithms such as warpReduce that make the best use of the GPU. I'll get there, and I'm playing with that too, but to help get going I'm trying to implement something I haven't seen samples for yet.

I have a vector of arrays which I want to sum element-wise, reducing them down to a single array. The most common advice seems to be to flatten them into one one-dimensional array and then sum that as a single array, which is what I have done. It works for small test arrays, but as I increase the number of arrays to sum (keeping the number of x elements the same, e.g. raising na to a higher number in the sample) it fails. With the attached sample at 50 it sometimes succeeds and sometimes fails. I assume that is because the kernel hasn't finished before I read the results, although I do have a clFinish call in there. I'm sure it's something simple; any help appreciated!
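For checking the GPU output, a plain Java reference of the flatten-then-sum approach can help. This is a minimal sketch, not the attached sample. It assumes the arrays are flattened row-major, so element x of array a sits at index a * nx + x. The names `flatten`, `sumColumns`, `nx` and `KERNEL_SOURCE` are hypothetical, and only `na` comes from the post. The kernel string is an untested illustration of the same indexing in OpenCL C and is not compiled or run here.

```java
import java.util.Arrays;

public class FlattenedSum {

    // Hypothetical OpenCL C kernel with the same indexing as sumColumns below:
    // one work item per output element x, looping over all na arrays.
    // Not compiled or run in this sketch; shown only for comparison.
    static final String KERNEL_SOURCE =
        "__kernel void sumColumns(__global const float *flat,\n" +
        "                         __global float *out,\n" +
        "                         const int na, const int nx) {\n" +
        "    int x = get_global_id(0);\n" +
        "    if (x >= nx) return;\n" +
        "    float acc = 0.0f;\n" +
        "    for (int a = 0; a < na; a++) acc += flat[a * nx + x];\n" +
        "    out[x] = acc;\n" +
        "}\n";

    // Flatten na arrays of length nx into one row-major array:
    // element x of array a lands at index a * nx + x.
    static float[] flatten(float[][] arrays) {
        int na = arrays.length;
        int nx = arrays[0].length;
        float[] flat = new float[na * nx];
        for (int a = 0; a < na; a++) {
            System.arraycopy(arrays[a], 0, flat, a * nx, nx);
        }
        return flat;
    }

    // CPU reference reduction: out[x] = sum over a of flat[a * nx + x].
    // Each pass of the outer loop plays the role of one work item that owns
    // out[x] exclusively, so no two of them write the same output element.
    static float[] sumColumns(float[] flat, int na, int nx) {
        float[] out = new float[nx];
        for (int x = 0; x < nx; x++) {
            float acc = 0f;
            for (int a = 0; a < na; a++) {
                acc += flat[a * nx + x];
            }
            out[x] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] arrays = { {1f, 2f, 3f}, {10f, 20f, 30f}, {100f, 200f, 300f} };
        float[] flat = flatten(arrays);
        float[] out = sumColumns(flat, arrays.length, arrays[0].length);
        System.out.println(Arrays.toString(out)); // [111.0, 222.0, 333.0]
    }
}
```

One reason to give each output element a single owner: if a kernel instead lets several work items accumulate into the same out[x] without atomics, the result can change from run to run even after clFinish. That may be worth ruling out alongside the synchronization question.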