About kernel vectorization

I’m currently developing an evolutionary/genetic algorithm, and I get this message from the Intel OpenCL compiler while compiling my kernels:
Kernel was not vectorized
I know it’s directly connected to the line that writes data back to the input buffer, which is write-enabled. If I disable that line, I get:
Kernel was successfully vectorized
So what’s the meaning of this?

The problem with an evolutionary algorithm is that I need to access data produced by other threads to create new specimens, so I don’t see any chance of making isolated buffers for every thread.

Is vectorization needed to run effectively on a GPU, or only on CPUs with SSE?


A web search for “Kernel * was not OR successfully vectorized” brings up only a few results, but the message is obviously specific to the Intel OpenCL implementation.

On the one hand, the message that the kernel “…was successfully vectorized” sounds good, but what concerns me a little is this thread in the Intel forum, which suggests that auto-vectorization may sometimes cause wrong results.

Some more information might be found in threads like this one about vectorization, or in the document “Writing Optimal OpenCL™ Code with Intel® OpenCL SDK” on the Intel OpenCL site.

I’m sorry that I cannot give more helpful information right now, since I have not yet used the Intel SDK (I’m still on WinXP) and am not familiar with CPU-specific optimization options (I’m mainly using a GPU). From what I have understood so far, vectorization may bring a speedup, but how much, and whether the kernel can be vectorized at all, depends on the input data, the kernel code itself and the launch configuration (global work size)…
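To make the write-back issue from the question a bit more concrete, here is a minimal sketch in plain C (not OpenCL), where the loop index stands in for the work-item ID. The `breed` function, the neighbour access pattern and both function names are made up for illustration and are not taken from the original kernel. The first variant overwrites the buffer it also reads from, so the result depends on the order in which the items are processed; the second variant reads only the current generation and writes the next one to a separate buffer. I have not tested whether such a separation would make the Intel compiler report the kernel as vectorized, so treat that as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical crossover: the child is the average of two parents. */
static float breed(float a, float b) { return 0.5f * (a + b); }

/* In-place variant: item i reads its neighbour and overwrites pop[i].
   An item processed later may read a value that has already been
   replaced in this generation, so the result depends on the order. */
void step_in_place(float *pop, size_t n) {
    assert(pop != NULL);
    for (size_t i = 0; i < n; ++i)
        pop[i] = breed(pop[i], pop[(i + 1) % n]);
}

/* Two-buffer variant: reads only from cur, writes only to next.
   Every item sees the same, unmodified current generation. */
void step_two_buffers(const float *restrict cur, float *restrict next,
                      size_t n) {
    assert(cur != NULL && next != NULL);
    for (size_t i = 0; i < n; ++i)
        next[i] = breed(cur[i], cur[(i + 1) % n]);
}
```

With a population of {0, 2, 4, 6}, the two variants agree on the first three items but differ on the last one, because the in-place version reads the already overwritten first element. After each generation the roles of the two buffers would simply be swapped.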