Of course, I know that there is a demand for that. Writing a simple matrix multiplication in OpenCL may be nearly trivial. Writing a fast matrix multiplication in OpenCL may be challenging. But writing optimized (and verified) implementations of all BLAS Level 1, 2 and 3 kernels is certainly something that should be left to experts.
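To illustrate the "nearly trivial" end of that range, here is a minimal sketch of a naive matrix multiplication kernel, kept as a string so that it could be handed to an OpenCL binding, next to a plain Java reference that mirrors the kernel body and could be used to check results. The names `NaiveSgemmSketch`, `naiveSgemm` and `multiplyReference` are made up for this example and are not part of any BLAS library. The row-major layout is an assumption, and nothing here is tuned for speed (no tiling, no local memory), which is exactly the part that gets hard.

```java
import java.util.Arrays;

public class NaiveSgemmSketch {

    // OpenCL C source of a naive kernel (hypothetical name "naiveSgemm").
    // One work-item computes one element of C. Assumed layout: row-major,
    // A is m x k, B is k x n, C is m x n.
    public static final String KERNEL_SOURCE =
        "__kernel void naiveSgemm(const int m, const int n, const int k,\n" +
        "                         __global const float *A,\n" +
        "                         __global const float *B,\n" +
        "                         __global float *C)\n" +
        "{\n" +
        "    int row = get_global_id(0);\n" +
        "    int col = get_global_id(1);\n" +
        "    if (row >= m || col >= n) return;\n" +
        "    float sum = 0.0f;\n" +
        "    for (int i = 0; i < k; i++) {\n" +
        "        sum += A[row * k + i] * B[i * n + col];\n" +
        "    }\n" +
        "    C[row * n + col] = sum;\n" +
        "}\n";

    // CPU reference with the same indexing as the kernel body, so that
    // device results can be compared against it element by element.
    public static float[] multiplyReference(int m, int n, int k, float[] a, float[] b) {
        float[] c = new float[m * n];
        for (int row = 0; row < m; row++) {
            for (int col = 0; col < n; col++) {
                float sum = 0.0f;
                for (int i = 0; i < k; i++) {
                    sum += a[row * k + i] * b[i * n + col];
                }
                c[row * n + col] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6};      // 2 x 3
        float[] b = {7, 8, 9, 10, 11, 12};   // 3 x 2
        float[] c = multiplyReference(2, 2, 3, a, b);
        System.out.println(Arrays.toString(c)); // [58.0, 64.0, 139.0, 154.0]
    }
}
```

The Java part only runs on the CPU. Compiling and launching the kernel string on a device is left out here, since that depends on the binding in use.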
The AMD library is closed-source and ships as a standard C library. The only possibility to call it from Java would be via JNI (or JNA or BridJ; I can only guess why no one is trying to do this...). I once started creating bindings for the AMD BLAS and FFT libraries, but I simply don't have the time to properly maintain more of these libraries, although it would be nice to have such a library, since it could interoperate smoothly with JOCL.
Apart from that, there are open source libraries like MAGMA ( http://icl.cs.utk.edu/magma/ ) or ViennaCL ( http://sourceforge.net/projects/viennacl/ ) and maybe others. They have different goals and focuses and use different ... well... "strategies" to generate and call the OpenCL backends. These "strategies" involve things like calling other C libraries or using excessive C++ template magic. No plain OpenCL there.
So after all, I'm not aware of anything like what we (and many other people) would like to have: a plain set of .cl files with BLAS kernels that can just be loaded and called. It's a pity.