Scala versions of JOCL examples

Hi,

I’ve started translating some examples to Scala, as I want to develop something (e.g. entropy coders, primarily Huffman and ANS/FSE) using the Scala + OpenCL duo.

Source code is attached.

I would be glad if you tried them and told me how they work for you.

It’s still a work in progress, so some adjustments are yet to be made. I wanted to preserve the flow of OpenCL API calls but otherwise replace the Java logic with something more usual in Scala. I’m in the process of doing that.

The next items for translation are the reduction and histogram examples. Then I’ll probably proceed to implementing a Huffman coder (or explore some more algorithms for computing histograms).

Sorry that I did not respond here yet. I’m not really familiar with Scala (although a bunch of links to “Learning Scala” resources have been sitting on my desktop for quite a while now…). But I’ll try to test and run the examples ASAP.

(I have not yet looked at them, but… if you intend to create further examples, maybe they could be provided on the website as well (if you agree, and with credits going to you, of course)?)

I’ve attached a ready-to-use project with SBT files, Typesafe Activator and IntelliJ files. For a quick check you can run „./activator run“ or „activator.bat run“ on the command line. I’ve tried to preserve the original comments, but I’ve added „Translated by Piotr Tarsa“ at the top. Maybe I’ll add a pointer to my GitHub account (github.com/tarsa) later for better recognition :slight_smile: Anyway, the project should be relatively easy to run. One caveat is that Typesafe Activator will probably download hundreds of megabytes of JARs at first startup, so be patient.

The names of files, the project, etc. are irrelevant. The same goes for files that aren’t the translated Scala examples. So don’t pay attention to anything that is not source code.

I just wanted to give it a quick try. The BAT file says that it could not find the command “findstr” (Win7/64), but I did not yet investigate this further.

However, I’ll try to test it at home in a real IDE that I already set up for my first Scala tests (although I’m not sure when I’ll have the chance to do this - there are some other tasks ATM, and concerning JOCL, I wanted to finish the update for OpenCL 2.0 first).

I also quickly looked over the code, e.g. of JOCLSample.scala, and it looks rather similar to the Java version. I’m not sure whether there is an easy way to make it more “idiomatic” for real Scala users (like @Landei ). But this is a rather complex topic: Even for JOCL itself, one has to say that the way it is used is far away from usual Java. In Java, you would not expect a bunch of static methods. Instead, you’d expect the (somehow(!)) “object oriented” structure of OpenCL to be reflected in the API, with something like

Queue q = new Queue(someDevice);
q.add(new CopyTask(source, target));
q.add(new KernelTask(kernel));
q.finish();

etc. When I started JOCL, I thought that it should only be a 1:1 mapping of the API to serve as the most generic possible basis of such an object-oriented wrapper. But properly designing such OO wrappers requires a great deal of familiarity with the patterns that are applied in heterogeneous computing. Back then, I thought that I would not be able to create a good solution. Today, I’m still not sure whether I could do it, but I am pretty sure that I would at least not have the time to design and maintain it appropriately. (I tried some approaches for that, but they are far from being publishable…)
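To make the shape of such a wrapper a bit more concrete, here is a toy sketch. All names in it (`Task`, `CopyTask`, `TaskQueue`, `QueueSketch`) are hypothetical, plain Java arrays stand in for device buffers, and nothing here calls OpenCL. It only illustrates the "record tasks, then finish" style of API, not how JOCL or any existing binding works.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** A unit of work that the queue runs in submission order (hypothetical). */
interface Task {
    void run();
}

/** Copies one array into another; a real wrapper might enqueue a buffer copy here. */
final class CopyTask implements Task {
    private final float[] source;
    private final float[] target;

    CopyTask(float[] source, float[] target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public void run() {
        System.arraycopy(source, 0, target, 0, Math.min(source.length, target.length));
    }
}

/** In-order queue: add() only records tasks, finish() executes all of them. */
final class TaskQueue {
    private final Deque<Task> pending = new ArrayDeque<>();

    TaskQueue add(Task task) {
        pending.addLast(task);
        return this;
    }

    void finish() {
        while (!pending.isEmpty()) {
            pending.removeFirst().run();
        }
    }
}

final class QueueSketch {
    public static void main(String[] args) {
        float[] source = {1f, 2f, 3f};
        float[] target = new float[3];

        TaskQueue q = new TaskQueue();
        q.add(new CopyTask(source, target));
        // Stand-in for a kernel task: doubles every element of the target.
        q.add(() -> {
            for (int i = 0; i < target.length; i++) {
                target[i] *= 2f;
            }
        });
        q.finish();

        System.out.println(Arrays.toString(target));
    }
}
```

The hard part mentioned above is exactly what this sketch leaves out: events, out-of-order execution, multiple devices, and error handling.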

BTW: You most likely know Olivier Chafik’s ScalaCL ( https://github.com/ochafik/ScalaCL ). He’s been working on this for quite a while now, and it definitely looks interesting - but I could not follow the whole development (including the rewrites ;)).

I just wanted to give it a quick try. The BAT file says that it could not find the command „findstr“ (Win7/64), but I did not yet investigate this further.

I’ve tried it on my Win8.1/64 and it works well. The commands „find“ and „findstr“ are available from the command line. Maybe you’ve uninstalled some vital parts of your operating system?

I also quickly looked over the code, e.g. of JOCLSample.scala, and it looks rather similar to the Java version.

That was one of the objectives - to be similar. But I’m only starting to fiddle with the Scala + OpenCL duo, so maybe there will be more differences.

I like that JOCL keeps the API similar to the C API. That eases translation from C to Java/Scala and potentially introduces less overhead (well, not counting the passing of objects on the JVM heap through JNI).

When I started JOCL, I thought that it should only be a 1:1 mapping of the API to serve as the most generic possible basis of such an object-oriented wrapper. But properly designing such OO wrappers requires a great deal of familiarity with the patterns that are applied in heterogeneous computing. Back then, I thought that I would not be able to create a good solution. Today, I’m still not sure whether I could do it, but I am pretty sure that I would at least not have the time to design and maintain it appropriately. (I tried some approaches for that, but they are far from being publishable…)

There are different Java - OpenCL bindings, and they have somewhat Java-ified APIs. They can be a source of inspiration, but overall it seems to me that there are too few programs written in Java and heavily utilizing OpenCL to give a feeling for what a convenient API should look like. For now, I would focus on performance and overhead instead of convenience, and also on adding examples.

Scala is a concise and powerful language, and I’m thinking of employing Scala for the actual generation of OpenCL code at runtime - not for examples, but for actual programs.

BTW: You most likely know Olivier Chafik’s ScalaCL ( GitHub - nativelibs4java/ScalaCL: ScalaCL - run Scala on your GPU! ). He’s been working on this for quite a while now, and it definitely looks interesting - but I could not follow the whole development (including the rewrites).

Yep. I looked at it a few years ago (when I was writing my master’s thesis in 2011, I planned to use OpenCL for BWT on GPGPU, but failed and focused on BWT on multicore CPUs) and it seemed too experimental. Unfortunately, it still seems so.

I think I will stick to my idea of generating OpenCL code as character strings instead of relying on some magic that converts Scala constructs to OpenCL constructs. It will be much more predictable and quicker to get good results.
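As a rough illustration of what "generating OpenCL code as character strings" can look like, here is a minimal sketch. It is written in Java to match the other snippets in this thread. The class `KernelSource`, the method `histogram`, and the kernel itself are made up for this example and are not taken from the posted samples. The sketch assumes an unsigned integer element type and bakes the bin count into the source as a constant.

```java
/** Hypothetical helper that assembles OpenCL C source text at runtime. */
final class KernelSource {
    private KernelSource() {}

    /**
     * Builds the source of a naive histogram kernel.
     * elementType is assumed to be an unsigned integer OpenCL type such as "uchar".
     */
    static String histogram(String elementType, int binCount) {
        if (binCount <= 0) {
            throw new IllegalArgumentException("binCount must be positive");
        }
        return String.join("\n",
            "#define BINS " + binCount + "u",
            "__kernel void histogram(__global const " + elementType + " *input,",
            "                        __global uint *bins,",
            "                        const uint n)",
            "{",
            "    uint gid = get_global_id(0);",
            "    if (gid < n) {",
            "        atomic_inc(&bins[input[gid] % BINS]);",
            "    }",
            "}",
            "");
    }
}

final class KernelSourceDemo {
    public static void main(String[] args) {
        // The resulting text is what one would hand to the OpenCL compiler at runtime.
        System.out.println(KernelSource.histogram("uchar", 256));
    }
}
```

With JOCL, a string like this would typically be passed to `clCreateProgramWithSource` and then built with `clBuildProgram`. The generated text can be printed and inspected, which is where the predictability comes from.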

(This „findstr“ message was on my office PC, it may be related to that)

Yes, I think that it’s interesting to have a look at the most „central“ class of OpenCL: The command queue. From the differences of this class in…
JOCL: http://www.jocl.org/doc/org/jocl/cl_command_queue.html
LWJGL: http://lwjgl.org/javadoc/org/lwjgl/opencl/CLCommandQueue.html
JavaCL: http://nativelibs4java.sourceforge.net/javacl/api/1.0.0-RC3/com/nativelibs4java/opencl/CLQueue.html
Jogamp-JOCL: http://jogamp.org/deployment/jogamp-next/javadoc/jocl/javadoc/com/jogamp/opencl/CLCommandQueue.html
one can easily see how much Java-ification the authors wanted to achieve :wink:

You’re right when you say that the 1:1 mapping of JOCL simplifies the translation between C and Java. But I think it also hinders a „broader“ application of OpenCL in Java. In the context of a larger, compute-intensive application, one should still identify the time-critical parts that should be run on the GPU, and clearly separate these parts from the rest of the application - roughly meaning that something like a cl_mem should not become a prominent, public representation of the data, but instead should be hidden as one implementation detail behind an interface.
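A small sketch of that separation, with all names (`VectorAdder`, `CpuVectorAdder`, `HiddenBackendSketch`) invented for illustration. Only a CPU implementation is shown. The point is merely that callers depend on the interface, so an OpenCL-backed implementation could keep its cl_mem objects private.

```java
import java.util.Arrays;

/** What the rest of the application sees: plain arrays in, plain arrays out. */
interface VectorAdder {
    float[] add(float[] a, float[] b);
}

/** Reference implementation on the CPU, usable as a fallback and for tests. */
final class CpuVectorAdder implements VectorAdder {
    @Override
    public float[] add(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Length mismatch");
        }
        float[] result = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] + b[i];
        }
        return result;
    }
}

// A hypothetical OpenClVectorAdder would implement the same interface and keep
// its cl_mem buffers, kernel and command queue in private fields.

final class HiddenBackendSketch {
    public static void main(String[] args) {
        VectorAdder adder = new CpuVectorAdder(); // callers never name the backend type
        float[] sum = adder.add(new float[] {1f, 2f}, new float[] {3f, 4f});
        System.out.println(Arrays.toString(sum));
    }
}
```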

Concerning ScalaCL: I think that Olivier has quite a bunch of projects running, and it’s hard to predict how actively he will work on a particular one (and how soon each of them will reach a „mature“ state).

However, the automatic generation of OpenCL code is an interesting topic. I think that the „transition“ here is smooth: Scala obviously offers some methods to easily inspect the AST, and create the OpenCL code directly. I tried something similar for Groovy quite a while ago ( GroovyGPU ). For Java, something similar is possible: Once I wanted to do a crazy experiment and implement „something-like-a-JVM“ in OpenCL. In the end, I noticed that I was essentially writing a disassembler that converted byte code to OpenCL code. And a really sophisticated and mature version of this is already available via Aparapi: GitHub - aparapi/aparapi: Official AMD Aparapi repository

In the meantime, I also considered some „intermediate“ library. Namely, a library that simply offers a bunch of kernels for certain elementary building blocks of parallel processing (scan, reduce, permute, and a bunch of vector operations similar to that in JCudaVec API Documentation ). However, I think that the benefit-to-effort ratio of such a library would be rather small. For example, all the vector operations could be covered with a bunch of code-generating utility methods that could be used like

UnOp unOp = Ops.createUnaryOp("sin"); // Creates a kernel that contains "y[i] = sin(x[i])"
BinOp binOp = Ops.createBinaryOp("+"); // Creates a kernel that contains "z[i] = x[i] + y[i]"
Buffer result = binOp.apply(unOp.apply(buffer0), unOp.apply(buffer1));

and even this would hardly be justifiable, considering that something like

Op op = Ops.createOp("z = sin(x) + sin(y)");

would not be much harder (e.g. by using ANTLR in the background to parse the given string…)
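For the unary and binary cases, the code-generating part of such utility methods could be as small as the following sketch. `ElementwiseKernels` and its methods are hypothetical and only produce the kernel source text. Compiling it, wrapping it into something like `UnOp`/`BinOp`, and parsing whole expressions are left out.

```java
/** Hypothetical generator for element-wise OpenCL kernels on float buffers. */
final class ElementwiseKernels {
    private ElementwiseKernels() {}

    /** Source of a kernel computing y[i] = function(x[i]), e.g. function = "sin". */
    static String unary(String function) {
        return "__kernel void unary_op(__global const float *x,\n"
             + "                       __global float *y,\n"
             + "                       const uint n)\n"
             + "{\n"
             + "    uint i = get_global_id(0);\n"
             + "    if (i < n) y[i] = " + function + "(x[i]);\n"
             + "}\n";
    }

    /** Source of a kernel computing z[i] = x[i] operator y[i], e.g. operator = "+". */
    static String binary(String operator) {
        return "__kernel void binary_op(__global const float *x,\n"
             + "                        __global const float *y,\n"
             + "                        __global float *z,\n"
             + "                        const uint n)\n"
             + "{\n"
             + "    uint i = get_global_id(0);\n"
             + "    if (i < n) z[i] = x[i] " + operator + " y[i];\n"
             + "}\n";
    }
}

final class ElementwiseKernelsDemo {
    public static void main(String[] args) {
        System.out.println(ElementwiseKernels.unary("sin"));
        System.out.println(ElementwiseKernels.binary("+"));
    }
}
```

Note that the inputs are pasted into the source unchecked, so a real version would have to validate them against a list of known functions and operators.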

However, a repository of good, generic OpenCL kernel implementations for scan/reduce/etc. would be a nice thing…

BTW: You mentioned that you have Win8.1, and you asked for OpenCL 2.0: Since I can’t test any OpenCL 2.0 support of JOCL here, would you mind doing some basic test of the new OpenCL features if I provided you a „preview“ version of the updated JOCL? (I’ll probably finish the first „release candidate“ today or early next week, but am hesitant to publish it without any test…)

Optimizing for low overhead is a much simpler task than inventing high-level interfaces or frameworks. You don’t need prior experience to do that, while developing high-level libraries requires knowledge of use cases and of the shortcomings of alternatives.

If there’s a low-level binding for native code (like your JOCL is a binding to OpenCL), then it’s easy to build another library on top of it, which would be an ordinary Java library without the need to compile C/C++ code (as that’s already done within JOCL).

Building high-level abstractions is generally tied to a particular programming language. Something that’s optimal for Java developers doesn’t have to be optimal for Scala developers.

Aparapi is an interesting project. It does some magic with Java bytecode to translate it to GPGPU instructions. While I think it’s somewhat unpredictable, it should smoothen the learning curve for GPGPU beginners, thus increasing the usage of GPGPU among ordinary programmers. There’s also OpenJDK: Project Sumatra - I’m curious how it will pan out. However, I would like to wait and see, and in the meantime develop something useful that is a concrete product rather than a framework.

Anyway, I’m writing this to post new versions of the samples translated to Scala. They are more or less done. HistogramAMD and HistogramNVIDIA aren’t cleaned up, but the others are somewhat Scalaified. I think that’s enough to post them here, and you can put them on the front page of jocl.org if you decide to update the web page.

Fully agree here. In particular: The other way around is not possible. For every abstraction, one has to consider the possibility that someone would like to use a feature that has been „abstracted away“…

Aparapi is an interesting project. It does some magic with Java bytecode to translate it to GPGPU instructions. While I think it’s somewhat unpredictable, it should smoothen the learning curve for GPGPU beginners, thus increasing the usage of GPGPU among ordinary programmers. There’s also OpenJDK: Project Sumatra - I’m curious how it will pan out. However, I would like to wait and see, and in the meantime develop something useful that is a concrete product rather than a framework.

Aparapi and Sumatra share some ideas (and the developer of Aparapi is also involved in Sumatra). You mentioned that such a completely transparent translation of code may „smoothen the learning curve“ - but in the end, it basically removes the learning curve: Theoretically, you don’t have to know anything about OpenCL when you want to use Aparapi. In itself, this is a good thing, because it broadens the audience. But I also think that some background knowledge is at least helpful, and maybe even necessary, in order to write programs that Aparapi can translate into efficient OpenCL programs (but admittedly, I’m not completely up to date concerning Aparapi).

Anyway, I’m writing this to post new versions of the samples translated to Scala. They are more or less done. HistogramAMD and HistogramNVIDIA aren’t cleaned up, but the others are somewhat Scalaified. I think that’s enough to post them here, and you can put them on the front page of jocl.org if you decide to update the web page.

I finally could give them a try, and they seem to work smoothly (the „KernelArgs“ did not work, but this may be due to my PC, I’ll have to double-check this). I’ll try to upload them to the website soon. Thanks for this contribution!

It doesn’t seem to me that it removes the learning curve. You still need to understand how GPGPUs work in order to extract performance from them. Otherwise you’ll end up with mediocre performance and you’ll scratch your head wondering what you’ve done wrong.
„Hello World“ (i.e. summation of vectors of values, for example) in Aparapi looks trivial compared to a normal OpenCL sample. In Aparapi almost everything related to OpenCL is removed; you’re only left with the notation of a kernel and its range. So it certainly allows one to learn GPGPU programming in very small steps.

No problem.
KernelArgs requires OpenCL 1.2 (as you’ve stated on your website anyway), so maybe your OpenCL driver doesn’t support that.
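One way to check this before running the sample is to look at the device version string. The sketch below only does the parsing; `OpenClVersion` and `isAtLeast` are hypothetical helper names. It assumes the string has the form "OpenCL <major>.<minor> <vendor-specific text>", as the OpenCL specification defines for CL_DEVICE_VERSION. The example strings in `main` are made up for illustration.

```java
/** Hypothetical helper for comparing CL_DEVICE_VERSION-style strings. */
final class OpenClVersion {
    private OpenClVersion() {}

    /** Returns true if the version string reports at least major.minor. */
    static boolean isAtLeast(String versionString, int major, int minor) {
        // Expected shape: "OpenCL <major>.<minor> <vendor-specific text>"
        String[] parts = versionString.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("OpenCL")) {
            throw new IllegalArgumentException("Unexpected version string: " + versionString);
        }
        String[] numbers = parts[1].split("\\.");
        if (numbers.length < 2) {
            throw new IllegalArgumentException("Unexpected version number: " + parts[1]);
        }
        int actualMajor = Integer.parseInt(numbers[0]);
        int actualMinor = Integer.parseInt(numbers[1]);
        return actualMajor > major || (actualMajor == major && actualMinor >= minor);
    }

    public static void main(String[] args) {
        // In a real program the string would come from clGetDeviceInfo with CL_DEVICE_VERSION.
        System.out.println(isAtLeast("OpenCL 1.1 ExampleVendor", 1, 2));
        System.out.println(isAtLeast("OpenCL 1.2 ExampleVendor", 1, 2));
    }
}
```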

I finally uploaded the Scala samples that you provided. Thanks again!

Cool :cool:

However, I’m getting status 404 Not Found for every one of the uploaded Scala samples.

Weird, it seems like I really just checked the links locally :o They are really uploaded now. Sorry about that.

Yep, it works now.