Java Wrappers for CUDA with JavaCPP

(Copied from http://forum.byte-welt.net/byte-welt-projekte-projects/swogl-jcuda-jocl/jcuda/16486-cudnn-wrapper.html#post118994)

Hello,

I spent a couple of hours wrapping up cuDNN using JavaCPP. Since it references cuBLAS and the CUDA API itself, I also had to wrap those, but at runtime it shouldn’t matter if we use JCuda alongside them. I ported the mnistCUDNN.cpp sample to test it out, and it works just fine. It’s all available here:
https://github.com/bytedeco/javacpp-presets/tree/master/cuda

(I am targeting CUDA 6.5 because that’s what the current version of cuDNN uses, but I’ve made sure that the presets as is can also generate wrappers for CUDA 7.0.) To try them out, first get the latest source code for JavaCPP at https://github.com/bytedeco/javacpp and run mvn install to install it, before running mvn install on the presets. It should work the same on Linux, Mac OS X, and Windows, but I’ve only tested it on Linux x86-64 for now (assuming everything can be found in /usr/local/cuda/).

I plan to wrap up other parts of CUDA while I’m at it, but the API is quite low level and a bit rough around the edges, so I believe there would be room for collaboration. Let me know what you think! Thanks

Samuel

Hello Samuel,

That’s interesting. I always wondered whether any of the many tools for connecting Java and native libraries is actually capable of doing this for CUDA (but I hesitated to try all the tools that you list on your GitHub page ;) Some of these tools are … well … as “rough around the edges” as some of the native APIs are, and the resulting code often looks horrible).

Now, I wanted to give JavaCPP a try, curious how it feels, but I obviously have to allocate a bit more time for that (and I’ve had enough C/C++ build hassle at work recently, to be honest).

One thing that I was curious about was the handling of different memory types. The CUDA API contains several functions where you can’t detect from the signature whether the pointer points to host- or to device memory. (Sometimes it’s not even explicitly documented. And worse: Sometimes they change this between versions. And worst: Sometimes they change it, and don’t update the documentation…). How do you handle cases like this in JavaCPP?

bye
Marco

Yes, all other existing tools are horrible. IMO, there is nothing else than JavaCPP, for any language, on any platform, that is even remotely usable.

As for pointers, since CUDA uses void* for both types, it’s not like we can do much to differentiate them. They both get mapped to the same set of Pointer classes in org.bytedeco.javacpp (see the JavaCPP 0.11 API docs), which basically provide the same functionality as JCuda’s own Pointer class. As long as we don’t try to dereference them, they’re basically just long integers. Sounds good?

Samuel
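To make the “basically just long integers” remark concrete, here is a minimal C sketch of how a native address can be stored in a 64-bit integer (the way a Java long field could hold it) and recovered unchanged. The helper names are hypothetical; this is not the actual code of JavaCPP or JCuda.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: a Java-side pointer object only needs to store the
 * native address in a 64-bit integer (a Java long / jlong). */
static_assert(sizeof(uintptr_t) <= sizeof(int64_t),
              "native addresses must fit into a 64-bit integer");

/* Store an address as an opaque integer handle. The handle is never
 * dereferenced on the Java side; it is only passed back to native code.
 * (Converting values above INT64_MAX is implementation-defined in ISO C,
 * but is a plain bit copy on mainstream compilers.) */
static int64_t address_to_handle(const void *p) {
    return (int64_t)(uintptr_t)p;
}

/* Recover the original address from the handle. */
static void *handle_to_address(int64_t handle) {
    return (void *)(uintptr_t)handle;
}
```

Because the handle is opaque, the same representation works for host and device addresses alike, which is exactly why the signature alone cannot tell them apart.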

Yes, the Pointer classes in JavaCPP were one of the first things that caught my eye. The “Pointer” class in JCuda was … well, what I considered the simplest and most straightforward approach 8 years ago, but today, I would also consider (at least optional) typed pointers. The nitty-gritty bits on the native side, when one receives a Pointer to an array of Pointers (to possibly different types), can be a challenge. (More generally: the support of pointers to pointers to … pointers in arbitrary nesting depth.)
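The one-level case of “a Pointer to an array of Pointers” can be sketched in C: the Java side hands over an array of 64-bit handles (as a long[] would deliver them), and the native side has to build a real `void **` array before calling the C API, then possibly copy results back. The helper names are hypothetical and this is not how JCuda or JavaCPP actually implement it; deeper nesting repeats the same step once per level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: turn n 64-bit handles into a native array of
 * pointers that a C API expecting `void **` can consume.
 * The caller releases the result with free(). */
static void **unpack_pointer_array(const int64_t *handles, size_t n) {
    assert(handles != NULL || n == 0);
    /* Allocate at least one slot so malloc(0) never returns an ambiguous NULL. */
    void **out = malloc((n > 0 ? n : 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = (void *)(uintptr_t)handles[i];
    }
    return out;
}

/* The reverse direction, e.g. when the native call wrote new pointers
 * into the array and they must be copied back into the Java long[]. */
static void pack_pointer_array(void *const *ptrs, int64_t *handles, size_t n) {
    assert((ptrs != NULL && handles != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        handles[i] = (int64_t)(uintptr_t)ptrs[i];
    }
}
```

The copy-in/copy-out pair is where the headaches start with asynchronous operations: the temporary native array must outlive the call if the library keeps using it.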

I have not yet studied the code of JavaCPP in detail in this regard, but I think that you always put the contents of Java arrays into (direct) Buffers, is that right?
When I started JCuda, I also wanted to support pointers that point to Java arrays directly, without copying the (possibly large) chunks back and forth. Basically, this is easy with GetPrimitiveArrayCritical, but the details (nested pointers, asynchronous operations) can cause some headaches.

I’ll try to have a closer look at what JavaCPP actually generates for cuDNN and CUDA in general.

(BTW: Calling “all other tools horrible” is a bit harsh: Some of them have limited application cases, or entirely different approaches, and one can’t expect everyone to take The Right Approach® right from the beginning ;))

Sorry, I was just picking up on a word you used. Of course I wasn’t talking about tools in general, or tools with limited use cases, but about tools to access native C++ functionality from Java (or from any other language that is not C++, for that matter). I haven’t found anything else that is actually usable. Have you? What kind of entirely different approach? To illustrate my point, I’ve spent a few more hours wrapping up cuFFT, cuRAND, cuSOLVER, cuSPARSE, and NPP: https://github.com/bytedeco/javacpp-presets/tree/master/cuda

Do you know of anything else that can help us do something that is even remotely comparable?

As for arrays, JavaCPP prefers direct NIO buffers, obviously, but methods that accept Java arrays also get generated. They don’t use GetPrimitiveArrayCritical() by default, but it would be possible to make that optional when the user knows it’s safe: https://github.com/bytedeco/javacpp/issues/16

Samuel

This word referred only to the resulting code, not to the tools. There are many tools for accessing native libraries from Java, and I have not made a thorough analysis of all their features, pros, and cons. Some have stood the test of time so far, even though they are not cleanly designed and are not able to cover all the “problematic” features of C++. I don’t know of any other tool that claims to cover all these features. But many of these tools are considered particularly powerful or easy to use (if only by their authors ;)).

However, I have occasionally told people to have a look at JavaCPP, because it does not seem to have some of the obvious weaknesses of other tools (although I have not yet used it myself). I know that my approach (a mix of manual coding and code generation) is certainly not the way to go in the long term, but I still hope that this problem will sooner or later be tackled at the language/VM level, and not with code generators. However, that’s a different story.

Regarding the methods that accept Java arrays: That’s one of the aspects that my initial question was about. When a method expects device data, one cannot pass a Java array. (Not a problem in general. Even in plain C you can pass in the wrong memory type. I was just wondering whether you differentiated them in any way.)
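One conceivable way to differentiate would be to carry the memory space alongside the address. The following C sketch is purely hypothetical (it is not claimed that JCuda or JavaCPP work this way): every pointer is tagged as host or device memory, the copy direction is derived from the tags, and a host-only argument such as a wrapped Java array can be rejected where device memory is required. With CUDA’s unified virtual addressing, `cudaPointerGetAttributes` can also query a pointer’s memory type at runtime, but that is not used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag recording which memory space an address refers to.
 * A plain CUDA void * parameter carries no such information. */
typedef enum { MEM_HOST, MEM_DEVICE } mem_space;

typedef struct {
    void *addr;
    mem_space space;
} tagged_ptr;

/* Local enum whose values mirror cudaMemcpyKind (HostToHost = 0,
 * HostToDevice = 1, DeviceToHost = 2, DeviceToDevice = 3), so the
 * sketch compiles without the CUDA headers. */
typedef enum {
    COPY_HOST_TO_HOST = 0,
    COPY_HOST_TO_DEVICE = 1,
    COPY_DEVICE_TO_HOST = 2,
    COPY_DEVICE_TO_DEVICE = 3
} copy_kind;

/* Derive the copy direction from the tags instead of trusting the caller. */
static copy_kind copy_kind_for(tagged_ptr dst, tagged_ptr src) {
    assert(dst.addr != NULL && src.addr != NULL);
    if (src.space == MEM_HOST) {
        return dst.space == MEM_HOST ? COPY_HOST_TO_HOST : COPY_HOST_TO_DEVICE;
    }
    return dst.space == MEM_HOST ? COPY_DEVICE_TO_HOST : COPY_DEVICE_TO_DEVICE;
}

/* Returns nonzero only if p may be passed where device memory is required;
 * a wrapped Java array would always be tagged MEM_HOST and fail this check. */
static int is_device_pointer(tagged_ptr p) {
    return p.space == MEM_DEVICE;
}
```

The price is that every binding must know, per parameter, which memory space is expected, which is exactly the information the CUDA headers do not provide.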

Does JavaCPP also handle libraries with installable client drivers? (Like OpenCL?) BTW: The generator that is used in LWJGL ( https://github.com/LWJGL/lwjgl/tree/master/src/java/org/lwjgl/util/generator ) is not a standalone tool, but seems to have some similarities to JavaCPP from what I have seen so far (regarding the class structure and the use of annotations). But I may be wrong.

C++ is only defined (and not very well defined, I might add) at the source code level, so unless people finally agree on a standard ABI and everything, it’s never going to be something that can be handled by a cross-platform VM. LLVM is great and all, but even that cannot interoperate in a satisfactory fashion with MSVC (see “MSVC compatibility” in the Clang 3.7 documentation). So, hello code generators :) That’s the best we’ll ever get with C++, I’m afraid.

Actually, my method is pretty ad hoc too. There simply has been no work done on that in the past, which I find really strange. I had to start somewhere, to figure out what sort of works, and what sort of doesn’t. It’s been working out surprisingly well, despite the general ugliness of the parser and generator.

I’m not sure I understand how installable client drivers differ from normal libraries. In what kind of situation do you think we might get into trouble?

As for LWJGL, it only supports C, not C++ (see “Direct3D 11 Binding” and “Add support for C++ bindings”, issue #22 of LWJGL/lwjgl3 on GitHub). Let’s introduce them to JavaCPP :) C is pretty much standardized across platforms and has a limited feature set, so that’s an easy case. We can make these tools super easy to use and very powerful, but in the end, if they don’t work with C++, what’s the point? Why limit ourselves to C libraries? What’s the rationale? Do we really want to set those C vs C++ flame wars raging again, but this time on the Java, CPython, etc. platforms?

Samuel

Sure, the complexity of C++ with its compatibility issues between libraries (even libraries built on the same platform, and even libraries built with the same compiler but different linker settings) is something that baffled me recently (I (re)started C++ coding for my job, after 15 years of Java). All I can say is that all this is (at least for me) a sign that somewhere, something went horribly wrong. I’m not sure whom to blame (the standards committees? The compiler people? “The community”? Heck, I don’t care. I want to build some library and use it somewhere else…). But I can say for sure that this will not be “fixed” soon. Efforts like LLVM (and the dozens of other IRs) at least show that even the C++ people are beginning to recognize that VMs and IRs have their advantages. Just like the Java people did 25 years ago :D

So when I said that this should be tackled on the VM level, I meant really long term. There already are approaches that allow accessing platform-specific libraries (e.g. DLLs) using some magic method invocation tricks, without compiling native code at all, but I’m not sure how well they work in the corner cases…

I wouldn’t consider the use of annotations and your presets “ad hoc”. The most obvious, straightforward approach would be to parse the headers and dump out the JNI method implementations. But the devil is in the “details” (which in fact aren’t details at all), namely the type conversions. This is nicely covered by the @Cast annotation in JavaCPP, which solves quite a bunch of issues. The problem can be stated a bit more generally (or maybe just a bit more fuzzily): one has to know the type mapping. In fact, that’s one of the main things that the code generator I use internally is all about: the type mapping is configured with some rules. At the moment, this has to be done manually, and separately for the C and Java parts, which is a hassle and error-prone. E.g. when a native method receives a cudaMemcpyKind, I still have to manually add the mapping (pseudocode):

javaWriter.putMapping("cudaMemcpyKind", "int");
jniWriter.putMapping("cudaMemcpyKind", "jint");

Ugh. Most of this information could be derived from the header files and some basic rules: when something is typedef’ed as an int, then in the Java world it’s usually an int/jint. There are some details that cannot be derived (e.g. how to handle the case when an object is used by reference), but I think that some generalizations can apply here as well. The fact that, for JavaCPP, all this information is basically summarized in one place, in a machine-processable form in the presets (and the @Cast annotations), simplifies things a lot.
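The idea of deriving the mapping from a few basic rules, instead of registering every type twice, can be sketched in C. Here the Java and JNI sides come from the same table, so a type like cudaMemcpyKind is resolved once through its underlying kind. Everything in it is an assumption for illustration: the names `type_rule` and `lookup_type`, the table contents, and the choice to treat enums like int. It is not the behavior of JavaCPP’s @Cast or of Marco’s internal generator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule per underlying C kind: the Java type and the JNI type it maps to. */
typedef struct {
    const char *c_kind;
    const char *java_type;
    const char *jni_type;
} type_rule;

static const type_rule RULES[] = {
    {"int", "int", "jint"},
    {"enum", "int", "jint"},
    {"float", "float", "jfloat"},
    {"double", "double", "jdouble"},
    {"size_t", "long", "jlong"},
    {"pointer", "long", "jlong"},
};

/* Named types as a header parser might report them, with their
 * underlying kind (illustrative entries only). */
typedef struct {
    const char *name;
    const char *underlying;
} named_type;

static const named_type KNOWN_TYPES[] = {
    {"cudaMemcpyKind", "enum"},
    {"cublasStatus_t", "enum"},
    {"cudaStream_t", "pointer"},
};

/* Resolve a C type name: first through the named types, then directly
 * against the rule table. Returns NULL if no rule matches. */
static const type_rule *lookup_type(const char *c_type) {
    assert(c_type != NULL);
    const char *kind = c_type;
    for (size_t i = 0; i < sizeof KNOWN_TYPES / sizeof KNOWN_TYPES[0]; i++) {
        if (strcmp(KNOWN_TYPES[i].name, c_type) == 0) {
            kind = KNOWN_TYPES[i].underlying;
            break;
        }
    }
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (strcmp(RULES[i].c_kind, kind) == 0) {
            return &RULES[i];
        }
    }
    return NULL;
}
```

Both writers would then query `lookup_type` instead of keeping their own lists, so the two sides cannot drift apart; the cases that cannot be derived (such as objects passed by reference) would still need explicit entries.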

Concerning the parser (also in conjunction with LWJGL and C++ support in general) and the generator: I had a short look at the generator code of JavaCPP, but in fact, the Parser is far more interesting (and I hadn’t looked at it yet; I just quickly scrolled over it now). As already mentioned, C++ is a beast, and so is the parser ;) When I started “generalizing” my code generation approaches, I did not even dare to tackle this manually. I considered using something like ANTLR and feeding it the C++ grammar, but back then, this was not trivial either. (Today, with ANTLR4, it might actually be a feasible approach.) But I thought: Hey, people have already done that (and these guys definitely know their stuff and the pitfalls of C++ better than me). So I just ripped the relevant libraries out of Eclipse CDT (https://eclipse.org/cdt/). It works pretty well, but is still inconvenient to work with (because of the sheer complexity of C++). So I’m currently just using these libraries to parse an AST out of the C/C++ header files, and translating this AST into a very simple code model. It’s just powerful enough to handle C headers, and does not really support any C++ features, but at least I don’t have to do any manual parsing and still have the complete C++ AST available, so I could extract more information from the AST if necessary. “Most” libraries that one might want to call from Java (at least the ones I have been working with) offer a C interface, however: you can’t even pass a std::string from one DLL to another anyhow (which is … odd, to say the least…).

Regarding the installable client drivers: I’m not sure whether or how this could influence JavaCPP. I haven’t looked at enough of the details here, and have to admit that I don’t have a profound background in all the technical details on the native side. But one of the issues I encountered was the binding of methods. In JOCL, the native methods themselves are basically implemented as usual. But the actual OpenCL library functions are not called directly. Instead, they are called via function pointers that are obtained at runtime. For example, the clCreateContext function would usually be JNI’ed like this:

JNIEXPORT jobject JNICALL Java_org_jocl_CL_clCreateContextNative
  (JNIEnv *env, jclass UNUSED(cls), jobject properties, jint num_devices, jobjectArray devices, jobject pfn_notify, jobject user_data, jintArray errcode_ret)
{
    ...
    // Directly call the clCreateContext function:
    nativeContext = clCreateContext(nativeProperties, nativeNum_devices, nativeDevices, nativePfn_notify, nativeUser_data, &nativeErrcode_ret);
    ...
}    

But instead, a function pointer for this function is defined:

typedef CL_API_ENTRY cl_context 
    (CL_API_CALL *clCreateContextFunctionPointerType)(
        const cl_context_properties *, cl_uint, const cl_device_id *,
        void (CL_CALLBACK *)(const char *, const void *, size_t, void *),
        void *, cl_int *) CL_API_SUFFIX__VERSION_1_0;

and a global pointer to this function is stored

clCreateContextFunctionPointerType clCreateContextFP = NULL;

which is then explicitly initialized (in a platform-dependent way, e.g. using GetProcAddress(libraryHandle, name) on Windows) and used in the JNI implementation:

JNIEXPORT jobject JNICALL Java_org_jocl_CL_clCreateContextNative
  (JNIEnv *env, jclass UNUSED(cls), jobject properties, jint num_devices, jobjectArray devices, jobject pfn_notify, jobject user_data, jintArray errcode_ret)
{
    ...
    if (clCreateContextFP == NULL)
    {
         // throw...
    }
    ...
    nativeContext = (clCreateContextFP)(nativeProperties, nativeNum_devices, nativeDevices, nativePfn_notify, nativeUser_data, &nativeErrcode_ret);
    ...
}

The reason for these contortions is that, as far as I understand, you never know which functions are actually available in the “installed client”. Directly linking against these libraries (by referring to “OpenCL.lib”) may end up with linker errors for undefined references, whereas with function pointers, a missing function simply leaves its pointer null, which can be checked at runtime.
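The pattern described above can be sketched in C: resolve entry points at runtime with dlopen/dlsym (POSIX) or LoadLibraryA/GetProcAddress (Windows), and check each function pointer for NULL before calling it. To keep the sketch runnable without an OpenCL installation, it resolves the C library’s strlen from the running process instead of clCreateContext from an OpenCL library. The helper names (`open_library`, `find_symbol`, `init_function_pointers`, `checked_strlen`) are hypothetical and not taken from JOCL or JavaCPP.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;
/* On Windows, a real loader would call e.g. open_library("OpenCL.dll"). */
static lib_handle open_library(const char *path) { return LoadLibraryA(path); }
static void *find_symbol(lib_handle lib, const char *name) {
    return (void *)GetProcAddress(lib, name);
}
#else
#include <dlfcn.h>
typedef void *lib_handle;
/* On POSIX systems, a NULL path yields a handle for the running program and
 * the libraries it was started with; a real loader would pass a library name. */
static lib_handle open_library(const char *path) { return dlopen(path, RTLD_NOW); }
static void *find_symbol(lib_handle lib, const char *name) { return dlsym(lib, name); }
#endif

/* Function-pointer type of the entry point we want; strlen stands in for an
 * OpenCL function such as clCreateContext. */
typedef size_t (*strlen_fn)(const char *);

/* Global pointer, NULL until resolved (cf. clCreateContextFP above). */
static strlen_fn strlen_fp = NULL;

/* Resolve every entry point once; missing symbols simply leave the pointer NULL. */
static void init_function_pointers(lib_handle lib) {
    assert(lib != NULL);
    strlen_fp = (strlen_fn)find_symbol(lib, "strlen");
}

/* Check the pointer before calling, as the JOCL wrapper does. */
static long checked_strlen(const char *s) {
    if (strlen_fp == NULL) {
        return -1; /* a JNI wrapper would throw a Java exception here instead */
    }
    return (long)strlen_fp(s);
}
```

The `#ifdef` split also shows the portability concern raised later in the thread: each platform has its own loading API. On older glibc versions, the POSIX branch needs to be linked with `-ldl`.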

Maybe it’s not an issue for JavaCPP at all, I was just curious whether you had to handle this in any way.

There are some further details that might be interesting (e.g. asynchronous operations and their interdependencies with the garbage collector), but… that post is long enough for now ;)

In the long term, I think everyone will simply agree on a single convention, just because there won’t be any reason to choose anything else than say GCC or LLVM, and they will tend to pick things that work, instead of reinventing the wheel, like what happened with Android using Linux, or the QWERTY keyboard. But I hope it goes a bit differently, and we move on to a VM like Java or .NET, and leave C/C++ only for kernel-level development or userland optimizations. One can always hope…

I did give NetBeans CND a go too, just because I find that NetBeans works better than Eclipse (see the Stack Overflow question “parsing C++ source code in java environment”). It also uses ANTLR, but the problem with that is that we actually have to let it parse everything. Who cares about how something like std::vector is actually implemented? It gets really complicated really fast. It turned out to be easier to just manually pick only the bits and pieces we’re interested in. In any case, if you like this approach of parsing everything, be sure to check out Clang. It’s probably going to be kept more up to date than either Eclipse or NetBeans :)

JavaCPP doesn’t use dynamic loading, no, because it’s not portable across platforms: it’s not even part of C++. We end up using functionality from the native OS. Still, it would make sense – as an option – and is something that could be done easily enough. It’s just not on my list of priorities I guess.

Samuel

[QUOTE=saudet]In the long term, I think everyone will simply agree on a single convention, just because there won’t be any reason to choose anything else than say GCC or LLVM, and they will tend to pick things that work, instead of reinventing the wheel, like what happened with Android using Linux, or the QWERTY keyboard. But I hope it goes a bit differently, and we move on to a VM like Java or .NET, and leave C/C++ only for kernel-level development or userland optimizations. One can always hope…

In any case, if you like this approach of parsing everything, be sure to check out Clang. It’s probably going to be kept more up to date than either Eclipse or NetBeans :)
[/quote]
When I first heard about LLVM, I immediately thought that this might be the next “big thing” (and this was before they received the ACM award etc.). But it’s indeed not only big regarding its impact in the field, but also technically: it’s not sooo easy to get started with it (particularly for some Windows/Java dummy like me). There are approaches like JLLVM and jclang, of course, but these vary in maturity and development activity, and are in any case not something that can be used right out of the box. (Maybe JavaCPP can come to the rescue here? :) )

[QUOTE=saudet;119470]I did give NetBeans CND a go too, just because I find that NetBeans works better than Eclipse (see the Stack Overflow question “parsing C++ source code in java environment”). It also uses ANTLR, but the problem with that is that we actually have to let it parse everything. Who cares about how something like std::vector is actually implemented? It gets really complicated really fast. It turned out to be easier to just manually pick only the bits and pieces we’re interested in.
[/quote]

I see, that’s a valid point. One could try to counter this by saying that when you implement more and more C++ features in the Parser class, step by step, it will likely become more complex and unmaintainable, and you’d be better off if you had used an established, engineered third-party parser right from the beginning. But of course, for the JNI generation, some things will probably never be relevant, so it’s hard to tell beforehand whether manual parsing or the third-party library is the “better” choice here.

I’m not sure how problematic the third-party, full-AST-parsing approach is for headers that actually include things like the STL headers (which often quickly lead into a nasty, compiler-specific macro hell); I simply haven’t tried it out yet. In any case, the hope (there it is again ;)) was that one can just have some black-box magic function like AST ast = Magic.parse("main.h", includePaths);. And at least for the (simple) cases that I tried, this black box actually works for me. The further processing of this AST is a different story. Just translating it into a simplified “code model” for JNI generation works well for simple C headers, but I’m sure that there will be caveats when trying to do a more full-fledged analysis of C++-specific features (looots of instanceofs are lurking there…)

[QUOTE=saudet]JavaCPP doesn’t use dynamic loading, no, because it’s not portable across platforms: it’s not even C++. We end up using functionality from the native OS. Still, it would make sense – as an option – and is something that could be done easily enough. It’s just not on my list of priorities I guess.
[/quote]

Admittedly, I don’t quite get the point about portability. Of course, one has to use platform-specific functions. But I wonder whether linking directly against such a library might actually decrease portability, in the sense that the resulting DLL cannot be used if the client does not have a perfectly matching implementation library installed. (But again, there are some technicalities here that are beyond what I’m really familiar with.)

Sorry for the late reply. I’ve been busy finishing up a release (announcement: “Now offering software under the Apache License”).

I still consider JavaCPP to be a „prototype“ because basically no one has ever tried to actually do anything about this particular problem, so I had to start somewhere. And I’m currently alone in this. Absolutely no one believes in what I’m doing – even though it works very well. For example, except for a few custom helper classes, and CUDPP, which isn’t part of CUDA per se and has largely been superseded by Thrust (which BTW works with JavaCPP, but not JCuda), I think I’ve pretty much covered all the functionality of JCuda in the latest release of JavaCPP. Could you point me to at least one gaping omission?

The goals of Eclipse CDT and NetBeans CND are to provide a C++ development platform, not to somehow bridge Java and C++. Similarly, the developers of LLVM basically only consider the case when we have some front end that spits out „bitcode“ and we want to end up with some machine code. In this context, how are we supposed to approach them with our ideas? We need to have some material, some sort of proof of concept that would justify any changes that we make to the design of their frameworks. That’s precisely what JavaCPP is all about. This is unlike your strategy of waiting until someone else tackles the problem. Who would that be? And why not try yourself?

About dynamic loading, the portability issues arise because we deal with different functions on different platforms. There is no convention about how dynamic loading should behave on any given platform. Luckily, they are almost never usable with C++, so we only have to worry about them in the case of some eccentric C APIs like OpenGL :)

Wasn’t it under the Apache License before? However…

[QUOTE=saudet]I still consider JavaCPP to be a „prototype“ because basically no one has ever tried to actually do anything about this particular problem, so I had to start somewhere. And I’m currently alone in this.
[/quote]

Which “particular problem”? The problem of connecting Java and C++ has been tackled in various ways since the beginning of JNI. Many of the roads taken have turned out to be dead ends, for various reasons. But as I already mentioned, JavaCPP seemed to be one of the more promising ones right from the beginning (for me, at least). It just came 7 years too late for JCuda ;)

[QUOTE=saudet]Absolutely no one believes in what I’m doing – even though it works very well. For example, except for a few custom helper classes, and CUDPP, which isn’t part of CUDA per se and has largely been superseded by Thrust (which BTW works with JavaCPP, but not JCuda), I think I’ve pretty much covered all the functionality of JCuda in the latest release of JavaCPP. Could you point me to at least one gaping omission?
[/quote]

No obvious one that I could think of right now. I have no doubt that “good” auto-generated bindings could replace JCuda in the medium term. When I started JCuda (actually, only JCublas in the beginning), I considered using one of the above-mentioned auto-generators, but … I’m somewhat skeptical and hesitant, … again, for various reasons (and at least I did not pick one of the “dead ends”, unless one considers “manually writing JNI code” as such ;-))

[QUOTE=saudet]The goals of Eclipse CDT and NetBeans CND are to provide a C++ development platform, not to somehow bridge Java and C++.
[/quote]

Nobody stated that. It’s just that parsing C++ (and I mean not only C headers, but C++ in its deep, deeeep ugly entirety) is tremendously complex. And in CDT and CND, loads of people (who are hopefully smart and know their stuff) have put a tremendous amount of effort into creating clean ASTs. One should at least consider using this, instead of falling into the “Not Invented Here” pattern. Of course, you already mentioned possible reasons to do manual parsing (basically: “knowing where to stop”, referring to your “vector” example). But it’s certainly a trade-off, and one should probably try to keep the interfaces clean to be able to plug in another parser when it becomes necessary. (BTW: I still haven’t looked at more details of the code. That’s probably not a good sign. I should switch to part-time. All my projects are suffering from these >10-hour workdays…)

[QUOTE=saudet]About dynamic loading, the portability issues arise because we deal with different functions on different platforms. There is no convention about how dynamic loading should behave on any given platform. Luckily, they are almost never usable with C++, so we only have to worry about them in the case of some eccentric C APIs like OpenGL :)
[/quote]

Does this still relate to the ICD, or are you referring to things like name mangling here? (I wonder who thought that this was a good idea. But… I mean, he really screwed things up, didn’t he?).
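For contrast with C++ name mangling, which is left to each compiler, JNI itself specifies exactly how the C symbol of a native method is derived: “Java_”, the mangled fully qualified class name, an underscore, and the mangled method name. That is why the JOCL function earlier in this thread is called Java_org_jocl_CL_clCreateContextNative. A minimal sketch of this “short name” rule; it ignores the longer form used for overloaded methods, handles only single-byte characters, and the helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append the JNI-escaped form of s to out (a buffer of cap bytes).
 * Simplification: only single-byte characters are handled; the JNI spec
 * escapes any Unicode character outside [A-Za-z0-9] as _0xxxx. */
static void append_mangled(char *out, size_t cap, const char *s) {
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        char piece[8];
        if (isalnum(c)) {
            piece[0] = (char)c;
            piece[1] = '\0';
        } else if (c == '.' || c == '/') {
            strcpy(piece, "_"); /* package separator */
        } else if (c == '_') {
            strcpy(piece, "_1");
        } else if (c == ';') {
            strcpy(piece, "_2");
        } else if (c == '[') {
            strcpy(piece, "_3");
        } else {
            snprintf(piece, sizeof piece, "_0%04x", (unsigned)c);
        }
        strncat(out, piece, cap - strlen(out) - 1);
    }
}

/* Build the JNI short name of a non-overloaded native method:
 * "Java_" + mangled class name + "_" + mangled method name. */
static void jni_short_name(char *out, size_t cap,
                           const char *class_name, const char *method) {
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    strncat(out, "Java_", cap - 1);
    append_mangled(out, cap, class_name);
    strncat(out, "_", cap - strlen(out) - 1);
    append_mangled(out, cap, method);
}
```

Because this scheme is part of the JNI specification, any JVM can find the function in any compiled library, which is precisely what C++ name mangling does not guarantee across compilers.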

Yes, it has been since I posted here. It’s just the first release with that license.

[QUOTE]Which „particular problem“? The problem of connecting Java and C++ has been tackled in various ways since the beginning of JNI. Many of the roads taken have turned out to be dead ends, for various reasons. But as I already mentioned, JavaCPP seemed to be one of the more promising ones right from the beginning (for me, at least). It just came 7 years too late for JCuda ;)
[/quote]

Sorry, I had mostly been wrangling with CORBA and SWIG until I gave up, then found JNA when it came out, and started working on a C++ version of that.

[QUOTE]No obvious one that I could think of right now. I have no doubt that “good” auto-generated bindings could replace JCuda in the medium term. When I started JCuda (actually, only JCublas in the beginning), I considered using one of the above-mentioned auto-generators, but … I’m somewhat skeptical and hesitant, … again, for various reasons (and at least I did not pick one of the “dead ends”, unless one considers “manually writing JNI code” as such ;-))

Nobody stated that. It’s just that parsing C++ (and I mean not only C headers, but C++ in its deep, deeeep ugly entirety) is tremendously complex. And in CDT and CND, loads of people (who are hopefully smart and know their stuff) have put a tremendous amount of effort into creating clean ASTs. One should at least consider using this, instead of falling into the “Not Invented Here” pattern. Of course, you already mentioned possible reasons to do manual parsing (basically: “knowing where to stop”, referring to your “vector” example). But it’s certainly a trade-off, and one should probably try to keep the interfaces clean to be able to plug in another parser when it becomes necessary. (BTW: I still haven’t looked at more details of the code. That’s probably not a good sign. I should switch to part-time. All my projects are suffering from these >10-hour workdays…)
[/quote]

The “interface” is basically the config file written in Java and the Java code generated from the header files, and I consider that to be pretty clean. We could replace it tomorrow with something better that generates the same thing from the same config and header files.

[QUOTE]Does this still relate to the ICD, or are you referring to things like name mangling here? (I wonder who thought that this was a good idea. But… I mean, he really screwed things up, didn’t he?).
[/quote]

Name mangling, well, the whole concept of C++ classes disappearing is more problematic. That’s why people came up with things like CORBA and COM, and more recently WinRT, but that’s not portable… yet. It would be interesting if Microsoft did bring WinRT to Linux via the .NET Core effort. Anyway, I see Java, via a tool like JavaCPP, being able to fill that role in a portable manner. But like I said, neither you nor anyone else seems to believe that is possible, and I would like to know why.

Anyway, I guess you need some time to look at this, so when you do in the following days, weeks, or months, please send me a message by email, because I haven’t found a setting on your site to get email notifications to work properly. Thanks!