How to pass an array of multidimensional rows and two columns

I have received the mail (and the papers - I already found them while I tried to implement a basic Levenshtein computation :wink: ). I’ll try to have a look at that in the next few days, but it’s really difficult to give focussed help based on the current questions.

The idea of splitting the whole string into tokens is the same as in this post on Stack Overflow (https://stackoverflow.com/questions/19884090/cuda-split-char-array). Please have a look.

Also, in a Java environment using JCuda, how can I debug and detect errors in the kernel? For example, when I run the program in the NetBeans IDE, there is no output because of some error in the kernel, yet the IDE does not report anything about the kernel.

I had a look at the question. It is a bad question, because somebody just threw in some code and said „It doesn’t work“. I don’t know what to do with this (except, vote for deleting it).

I responded to your mail, with an example application (Java+Kernel) that computes the Levenshtein distances between two sets of strings, with CUDA.

Regarding the question about debugging: It’s difficult. (Maybe I already mentioned that. Maybe I should say it once more: Using CUDA is difficult!!!)

There is a section about debugging on the JCuda website: http://www.jcuda.de/debugging/Debugging.html

You can also try to hook into a running (Java) process using the NVIDIA CUDA debuggers, but I haven’t used this extensively, and cannot give any advice here.

Apart from that, you have to anticipate that you cannot just launch your application in Debug Mode to see what’s going on. You have to know what is supposed to be going on before writing the code! Otherwise, you’ll be lost really, really quickly.

One approach that worked for me, for very simple cases, to some extent:

Write the function in Java (or maybe C), in a way that you know can be translated to CUDA. For example, in the code that I sent you, I wrote a function for computing the Levenshtein distance, as

private static void distance(
    byte s0[], int len0, 
    byte s1[], int len1,
    int t0[],
    int t1[],
    int result[])
{ ... }

There, you can easily test and debug the function, and translate it to CUDA when you know that it’s working. Of course, this does not work when you use something like shared memory. But it’s a start.
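A minimal sketch of what such a Java reference version could look like, assuming the classic two-row dynamic programming scheme. This is not the code from the mail (its body is elided above). The signature is copied from the declaration above, while the class name LevenshteinReference, the helper distanceOf, and the use of t0 and t1 as scratch rows of at least len1 + 1 elements are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

class LevenshteinReference
{
    // Computes the Levenshtein distance between the first len0 bytes of s0
    // and the first len1 bytes of s1. ASSUMPTION: t0 and t1 are scratch rows
    // with at least len1 + 1 elements each. The distance goes to result[0].
    static void distance(
        byte s0[], int len0,
        byte s1[], int len1,
        int t0[],
        int t1[],
        int result[])
    {
        int prev[] = t0;
        int curr[] = t1;

        // Distance from the empty prefix of s0 to each prefix of s1
        for (int j = 0; j <= len1; j++)
        {
            prev[j] = j;
        }
        for (int i = 1; i <= len0; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= len1; j++)
            {
                int cost = (s0[i - 1] == s1[j - 1]) ? 0 : 1;
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(deletion, insertion), substitution);
            }
            // The row that was just filled becomes the previous row
            int swap[] = prev;
            prev = curr;
            curr = swap;
        }
        result[0] = prev[len1];
    }

    // Hypothetical convenience wrapper, only used for testing on the host
    static int distanceOf(String a, String b)
    {
        byte s0[] = a.getBytes(StandardCharsets.US_ASCII);
        byte s1[] = b.getBytes(StandardCharsets.US_ASCII);
        int result[] = new int[1];
        distance(s0, s0.length, s1, s1.length,
            new int[s1.length + 1], new int[s1.length + 1], result);
        return result[0];
    }

    public static void main(String[] args)
    {
        System.out.println(distanceOf("kitten", "sitting")); // prints 3
    }
}
```

Because the function only touches primitive arrays and explicit lengths, its loops should carry over to a kernel with few changes, and the Java version can later serve as a reference for checking the GPU results.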

Thanks a lot for your help. I promise I will ask general questions.

Mr. Marco, I need a sample project for unified memory on the GPU.

@seham Does https://github.com/jcuda/jcuda-samples/blob/master/JCudaSamples/src/main/java/jcuda/driver/samples/JCudaDriverUnifiedMemory.java help?

The basic workflow is

  • the memory is allocated with CUDA, using cuMemAllocManaged
  • the pointer to this memory can be converted into a ByteBuffer
  • the ByteBuffer can be used as normal (e.g. converted to a FloatBuffer or IntBuffer, if desired)
  • the buffer can be made accessible to the GPU with cuStreamAttachMemAsync

(I’d also have to look up some details in the documentation - e.g. cuStreamAttachMemAsync is asynchronous, so one has to be careful about that - but maybe it’s a start)
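The third step can be tried out without a GPU. The following is only a sketch, assuming that a direct ByteBuffer from the standard library stands in for the buffer that would be obtained from the managed pointer. The class name BufferViewSketch and the helper asNativeInts are made up for this illustration. It shows that the IntBuffer is a view on the same bytes, and that the byte order should be set to the native order before creating the view.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class BufferViewSketch
{
    // Creates an int view on the given bytes, using the native byte order
    // of the platform. The view shares its storage with the ByteBuffer.
    static IntBuffer asNativeInts(ByteBuffer bytes)
    {
        return bytes.order(ByteOrder.nativeOrder()).asIntBuffer();
    }

    public static void main(String[] args)
    {
        int n = 4;

        // ASSUMPTION: a plain direct buffer stands in for the buffer that
        // would refer to the managed memory in a real JCuda program
        ByteBuffer bytes = ByteBuffer.allocateDirect(n * Integer.BYTES);
        IntBuffer ints = asNativeInts(bytes);

        for (int i = 0; i < n; i++)
        {
            ints.put(i, i * 10);
        }

        // Reading through the ByteBuffer sees what the view has written
        System.out.println(bytes.getInt(2 * Integer.BYTES)); // prints 20
    }
}
```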


An aside: Marco is my first name. „Mr.“ is only used when addressing somebody by his last name. So I’m not „Mr. Marco“, but just „Marco“…

Dear Marco,

Following your email, I have written a kernel, and there is an error. Could you have a look, please?

Thanks Marco. You solved it.

Hi Marco,

What if I need to transfer a single integer value from the host to the device, rather than an array of integers?

private static CUdeviceptr copyToDevice(int hostData)
{
    CUdeviceptr deviceData = new CUdeviceptr();
    cuMemAlloc(deviceData,  Sizeof.INT);
    cuMemcpyHtoD(deviceData, 
        Pointer.to(hostData),  Sizeof.INT);
    return deviceData;
}

There is an error at Pointer.to. Also, how do I pass the value as a kernel parameter?

In order to pass a single value to a kernel, you also have to pass it there as a pointer, which means that you have to wrap it into an array:

int value = 123;
Pointer pointer = Pointer.to(new int[] { value });

The code that you posted looked like you tried to copy the int to the device. In order to pass it to a kernel, you can use … https://github.com/jcuda/jcuda-samples/blob/master/JCudaSamples/src/main/java/jcuda/driver/samples/JCudaVectorAdd.java#L100

Hi Marco,

There is a problem when running the program. The details are in my email.

Hi Marco,

I have updated the NVIDIA GeForce GTX 860M driver to the latest version
I have updated CUDA to a newer version, CUDA 10.0
I have updated the JDK to Java JDK 15, and NetBeans IDE to version 12
I have updated JCuda to a compatible version, 10.0.0
Windows 8.1

The same error arises when I run the JCuda program:

run:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffbd6e3e1eb, pid=1912, tid=8356
#
# JRE version: Java(TM) SE Runtime Environment (15.0.1+9) (build 15.0.1+9-18)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (15.0.1+9-18, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# C  [nvcuda.dll+0x3ce1eb]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\NetBeanProjects\OntologyJCudaProjectFinal\hs_err_pid1912.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
C:\Users\Computer Shop\AppData\Local\NetBeans\Cache\12.0\executor-snippets\run.xml:111: The following error occurred while executing this line:
C:\Users\Computer Shop\AppData\Local\NetBeans\Cache\12.0\executor-snippets\run.xml:94: Java returned: 1
BUILD FAILED (total time: 9 seconds)

What is wrong?

Your code.

Sorry, but this error happens essentially when anything is wrong with the kernel. You cannot derive what is wrong from this error message.

It only says EXCEPTION_ACCESS_VIOLATION, which indicates that at some point in the kernel, you have accessed an invalid memory location. This might, for example, be the case when you have an array with 100 elements, and access the element array[999]. It will crash.

If this is the same error that you reported via mail, then I already pointed you to jcuda.org - Debugging and how I used cuda-memcheck to find the invalid array access in your kernel. If this is a different error, then you might have to try out cuda-memcheck on your own.

Beyond that, I can only repeat what I already said earlier, particularly in this post: CUDA is difficult, you have to know what you are doing, and you can make your life much easier if you first try to write the kernel as a Java function that only uses primitive arrays as input and output.
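As a rough sketch of that last suggestion (not code from the mails): the kernel body can be written as a plain Java function, with the block and thread indices passed in explicitly, and a small loop can play the role of the launch. The names KernelEmulationSketch, addKernel and launch are hypothetical. If the bounds check is removed, Java throws an ArrayIndexOutOfBoundsException that names the offending index, instead of the access violation that the GPU version produces.

```java
class KernelEmulationSketch
{
    // Java stand-in for a kernel body. blockIdx, blockDim and threadIdx are
    // passed explicitly because Java has no built-in thread indices.
    static void addKernel(int blockIdx, int blockDim, int threadIdx,
        int n, float a[], float b[], float result[])
    {
        int i = blockIdx * blockDim + threadIdx;

        // Without this guard, the surplus threads of the last block would
        // access elements beyond the end of the arrays
        if (i < n)
        {
            result[i] = a[i] + b[i];
        }
    }

    // Sequentially runs every "thread" of a one-dimensional grid
    static void launch(int gridDim, int blockDim,
        int n, float a[], float b[], float result[])
    {
        for (int blockIdx = 0; blockIdx < gridDim; blockIdx++)
        {
            for (int threadIdx = 0; threadIdx < blockDim; threadIdx++)
            {
                addKernel(blockIdx, blockDim, threadIdx, n, a, b, result);
            }
        }
    }

    public static void main(String[] args)
    {
        int n = 100;
        int blockDim = 32;
        int gridDim = (n + blockDim - 1) / blockDim; // 4 blocks, 128 threads

        float a[] = new float[n];
        float b[] = new float[n];
        float result[] = new float[n];
        for (int i = 0; i < n; i++)
        {
            a[i] = i;
            b[i] = 2 * i;
        }

        launch(gridDim, blockDim, n, a, b, result);
        System.out.println(result[n - 1]); // prints 297.0
    }
}
```

This only emulates the indexing. It runs the threads one after another, so it cannot reveal race conditions or problems with shared memory.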

Dear Marco,

I have upgraded Java from JDK 8 to JDK 15. When running the program, the following message appears.

ant -f D:\NetBeanProjects\OntologyJCudaProjectFinal -Dnb.internal.action.name=run.single -Djavac.includes=ontologyjcudaprojectfinal/OntologyJCudaProjectFinal.java -Drun.class=ontologyjcudaprojectfinal.OntologyJCudaProjectFinal run-single
init:
Deleting: D:\NetBeanProjects\OntologyJCudaProjectFinal\build\built-jar.properties
deps-jar:
Updating property file: D:\NetBeanProjects\OntologyJCudaProjectFinal\build\built-jar.properties
Compiling 1 source file to D:\NetBeanProjects\OntologyJCudaProjectFinal\build\classes
warning: [options] bootstrap class path not set in conjunction with -source 8
1 warning
compile-single:
run-single:
BUILD SUCCESSFUL (total time: 3 seconds)

Note that I have set the JAVA_HOME environment variable, under the user variables, to C:\Program Files\Java\jdk-15.0.1

What causes this problem? warning: [options] bootstrap class path not set in conjunction with -source 8

I haven’t used JDK 15 yet. As far as I can tell, this is only a warning (the build was successful): javac prints it when the project is compiled with -source 8 on a newer JDK without a matching bootstrap class path, so it is probably related to the source level of the project rather than to JAVA_HOME. It doesn’t appear to be a problem that is specific to JCuda, though…

Dear Marco,

I have studied cuda-memcheck, as described in http://www.jcuda.de/debugging/Debugging.html. When I run the program with cuda-memcheck from the command prompt, the following error occurs. I created a batch file with the following content.

java -cp .;jcuda.jar "D:\NetBeanProjects\OntologyJCudaProjectFinal\build\classes\ontologyjcudaprojectfinal\OntologyJCudaProjectFinal.class"

I also tried: java -cp .;jcuda.jar "D:\NetBeanProjects\OntologyJCudaProjectFinal\src\ontologyjcudaprojectfinal\ontologyjcudaprojectfinal.java"

Then, in the command prompt, I run the following:
CUDA-MEMCHECK "path of batch file"
It reports "Could not find or load main class", as shown in the file attached to my email.

Which path should I put in the batch file?

The message „Could not find or load main class“ indicates that the name of the main class that you have given was wrong (which was the case), and/or that the classpath had been incomplete (which also was the case).

I have sent an example .BAT file with some instructions via mail.

Dear Marco,

Happy new year. Thanks a lot, the first part of the kernel now runs. I am now completing the rest, and will send it privately when it is finished. My question: I need to define a helper that I will use inside the __global__ void function. It is as follows.

#define MIN3(a, b, c) ((a) < (b) ? ((a) < (c) ? (a) : (c)) : ((b) < (c) ? (b) : (c)))

The kernel function:

extern "C"
__global__ void ComputationgldOnGPU(char *str, char *patternRemoved, int nx, int ny, int *dX) {
    // Body of the kernel
}

Where should I put it?

Just into the CUDA source code. Before the extern "C" part, and on its own line, if that was the point:


#define MIN3(a, b, c) ((a) < (b) ? ((a) < (c) ? (a) : (c)) : ((b) < (c) ? (b) : (c)))

extern "C"
__global__ void exampleMin(int n, float *a, float *b, float *c, float *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i<n)
    {
        result[i] = MIN3(a[i], b[i], c[i]);
    }

}

Thanks Marco.