JVM quits when executing a JCuda dot product

Hi,

I created a class that computes the dot product of two vectors of doubles (I adapted JCublasSample.java for this). The class is called thousands of times: I iterate over the cells of a matrix and, for each cell, compute the dot product of its two associated vectors.

After 1% or 2% of the run, the system seems to hang, and then the Java program exits with a strange, seemingly random code („255“, „-1073740940“, …).
As a check, I added a plain Java dot product function to the class. When I call this function instead of the CUDA one, the program does not exit and continues as it should.
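
A plain Java check of this kind might look like the sketch below. The class name `PlainDot` and the method `dot` are illustrative assumptions, not the code from the actual project; the point is only to have a CPU-only reference that can replace the CUDA call when narrowing down the crash.

```java
// Minimal CPU-only reference implementation (hypothetical names).
final class PlainDot {

    /** Returns the dot product of a and b; both vectors must have the same length. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Vector lengths differ: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(dot(a, b)); // 1*4 + 2*5 + 3*6 = 32.0
    }
}
```

If the run completes with this function but not with the CUDA version, the problem is most likely on the native side (allocation, initialization or threading) rather than in the surrounding matrix loop.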

Note: I use Netbeans 7.1.

The bug report:

A fatal error has been detected by the Java Runtime Environment:

EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000077b80895, pid=9032, tid=8372

JRE version: 6.0_26-b03

Java VM: Java HotSpot™ 64-Bit Server VM (20.1-b02 mixed mode windows-amd64 compressed oops)

Problematic frame:

C [ntdll.dll+0x50895]

If you would like to submit a bug report, please visit:

Bug Report

--------------- T H R E A D ---------------

Current thread (0x000000000078a000): JavaThread „main“ [_thread_in_Java, id=8372, stack(0x0000000002640000,0x0000000002740000)]

siginfo: ExceptionCode=0xc0000005, reading address 0xffffffffffffffff

Registers:
RAX=0x000000000273ef18, RBX=0x0000000000000002, RCX=0x000000000273efe8, RDX=0x000000000273ef58
RSP=0x000000000273ef08, RBP=0x000007fee68d7260, RSI=0x0000000000000020, RDI=0x0000000000000000
R8 =0x000000000272e000, R9 =0x0000000000000003, R10=0x0000000000000000, R11=0x00000000000018cd
R12=0x0000000000000000, R13=0x0000000000000050, R14=0x0000000000000003, R15=0x000000000078a000
RIP=0x0000000077b80895, EFLAGS=0x0000000000010202

Top of Stack: (sp=0x000000000273ef08)
0x000000000273ef08: 0000000000000202 0000000077b3b239
0x000000000273ef18: 000007fffffda000 0000000002776fba
0x000000000273ef28: 0000000002776f50 3137343100000000
0x000000000273ef38: 0000000000000000 000000000273f720
0x000000000273ef48: 0000000009d26150 0000000009d26100
0x000000000273ef58: 0000000002740000 000000000272e000
0x000000000273ef68: 2d45353900000053 00000000001ff840
0x000000000273ef78: 000000000273f750 00000000027a8cd0
0x000000000273ef88: 00000000027a8510 3535343100000000
0x000000000273ef98: 0000000000000003 000000000273f750
0x000000000273efa8: 000000000273f7c0 0000000002745ea3
0x000000000273efb8: 0000000002744ed0 0000000000000000
0x000000000273efc8: 000000000273f820 000000000273f7c0
0x000000000273efd8: 000000006dae6045 000000000273f750
0x000000000273efe8: 00000000027a8cd0 00000000027a8510
0x000000000273eff8: 3535343100000000 0000000000000003

Instructions: (pc=0x0000000077b80895)
0x0000000077b80875: d0 00 00 00 4c 89 a1 d8 00 00 00 4c 89 a9 e0 00
0x0000000077b80885: 00 00 4c 89 b1 e8 00 00 00 4c 89 b9 f0 00 00 00
0x0000000077b80895: 0f ae 81 00 01 00 00 0f 29 81 a0 01 00 00 0f 29
0x0000000077b808a5: 89 b0 01 00 00 0f 29 91 c0 01 00 00 0f 29 99 d0

Register to memory mapping:

RAX=0x000000000273ef18 is pointing into the stack for thread: 0x000000000078a000
RBX=0x0000000000000002 is an unknown value
RCX=0x000000000273efe8 is pointing into the stack for thread: 0x000000000078a000
RDX=0x000000000273ef58 is pointing into the stack for thread: 0x000000000078a000
RSP=0x000000000273ef08 is pointing into the stack for thread: 0x000000000078a000
RBP=0x000007fee68d7260 is an unknown value
RSI=0x0000000000000020 is an unknown value
RDI=0x0000000000000000 is an unknown value
R8 =0x000000000272e000 is pointing into the stack for thread: 0x000000000078a000
R9 =0x0000000000000003 is an unknown value
R10=0x0000000000000000 is an unknown value
R11=0x00000000000018cd is an unknown value
R12=0x0000000000000000 is an unknown value
R13=0x0000000000000050 is an unknown value
R14=0x0000000000000003 is an unknown value
R15=0x000000000078a000 is a thread

Stack: [0x0000000002640000,0x0000000002740000], sp=0x000000000273ef08, free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [ntdll.dll+0x50895] RtlCaptureContext+0x85

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v blob 0x000000000277707f
j java.util.AbstractCollection.toArray([Ljava/lang/Object;)[Ljava/lang/Object;+88
j java.util.regex.Pattern.split(Ljava/lang/CharSequence;I)[Ljava/lang/String;+244
j java.lang.String.split(Ljava/lang/String;I)[Ljava/lang/String;+6
j java.lang.String.split(Ljava/lang/String;)[Ljava/lang/String;+3
j cosine.Transformer.EdgeListToMatrix()Lcern/colt/matrix/impl/SparseDoubleMatrix2D;+38
j cosine.Main.main([Ljava/lang/String;)V+37
v ~StubRoutines::call_stub

--------------- P R O C E S S ---------------

Java Threads: ( => current thread )
0x000000000891c000 JavaThread „Low Memory Detector“ daemon [_thread_blocked, id=6916, stack(0x0000000009990000,0x0000000009a90000)]
0x000000000890f800 JavaThread „C2 CompilerThread1“ daemon [_thread_blocked, id=4672, stack(0x0000000009890000,0x0000000009990000)]
0x0000000008906800 JavaThread „C2 CompilerThread0“ daemon [_thread_blocked, id=5396, stack(0x0000000009790000,0x0000000009890000)]
0x0000000008902800 JavaThread „Attach Listener“ daemon [_thread_blocked, id=5376, stack(0x0000000009690000,0x0000000009790000)]
0x0000000008901800 JavaThread „Signal Dispatcher“ daemon [_thread_blocked, id=4872, stack(0x0000000009590000,0x0000000009690000)]
0x000000000887c800 JavaThread „Finalizer“ daemon [_thread_blocked, id=5884, stack(0x0000000009490000,0x0000000009590000)]
0x000000000887c000 JavaThread „Reference Handler“ daemon [_thread_blocked, id=5296, stack(0x0000000009390000,0x0000000009490000)]
=>0x000000000078a000 JavaThread „main“ [_thread_in_Java, id=8372, stack(0x0000000002640000,0x0000000002740000)]

Other Threads:
0x0000000008866800 VMThread [stack: 0x0000000009290000,0x0000000009390000] [id=5104]
0x000000000892f000 WatcherThread [stack: 0x0000000009a90000,0x0000000009b90000] [id=9136]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap
PSYoungGen total 2389312K, used 40960K [0x0000000706000000, 0x00000007acaa0000, 0x0000000800000000)
eden space 2048000K, 2% used [0x0000000706000000,0x0000000708800010,0x0000000783000000)
from space 341312K, 0% used [0x0000000797d50000,0x0000000797d50000,0x00000007acaa0000)
to space 341312K, 0% used [0x0000000783000000,0x0000000783000000,0x0000000797d50000)
PSOldGen total 5461376K, used 0K [0x0000000512000000, 0x000000065f560000, 0x0000000706000000)
object space 5461376K, 0% used [0x0000000512000000,0x0000000512000000,0x000000065f560000)
PSPermGen total 21248K, used 3472K [0x000000050ce00000, 0x000000050e2c0000, 0x0000000512000000)
object space 21248K, 16% used [0x000000050ce00000,0x000000050d164108,0x000000050e2c0000)

Code Cache [0x0000000002740000, 0x00000000029b0000, 0x0000000005740000)
total_blobs=203 nmethods=28 adapters=129 free_code_cache=49888064 largest_free_block=12480

Dynamic libraries:
0x0000000000400000 - 0x000000000042e000 C:\Program Files\Java\jdk1.6.0_26\bin\java.exe
0x0000000077b30000 - 0x0000000077cd9000 C:\Windows\SYSTEM32\ntdll.dll
0x0000000077450000 - 0x000000007756f000 C:\Windows\system32\kernel32.dll
0x000007fefd560000 - 0x000007fefd5cc000 C:\Windows\system32\KERNELBASE.dll
0x000007fefe6f0000 - 0x000007fefe7cb000 C:\Windows\system32\ADVAPI32.dll
0x000007fefec90000 - 0x000007fefed2f000 C:\Windows\system32\msvcrt.dll
0x000007fefec70000 - 0x000007fefec8f000 C:\Windows\SYSTEM32\sechost.dll
0x000007fefeb40000 - 0x000007fefec6d000 C:\Windows\system32\RPCRT4.dll
0x000007fefd340000 - 0x000007fefd397000 C:\Windows\system32\apphelp.dll
0x000007fef3670000 - 0x000007fef36c1000 C:\Windows\AppPatch\AppPatch64\AcGenral.DLL
0x000007fefd310000 - 0x000007fefd335000 C:\Windows\system32\SspiCli.dll
0x000007feff2a0000 - 0x000007feff311000 C:\Windows\system32\SHLWAPI.dll
0x000007feff230000 - 0x000007feff297000 C:\Windows\system32\GDI32.dll
0x0000000077780000 - 0x000000007787a000 C:\Windows\system32\USER32.dll
0x000007fefed30000 - 0x000007fefed3e000 C:\Windows\system32\LPK.dll
0x000007feff100000 - 0x000007feff1c9000 C:\Windows\system32\USP10.dll
0x000007fefeef0000 - 0x000007feff0f3000 C:\Windows\system32\ole32.dll
0x000007fefd840000 - 0x000007fefe5c8000 C:\Windows\system32\SHELL32.dll
0x0000000072d50000 - 0x0000000072d53000 C:\Windows\system32\sfc.dll
0x000007fef8be0000 - 0x000007fef8bf0000 C:\Windows\system32\sfc_os.DLL
0x000007fefc220000 - 0x000007fefc23e000 C:\Windows\system32\USERENV.dll
0x000007fefd470000 - 0x000007fefd47f000 C:\Windows\system32\profapi.dll
0x000007fefad30000 - 0x000007fefad48000 C:\Windows\system32\dwmapi.dll
0x000007fef7320000 - 0x000007fef7338000 C:\Windows\system32\MPR.dll
0x000007fee6710000 - 0x000007fee6b77000 C:\Windows\AppPatch\AppPatch64\AcXtrnal.DLL
0x000007fefeb00000 - 0x000007fefeb2e000 C:\Windows\system32\IMM32.DLL
0x000007fefed40000 - 0x000007fefee49000 C:\Windows\system32\MSCTF.dll
0x000000006d890000 - 0x000000006e048000 C:\Program Files\Java\jdk1.6.0_26\jre\bin\server\jvm.dll
0x000007fefabc0000 - 0x000007fefabfb000 C:\Windows\system32\WINMM.dll
0x000000006d800000 - 0x000000006d80e000 C:\Program Files\Java\jdk1.6.0_26\jre\bin\verify.dll
0x000000006d450000 - 0x000000006d477000 C:\Program Files\Java\jdk1.6.0_26\jre\bin\java.dll
0x0000000077cf0000 - 0x0000000077cf7000 C:\Windows\system32\PSAPI.DLL
0x000000006d850000 - 0x000000006d862000 C:\Program Files\Java\jdk1.6.0_26\jre\bin\zip.dll

VM Arguments:
jvm_args: -Dfile.encoding=UTF-8 -Dfile.encoding=UTF-8 -Xms8000m -Xmx12000m
java_command: cosine.Main
Launcher Type: SUN_STANDARD

Environment Variables:
JAVA_HOME=C:\Program Files\Java\jdk1.6.0_26
JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
CLASSPATH=lasspath;C:\Program Files\Java\jdk1.6.0_26;D:\Docs Pro Clement\E-humanities\Java frameworks\nailgun-0.7.1;C:\Program Files\Java\jdk1.6.0_26\lib\nailgun-0.7.1.jar
PATH=C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\libnvvp;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows Live;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Program Files (x86)\Windows Live\Shared;C:\Program Files (x86)\GTK2-Runtime\bin;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn;C:\Program Files\Microsoft SQL Server\100\Tools\Binn;C:\Program Files\Microsoft SQL Server\100\DTS\Binn;C:\Python26;D:\Docs Pro Clement\E-humanities\E-training\Python\Files to exercise,D:\Docs Pro Clement\E-humanities\Java frameworks\nailgun-0.7.1,C:\Program Files\MongoDB\bin;C:\Program Files\Java\path\javaclass\nailgun-0.7.1;C:\Program Files\MongoDB\bin;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn;C:\Python27-32b\Lib;C:\Python27-64b\Scripts;C:\Python26\Lib\site-packages;D:\Docs Pro Clement\E-humanities\Java frameworks\Jcuda\JCuda-All-0.4.1-bin-windows-x86_64\JCuda-All-0.4.1-bin-windows-x86_64;C:\Program Files\Java\jdk1.6.0_26\bin;C:\Program Files\Java\path\javaclass;C:\Python27-32b;C:\Python26;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\lib;D:\Docs Pro Clement\E-humanities\Java frameworks\Jcuda\JCuda-All-0.4.1-bin-windows-x86_64\JCuda-All-0.4.1-bin-windows-x86_64
USERNAME=C. Levallois
OS=Windows_NT
PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 42 Stepping 7, GenuineIntel

--------------- S Y S T E M ---------------

OS: Windows 7 Build 7601 Service Pack 1

CPU:total 8 (4 cores per cpu, 2 threads per core) family 6 model 42 stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht

Memory: 4k page, physical 16754008k(12250680k free), swap 49827360k(36393256k free)

vm_info: Java HotSpot™ 64-Bit Server VM (20.1-b02) for windows-amd64 JRE (1.6.0_26-b03), built on May 4 2011 07:15:24 by „java_re“ with MS VC++ 8.0 (VS2005)

time: Tue Mar 06 23:28:56 2012
elapsed time: 0 seconds

Any clue?

Thx!

Clement

It’s me again. I investigated a bit more and managed to get an error message from JCublas:

Exception in thread „pool-1-thread-59“ jcuda.CudaException: CUBLAS_STATUS_ALLOC_FAILED

Since this error appears after roughly the 10,000th call to my dDot product, it seems clear that, for some reason, the previous calls did not free the memory on the GPU, so it eventually hits the maximum capacity. But I don’t see why that would be the case. I free the space taken by my two vectors, and I shut down Cublas:

    // Clean up
    JCublas.cublasFree(d_A);
    JCublas.cublasFree(d_B);

    JCublas.cublasShutdown();

So… what am I missing?

Thx!

Clement

Hello

The output looks strange, especially the part

C [ntdll.dll+0x50895] RtlCaptureContext+0x85

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v blob 0x000000000277707f
j java.util.AbstractCollection.toArray([Ljava/lang/Object;)[Ljava/lang/Object;+88

It seems to crash in the ‚toArray‘ method, but this can hardly be true.

According to the message, you are (explicitly or implicitly, via an ExecutorService or so) using multiple threads. Maybe there is a problem with some concurrent access to resources, or there are many threads allocating memory in parallel (and thus causing the ALLOC_FAILED) but that’s hard to say by just seeing the error message…
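
To illustrate the second possibility, here is a minimal sketch of how the number of tasks that hold an allocation at the same time could be capped with a `Semaphore`. It deliberately contains no JCuda calls: the counters `live` and `peak` and the class `BoundedAllocations` are hypothetical stand-ins for "device memory currently allocated", so this only shows the pattern and is not a fix for the actual program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: each task "allocates" on entry and "frees" on exit.
final class BoundedAllocations {
    static final AtomicInteger live = new AtomicInteger(); // allocations currently held
    static final AtomicInteger peak = new AtomicInteger(); // highest value of 'live' seen

    /** Runs 'tasks' jobs on 'threads' threads; at most 'maxConcurrent' hold an allocation at once. */
    static int run(int tasks, int threads, int maxConcurrent) {
        live.set(0);
        peak.set(0);
        final Semaphore permits = new Semaphore(maxConcurrent);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    permits.acquireUninterruptibly();      // wait for a free slot
                    int now = live.incrementAndGet();      // stand-in for the allocation
                    try {
                        peak.accumulateAndGet(now, Math::max);
                        Thread.yield();                    // stand-in for the computation
                    } finally {
                        live.decrementAndGet();            // stand-in for the free, always executed
                        permits.release();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();                                   // propagate failures from the tasks
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent allocations: " + run(200, 16, 2));
    }
}
```

The `try`/`finally` around the work guarantees that every "allocation" is released even if the task throws, and the semaphore keeps the peak at or below `maxConcurrent` regardless of how many threads the pool has.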

In any case, a compilable example that can be used to reproduce the error would be helpful.

Can the snippet that you posted in the other thread ( http://forum.byte-welt.de/showthread.php?t=3855 ) be considered as such an example?

bye
Marco