Calling JNI_CreateJavaVM crashes the program
I have a C DLL that uses JNI to proxy calls to an underlying Java program, which does the actual heavy lifting. I dynamically load the JRockit jvm.dll to make the function call.

Vendor A has a C# DLL which actually invokes my C DLL and another vendor B has a C# program which calls vendor A's C# DLL.

There weren't any problems when testing with vendor A's C# DLL, but upon integration with vendor B's C# program, my call to initialize the JVM via JNI_CreateJavaVM crashes the entire program.

Any help would be appreciated.

The error messages I received were:

[ERROR] Could not find allocated thread local data key in TIB
[ERROR] Could not create fast TLD
JRockit aborted: Unspecified Error(52)
Assertion failed: Could not create fast tld 
In vmDebug Before Abort() (src/jvm/runtime/debug/debug.c:103)

EDIT 1: OK, I have disassembled jvm.dll. It calls TlsAlloc followed by TlsSetValue, and to reach the code that prints the error message, the cmp esi, edx before je SHORT 04755D4B in the second image must find the two registers unequal.

The body of the function called by call 04755DD0 in the first image is shown in the second image.

Does anyone know what the calculation before that (the one that manipulates esi and edx) does?

Disassembly 1 Disassembly 2

EDIT 2: (In response to P.T.) I did not set any specific threading system, so I suppose it is using the default threading system, which is native according to http://docs.oracle.com/cd/E13222_01/wls/docs81b/jrockit/threads.html

Your guess is most likely correct. Looking at the disassembly, the logic goes something like this. It first calls TlsAlloc and then TlsSetValue to set the thread local storage slot at the index returned by TlsAlloc to a constant magic number, 4711. It then loops, using eip, from the start of the thread information block looking for the value 4711. Once it finds it, the code calls TlsSetValue again to set the value to 1147 and checks that eip really points at the thread local storage slot by ensuring that [eip] is now 1147.
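The probe described above can be sketched in portable C. This is only a model of the logic as read from the disassembly, not JRockit's code: `probe_slot_offset`, `set_slot_fn`, `fake_thread` and `fake_set_slot` are hypothetical names, the callback stands in for TlsSetValue, and what the real code does when a word holds 4711 by coincidence is an assumption (here the scan simply continues).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_FIRST  4711u  /* first value written to the slot (from the disassembly) */
#define MAGIC_SECOND 1147u  /* second value, used to confirm the match */

/* Hypothetical stand-in for TlsSetValue on the slot being probed. */
typedef void (*set_slot_fn)(void *ctx, uintptr_t value);

/*
 * Scans `words` machine words starting at `block` for the slot's storage.
 * Returns the byte offset of the slot inside the block, or -1 if the slot's
 * storage is not inside the scanned range.
 */
ptrdiff_t probe_slot_offset(const volatile uintptr_t *block, size_t words,
                            set_slot_fn set_slot, void *ctx)
{
    assert(block != NULL && set_slot != NULL);

    set_slot(ctx, MAGIC_FIRST);
    for (size_t i = 0; i < words; i++) {
        if (block[i] != MAGIC_FIRST)
            continue;
        /* Candidate found: change the slot and see whether this word follows. */
        set_slot(ctx, MAGIC_SECOND);
        if (block[i] == MAGIC_SECOND)
            return (ptrdiff_t)(i * sizeof(uintptr_t));
        /* Assumption: a coincidental 4711 is skipped and the scan goes on. */
        set_slot(ctx, MAGIC_FIRST);
    }
    return -1;
}

/* Toy thread block for experiments: the slot lives at words[slot_word]. */
struct fake_thread {
    uintptr_t words[32];
    size_t slot_word;
};

void fake_set_slot(void *ctx, uintptr_t value)
{
    struct fake_thread *t = ctx;
    t->words[t->slot_word] = value;
}
```

With a `fake_thread` whose slot sits at word 5, scanning all 32 words returns the offset of word 5, while scanning only the first 4 words returns -1. The second case is the interesting one here: the probe can only succeed if the slot's storage lies inside the range being scanned.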

Vendor B writes their programs in C#, so they are running on the CLR. By the time vendor B calls my DLL, they have already initialized the WPF Prism and MEF frameworks, loaded all interface modules into their respective positions, initialized all singleton (Export, in WPF Prism terms) models and initialized the MS workflow. However, when I move my initialization code to the first few lines of the program, the JVM initializes successfully (this isn't the correct place to initialize the JVM, and we have not tested whether the rest of the code works).

The code only branches to the error when TlsSetValue fails. Is there any reason for TlsSetValue to fail? And what should I look for in vendor B's code that might have caused the problem?

Pompom answered 4/1, 2013 at 3:37 Comment(8)
Holy dueling VM's, Batman! Write a web service! Or use a pipe. Or re-write your Java stuff in C. IMHO... - Ratan
The reason for doing all of this is to maintain backwards compatibility with a previous version of the program. - Pompom
JNI is unstable. You should pay attention to how the DLL is built (32-bit or 64-bit?) and to your JVM (32-bit or 64-bit?). A web service or RMI would be better. We had a DLL with JNI in production last year. It was fine on a 32-bit x86 Windows Server but failed on a 64-bit IBM machine. We then rewrote the C DLL in Java, which solved the problem. - Algebra
All the code (C#, C and Java) is written for a 32-bit environment. By the way, does anyone have any idea what TLD, TIB (Google says it might be thread information block or type information block) or local data key mean? - Pompom
Clearly it means Thread Information Block in this context. The trick with JNI is always to make sure it works without the Java part first. Somehow. - Cay
@EJP do you have any idea what a TLD is? Do you know of a glossary for these kinds of terms? - Mephitic
Which threading system are you using in JRockit? (Native or thin?) The TIB and TLD acronyms are JRockit-specific, so there is unlikely to be a glossary anywhere. My guess (!) is that JRockit cannot set up thread-local data that it needs to identify its threads. Is Vendor B doing anything interesting with threads or VMs (like using a JRockit VM?)? - Fons
@P.T.: Please see EDIT 2 above in the main post. - Pompom

I have encountered the same error, and I managed to figure out what is happening, at least in my case. It looks like a bug in JRockit, and your question was very helpful in investigating it.

The search that is carried out for the "magic number" placed in the slot extends over two pages of data starting at the beginning of the TEB. However, there are only 64 slots worth of storage located within the TEB itself. See http://msdn.microsoft.com/en-gb/library/windows/desktop/ms686708(v=vs.85).aspx.

If the storage slot that is allocated is of index 64 or higher, rather than putting the data in the embedded array, Windows puts it into the block pointed to by the TlsExpansionSlots pointer. Since this is outside the TEB, the search for the magic number fails and JRockit produces this error.
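To illustrate, here is a small portable model of that layout. It is not the real TEB: `model_teb`, `model_tls_set` and `model_scan_teb` are hypothetical names, the `header` field is a placeholder for whatever precedes the slot array, and the scan only covers the memory of the struct itself, which is the assumption about JRockit's search being modelled. The figure of 1024 expansion slots is my understanding of the Windows limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define INLINE_SLOTS    64    /* slots stored inside the TEB itself */
#define EXPANSION_SLOTS 1024  /* slots stored in a separately allocated block */

/* Simplified model: only the parts relevant to TLS storage. */
struct model_teb {
    uintptr_t header[8];                /* placeholder for fields before the slots */
    uintptr_t tls_slots[INLINE_SLOTS];  /* embedded array, indices 0..63 */
    uintptr_t *tls_expansion_slots;     /* heap block, indices 64 and up */
};

/* Models where a value ends up for a given slot index. Returns 1 on success. */
int model_tls_set(struct model_teb *teb, unsigned index, uintptr_t value)
{
    assert(teb != NULL);
    if (index < INLINE_SLOTS) {
        teb->tls_slots[index] = value;
        return 1;
    }
    if (index >= INLINE_SLOTS + EXPANSION_SLOTS)
        return 0;
    if (teb->tls_expansion_slots == NULL) {
        /* The expansion block is created lazily, outside the TEB. */
        teb->tls_expansion_slots = calloc(EXPANSION_SLOTS, sizeof(uintptr_t));
        if (teb->tls_expansion_slots == NULL)
            return 0;
    }
    teb->tls_expansion_slots[index - INLINE_SLOTS] = value;
    return 1;
}

/* Looks for `magic` only in memory that belongs to the TEB model itself. */
int model_scan_teb(const struct model_teb *teb, uintptr_t magic)
{
    assert(teb != NULL);
    for (size_t i = 0; i < 8; i++)
        if (teb->header[i] == magic)
            return 1;
    for (size_t i = 0; i < INLINE_SLOTS; i++)
        if (teb->tls_slots[i] == magic)
            return 1;
    return 0;  /* the expansion block is never visited */
}
```

In this model, storing 4711 at index 10 is found by the scan, while storing it at index 64 is not, even though the store itself succeeded. That matches the symptom: nothing is wrong with the slot, the search just looks in the wrong place.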

My instance of this also occurred within a .NET program. My speculation is that the CLR makes significant use of TLS, making it more likely for a high slot number to be allocated.

In my case, JRockit actually crashed while attempting to write the log line, possibly because it happens so early that the log has not yet been created. Not sure which version of JRockit you're using. Mine is:

C:\>java -version
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
BEA JRockit(R) (build R27.6.5-32_o-121899-1.6.0_14-20091001-2107-windows-ia32, compiled mode)

I don't know if this is fixed in later revisions. If it isn't, we (i.e. my employer) will probably have to raise it with Oracle.

Katharina answered 14/1, 2013 at 18:52 Comment(4)
I tried allocating 64 TLS slots before initializing the JVM and managed to replicate the error on my side. It looks like a bug in the JVM, as you said. - Pompom
I tried with the latest jrockit-jre1.6.0_37-R28.2.5 that I could find on the Oracle website, and the problem still persists. - Pompom
Thanks for checking. I'm going to get a ticket raised with Oracle. - Katharina
After several months of back-and-forth with Oracle, they decided this "isn't supported". Doesn't look like there will be a fix. - Katharina
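The reproduction mentioned in the comments (using up 64 TLS slots before starting the JVM) might look roughly like the sketch below. `occupy_tls_slots` and `alloc_slot` are hypothetical helper names. On Windows the helper calls the real TlsAlloc; the non-Windows branch is only a stand-in that hands out 0, 1, 2, ... so the sketch compiles elsewhere. It also assumes TlsAlloc hands out the lowest free index first, which is not something the documentation promises.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
typedef DWORD slot_index;
#define SLOT_ALLOC_FAILED TLS_OUT_OF_INDEXES
static slot_index alloc_slot(void) { return TlsAlloc(); }
#else
/* Stand-in for other platforms: indices are handed out in increasing order. */
typedef unsigned long slot_index;
#define SLOT_ALLOC_FAILED (~0ul)
static slot_index alloc_slot(void) { static slot_index next; return next++; }
#endif

#define INLINE_TLS_SLOTS 64  /* number of slots stored inside the TEB */

/*
 * Allocates `count` TLS slots and deliberately never frees them.
 * Returns the highest index handed out, or SLOT_ALLOC_FAILED on failure.
 */
slot_index occupy_tls_slots(int count)
{
    slot_index highest = 0;
    assert(count > 0);
    for (int i = 0; i < count; i++) {
        slot_index idx = alloc_slot();
        if (idx == SLOT_ALLOC_FAILED)
            return SLOT_ALLOC_FAILED;
        if (idx > highest)
            highest = idx;
    }
    return highest;
}
```

Calling `occupy_tls_slots(INLINE_TLS_SLOTS)` in a small test harness before JNI_CreateJavaVM should leave no slot below 64 free, so the slot JRockit allocates for itself would land in the expansion block. Moving JVM initialization earlier, as tried in the question, works the other way around: it grabs a slot before the CLR and other libraries have used up the low indices.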
