Which native function causes EXCEPTION_ACCESS_VIOLATION in JNI code?

I'm trying to use the Bullet physics library as wrapped by the libgdx Android Java development framework (gdx-bullet), and I'm getting JVM crashes or "pure virtual method called" crashes after a short, random period of running.

Some of them generate hs_err_pidXXXX.log files which generally contain:

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0aa0c217, pid=7956, tid=7440
#
# JRE version: 7.0_05-b05
# Java VM: Java HotSpot(TM) Client VM (23.1-b03 mixed mode, sharing windows-x86 )
# Problematic frame:
# C  [gdx-bullet.dll+0x1c217]

Current thread (0x04af2800):  JavaThread "LWJGL Application" [_thread_in_native, id=7440, stack(0x04d70000,0x04dc0000)]

siginfo: ExceptionCode=0xc0000005, reading address 0x6572fc0f

Registers:
EAX=0x0073f370, EBX=0x0073f480, ECX=0x0073f484, EDX=0x6572fc07
ESP=0x04dbf3c0, EBP=0x04dbf400, ESI=0x0073f120, EDI=0x04dbf3f0
EIP=0x0aa0c217, EFLAGS=0x00010206

Instructions: (pc=0x0aa0c217)
0x0aa0c217:   ff 52 08 f3 0f 10 05 0c f0 ba 0a f3 0f 10 4c 24

Register to memory mapping:
EDX=0x6572fc07 is an unknown value

Stack: [0x04d70000,0x04dc0000],  sp=0x04dbf3c0,  free space=316k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [gdx-bullet.dll+0x1c217]
C  0x38cffed8

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.badlogic.gdx.physics.bullet.gdxBulletJNI.btDiscreteDynamicsWorld_stepSimulation__SWIG_1(JLcom/badlogic/gdx/physics/bullet/btDiscreteDynamicsWorld;FI)I+0
j  com.badlogic.gdx.physics.bullet.btDiscreteDynamicsWorld.stepSimulation(FI)I+7

I was advised that the cause is probably the Java GC deallocating an object that is no longer referenced in Java code but is still needed by Bullet's native code.

I reviewed my code for those, but didn't find such situations, which doesn't mean they're not there. I could look for longer, but I think if I go forward with this approach I would need to learn how to debug such situations myself.

So I ran dumpbin.exe on gdx-bullet.dll and found the following:

6AB80000 image base (6AB80000 to 6BD4FFFF)

Then I added 0x6AB80000 + 0x1c217 = 0x6AB9C217 and looked that up in the dumpbin.exe disassembly:

6AB9C206: 8B 10              mov         edx,dword ptr [eax]
6AB9C208: 89 6C 24 0C        mov         dword ptr [esp+0Ch],ebp
6AB9C20C: 89 7C 24 08        mov         dword ptr [esp+8],edi
6AB9C210: 89 4C 24 04        mov         dword ptr [esp+4],ecx
6AB9C214: 89 04 24           mov         dword ptr [esp],eax
6AB9C217: FF 52 08           call        dword ptr [edx+8]
6AB9C21A: F3 0F 10 05 0C F0  movss       xmm0,dword ptr ds:[6AD3F00Ch]
          D3 6A
6AB9C222: F3 0F 10 4C 24 30  movss       xmm1,dword ptr [esp+30h]
6AB9C228: 80 7E 2C 00        cmp         byte ptr [esi+2Ch],0
6AB9C22C: F3 0F 5C C8        subss       xmm1,xmm0

Which is all nice but is about where I'm stuck as I don't know what's located at [edx+8].
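The address arithmetic above can be checked against the hs_err log with a small sketch. CrashAddress and its methods are hypothetical helpers written for this illustration; they are not part of the JVM, libgdx, or dumpbin. The constants are copied from the log and the dumpbin output. The sketch shows three things: the preferred-base address used for the disassembly lookup, the base the DLL was actually loaded at (pc minus the module offset, which differs from the preferred base because the DLL was relocated), and that the slot read by "call dword ptr [edx+8]" is exactly the "reading address" reported in siginfo. A "mov edx,[eax]" followed by "call [edx+8]" is the shape a compiler typically emits for a C++ virtual call, so EDX is probably a vtable pointer read from an object that is no longer valid, though that is an inference rather than something the log states.

```java
// Hypothetical helper for illustration only; not part of any library.
public class CrashAddress {

    /** Address of the instruction when the DLL sits at its preferred image base. */
    static long preferredAddress(long imageBase, long moduleOffset) {
        return imageBase + moduleOffset;
    }

    /** Base the DLL was actually loaded at in the crashed process. */
    static long runtimeBase(long pc, long moduleOffset) {
        return pc - moduleOffset;
    }

    /** Memory slot that "call dword ptr [reg+disp]" reads its target from (32-bit wraparound). */
    static long callOperandAddress(long reg, long disp) {
        return (reg + disp) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long imageBase = 0x6AB80000L; // "image base" from the dumpbin headers
        long offset    = 0x1c217L;    // from "gdx-bullet.dll+0x1c217"
        long pc        = 0x0aa0c217L; // "pc=" from the hs_err header
        long edx       = 0x6572fc07L; // EDX from the Registers section
        long faulting  = 0x6572fc0fL; // "reading address" from siginfo

        System.out.printf("disassembly address: 0x%08X%n", preferredAddress(imageBase, offset));
        System.out.printf("runtime load base:   0x%08X%n", runtimeBase(pc, offset));

        long slot = callOperandAddress(edx, 8);
        System.out.printf("call reads slot:     0x%08X%n", slot);
        System.out.println("slot equals faulting address: " + (slot == faulting));
    }
}
```

Since the slot matches the faulting address, the crash happens while fetching the call target, before any callee runs, so the function that contains offset 0x1c217 is the caller to identify.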

I have the source code of bullet which was used (roughly this one).

I installed windbg.exe and managed to make userdump.exe generate a javaw.dmp file, but wasn't sure what to look for in one and how. I tried to find out using "r" command what's at rdx but it was 0x0 as opposed to the hs_err_pid file where it was some random value.

I found some build scripts but I somehow doubt I can add "include debug info" flags to them and then make them work in any timely fashion.

What can I do to figure out which particular native method is having the problem?

If I knew that then I could review its source code and understand what bad parameter I've passed to it or what object had been de-allocated by GC that it needs.

Wayworn answered 22/11, 2012 at 12:42 Comment(6)
did you take into account that RVA != VA? You will need a .map file for manual or proper PDB file for automatic lookup performed by WinDbg. I'd definitely suggest that first. Problem is that your build is probably still different, e.g. debug vs. release. Otherwise the last resort would be to learn RCE and use something like IDA or OllyDbg to map the disassembly to the source code. Quite a cumbersome method. - Rimbaud
No, I last programmed assembly 15 years ago and that was real-mode, I can barely grasp the concept of "RVA != VA". I guess I'm out of luck. - Wayworn
At this point [edx+8] is invalid but you might be able to get an idea of what's being called by looking at previous values of [edx+8] that are valid. You can do this by setting a breakpoint at gdx-bullet.dll+0x1c217 before reproducing the crash with bu gdx-bullet+0x1c217. This will break at the call instruction and then you can issue a t to step into the function being called. Now, if the call instruction is for calling everything under the sun then this technique won't help. But if it's for calling funcs that tell how far a bullet must travel then you're in luck. - Glandulous
Marc, this is an interesting idea that I will try when the situation comes up next. Thanks! - Wayworn
Not sure if this will help, but you can try to attach a debugger and have it set on a second chance exception for this error and do a live debug that way, that often gives you a much better idea of the state of the machine. - Intelligent
If you have source code of your lib, then try debugging JNI and Java at the same time. Take a look here for the full sample: linkedin.com/pulse/… - Frogfish

This pointer probably comes too late, but in your case the access violation is happening because native code is using a pointer to memory that has already been deallocated. Writes through such a pointer may appear to work a few times, as long as the memory manager has not yet reused the freed block. I have run into this on Windows, where the program eventually reports a Heap Overrun error before exiting, and on the Java side the error above is thrown. It works for a short while because Windows does not reuse a freed block immediately; when the block is about to be reallocated, Windows checks its header, finds it corrupted, and reports the Heap Overrun error.

If you wrote the native code yourself, you could add traces to find the problem. With a wrapped library, perhaps you are not preserving a reference to the library on the Java side, or there is a problem in the wrapper.
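One way to rule out the missing-reference case is to hold every wrapper object in a long-lived collection for as long as the native side may still use it. The sketch below assumes that the binding frees native memory when the Java wrapper is garbage-collected, which is an assumption about the wrapper and is not confirmed here. NativeOwner, retain and release are hypothetical names, not libgdx or gdx-bullet API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: keeps wrapper objects strongly reachable so the GC
// cannot collect them while native code may still hold their pointers.
public class NativeOwner {

    // Identity-based set: two distinct wrappers are never merged by equals().
    private final Set<Object> keepAlive =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    /** Registers a wrapper and returns it, so creation and retention fit on one line. */
    <T> T retain(T wrapper) {
        keepAlive.add(wrapper);
        return wrapper;
    }

    /** Call only after the native side no longer uses the object. */
    boolean release(Object wrapper) {
        return keepAlive.remove(wrapper);
    }

    int retainedCount() {
        return keepAlive.size();
    }

    public static void main(String[] args) {
        NativeOwner owner = new NativeOwner();
        // A plain Object stands in for a shape, motion state, or body wrapper.
        Object shape = owner.retain(new Object());
        System.out.println("retained: " + owner.retainedCount());
        owner.release(shape);
        System.out.println("retained after release: " + owner.retainedCount());
    }
}
```

The owner itself has to stay reachable for the whole life of the physics world, for example as a field of the application class. If the crashes stop once every wrapper is retained this way, a prematurely collected object was the likely cause.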

Dodi answered 22/9, 2014 at 12:4 Comment(0)
