Mach-O symbol stubs (iOS)

I am trying to understand how Mach-O files work, and have made a good deal of progress with the online resources available (in particular, the Apple page here: http://developer.apple.com/library/mac/#documentation/developertools/conceptual/MachORuntime/Reference/reference.html), but I have hit a roadblock in understanding how symbol stubs work.

Using "otool -l", I see the following section:

Section
  sectname __symbolstub1
   segname __TEXT
      addr 0x00005fc0
      size 0x00000040
    offset 20416
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000408
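
The flags value above can be split with the constants from mach-o/loader.h: the low byte is the section type and the high bits are attributes. A minimal sketch that decodes only the handful of constants relevant here (it is not a complete table of section types or attributes). Per loader.h, for an S_SYMBOL_STUBS section reserved1 is the starting index into the indirect symbol table and reserved2 is the size of each stub; those two fields are not shown in the excerpt above.

```python
# Constants copied from <mach-o/loader.h>; only the ones needed here.
SECTION_TYPE = 0x000000FF              # low byte holds the section type
S_SYMBOL_STUBS = 0x8                   # section contains symbol stubs
S_ATTR_PURE_INSTRUCTIONS = 0x80000000  # section contains only machine instructions
S_ATTR_SOME_INSTRUCTIONS = 0x00000400  # section contains some machine instructions


def describe_section_flags(flags):
    """Split a Mach-O section 'flags' word into its type and the known attribute names."""
    kind = flags & SECTION_TYPE
    attrs = []
    if flags & S_ATTR_PURE_INSTRUCTIONS:
        attrs.append("S_ATTR_PURE_INSTRUCTIONS")
    if flags & S_ATTR_SOME_INSTRUCTIONS:
        attrs.append("S_ATTR_SOME_INSTRUCTIONS")
    type_name = "S_SYMBOL_STUBS" if kind == S_SYMBOL_STUBS else f"type {kind:#x}"
    return type_name, attrs


print(describe_section_flags(0x80000408))
# ('S_SYMBOL_STUBS', ['S_ATTR_PURE_INSTRUCTIONS', 'S_ATTR_SOME_INSTRUCTIONS'])
```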

However, when I look at the binary in a hex editor, I see the following 4 bytes repeated again and again:

00005FC0  38 F0 9F E5 38 F0 9F E5  38 F0 9F E5 38 F0 9F E5  88
00005FD0  38 F0 9F E5 38 F0 9F E5  38 F0 9F E5 38 F0 9F E5  88
00005FE0  38 F0 9F E5 38 F0 9F E5  38 F0 9F E5 38 F0 9F E5  88  
00005FF0  38 F0 9F E5 38 F0 9F E5  38 F0 9F E5 38 F0 9F E5  88

This looks something like an LDR that loads the PC from a fixed PC-relative offset, but I don't see why the offset is the same for every entry in the stub table.
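
The repeated word is little-endian 0xE59FF038, which in the ARM (A32) encoding is LDR PC, [PC, #0x38]. Because the PC reads as the current instruction's address + 8 in ARM state, every stub loads from its own address + 0x40. Both the stubs and the pointer slots are 4 bytes each, so stub i and slot i advance in lockstep and one offset works for all of them. A minimal sketch, assuming 4-byte stubs (consistent with size 0x40 holding 16 copies of the word) and using the addr/size values from the otool output:

```python
import struct

# Values from the otool -l output and hex dump in the question.
STUB_SECTION_ADDR = 0x5FC0
STUB_SECTION_SIZE = 0x40
STUB_BYTES = bytes.fromhex("38F09FE5") * (STUB_SECTION_SIZE // 4)

PC = 15  # register number of the program counter in A32


def decode_ldr_pc_literal(word):
    """Return (rd, offset) if `word` is A32 'LDR Rd, [PC, #+/-imm12]', else None."""
    # Bits 27-20 must read 0101 U001 (word load, immediate offset, pre-indexed,
    # no writeback) and Rn (bits 19-16) must be the PC; bit 23 (U) gives the sign.
    if (word & 0x0F7F0000) != 0x051F0000:
        return None
    rd = (word >> 12) & 0xF
    imm12 = word & 0xFFF
    offset = imm12 if (word >> 23) & 1 else -imm12
    return rd, offset


def stub_pointer_slots(base, data):
    """Map each 4-byte stub to the address its 'LDR PC, [PC, #imm]' loads from."""
    slots = []
    for i in range(0, len(data), 4):
        (word,) = struct.unpack_from("<I", data, i)  # iOS ARM code is little-endian
        decoded = decode_ldr_pc_literal(word)
        if decoded is None or decoded[0] != PC:
            raise ValueError(f"unexpected instruction {word:#010x} at {base + i:#x}")
        stub_addr = base + i
        # In ARM state, reading the PC yields the current instruction's address + 8.
        slots.append((stub_addr, stub_addr + 8 + decoded[1]))
    return slots


for stub, slot in stub_pointer_slots(STUB_SECTION_ADDR, STUB_BYTES):
    print(f"stub {stub:#x} loads PC from {slot:#x}")
```

The first stub (0x5fc0) reads its target from 0x6000 and the last (0x5ffc) from 0x603c, so in this binary the pointer slots begin right where the 0x40-byte stub section ends.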

If someone can shed light on why this is so, or provide any resources that get this low level, please let me know.

Thanks!

Wop answered 11/1, 2012 at 19:37 Comment(1)
I think I figured out what is going on: each 4-byte entry in the table points to a data region below, which holds a DCD (a 4-byte data word) storing the address of the function being called. However, I'm still a bit confused about why this extra level of indirection is required, so if anyone can provide follow-up info I'd appreciate it.Wop

I will describe the situation with current iOS; it's somewhat different in older versions.

The symbol stubs indeed load a function pointer into the PC. For standard "lazy" (on-demand) imports, the pointer resides in the __lazy_symbol section and initially points to a helper routine in the __stub_helper section, e.g.:

__symbolstub1 _AudioServicesAddSystemSoundCompletion
__symbolstub1 LDR  PC, _AudioServicesAddSystemSoundCompletion$lazy_ptr
__symbolstub1 ; End of function _AudioServicesAddSystemSoundCompletion

__lazy_symbol _AudioServicesAddSystemSoundCompletion$lazy_ptr DCD _AudioServicesAddSystemSoundCompletion$stubHelper

__stub_helper _AudioServicesAddSystemSoundCompletion$stubHelper
__stub_helper LDR R12, =nnn ; symbol info offset in the lazy bind table
__stub_helper B   dyld_stub_binding_helper

The function dyld_stub_binding_helper is the first one in the __stub_helper section and is essentially just a trampoline to the dyld_stub_binder function in dyld, passing it what I call the "symbol info offset" value. That value is an offset into the lazy binding info stream (pointed to by the LC_DYLD_INFO or LC_DYLD_INFO_ONLY load command), which is a sort of bytecode stream of commands for dyld. A typical sequence for a lazy import looks like this:

72: BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(M, 0xYYYYY)
19: BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(NNNN)
40: BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(0x00, '_AudioServicesAddSystemSoundCompletion')
90: BIND_OPCODE_DO_BIND()

Here dyld does the following:

  1. look up the function named '_AudioServicesAddSystemSoundCompletion' in dylib number NNNN, counting through the dylibs listed in the load commands.
  2. look up the executable's segment number M (most likely __DATA).
  3. write the function pointer at offset YYYYY within that segment.
  4. jump to the resolved address so that the actual function does its job.
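
Each opcode byte in such a stream carries the opcode in its high nibble and an immediate (segment index, dylib ordinal, or flags) in its low nibble, with ULEB128 operands and a NUL-terminated symbol name following where needed. Given the offset the stub helper loads into R12, one entry can be decoded directly without walking the whole stream. A minimal sketch: the opcode values come from mach-o/loader.h, only the opcodes in the listing above plus the ULEB ordinal variant are handled, and the example stream (segment index, offsets, ordinals, and the symbol `_foo`) is hypothetical.

```python
# Opcode values from <mach-o/loader.h>.
BIND_OPCODE_MASK = 0xF0
BIND_IMMEDIATE_MASK = 0x0F
BIND_OPCODE_DONE = 0x00
BIND_OPCODE_SET_DYLIB_ORDINAL_IMM = 0x10
BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB = 0x20
BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40
BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x70
BIND_OPCODE_DO_BIND = 0x90


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def decode_lazy_bind_entry(stream, offset):
    """Decode the binding sequence starting at `offset` up to its DO_BIND."""
    entry = {}
    pos = offset
    while pos < len(stream):
        byte = stream[pos]
        pos += 1
        opcode = byte & BIND_OPCODE_MASK
        imm = byte & BIND_IMMEDIATE_MASK
        if opcode == BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            entry["segment"] = imm
            entry["offset"], pos = read_uleb128(stream, pos)
        elif opcode == BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:
            entry["dylib_ordinal"] = imm
        elif opcode == BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB:
            entry["dylib_ordinal"], pos = read_uleb128(stream, pos)
        elif opcode == BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
            end = stream.index(b"\x00", pos)  # symbol name is NUL-terminated
            entry["flags"] = imm
            entry["symbol"] = stream[pos:end].decode("ascii")
            pos = end + 1
        elif opcode == BIND_OPCODE_DO_BIND:
            return entry
        elif opcode == BIND_OPCODE_DONE:
            break
        else:
            raise ValueError(f"opcode {opcode:#04x} is not handled in this sketch")
    raise ValueError("no DO_BIND found for this entry")


# Hypothetical two-entry stream; all segment/offset/ordinal values are made up.
LAZY_BIND_STREAM = (
    bytes([0x72, 0x00, 0x11, 0x40]) + b"_foo\x00" + bytes([0x90, 0x00])
    + bytes([0x72, 0x84, 0x20, 0x13, 0x40])
    + b"_AudioServicesAddSystemSoundCompletion\x00" + bytes([0x90, 0x00])
)

# 11 stands in for the value the stub helper loads into R12.
print(decode_lazy_bind_entry(LAZY_BIND_STREAM, 11))
```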

The address written to is the _AudioServicesAddSystemSoundCompletion$lazy_ptr slot, so the next time _AudioServicesAddSystemSoundCompletion is called, the stub jumps directly to the imported function without going through dyld.
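
The whole round trip can be modelled as a pointer slot that starts out aimed at a helper and gets overwritten on first use. This is a toy model, not dyld code: `ToyImage`, its method names, and `real_function` are hypothetical, the R12 offset is not modelled, and a Python attribute stands in for the $lazy_ptr slot.

```python
class ToyImage:
    """Toy model of one lazily bound import (illustration only, not real dyld code)."""

    def __init__(self, resolve):
        self.resolve = resolve      # stands in for dyld's symbol lookup
        self.binder_calls = 0
        # The $lazy_ptr slot initially points at the stub helper.
        self.lazy_ptr = self.stub_helper

    def stub(self, *args):
        # Like 'LDR PC, _sym$lazy_ptr': jump through whatever the slot holds.
        return self.lazy_ptr(*args)

    def stub_helper(self, *args):
        # Like 'LDR R12, =nnn' followed by 'B dyld_stub_binding_helper'.
        return self.stub_binder(*args)

    def stub_binder(self, *args):
        self.binder_calls += 1
        target = self.resolve()     # look up the imported function
        self.lazy_ptr = target      # patch the slot so later calls skip the binder
        return target(*args)        # run the real function this one time


def real_function(x):
    return x * 2


image = ToyImage(lambda: real_function)
print(image.stub(21), image.binder_calls)  # 42 1
print(image.stub(5), image.binder_calls)   # 10 1
```

Only the first call goes through the binder; afterwards the slot holds the real function, which mirrors why later calls bypass dyld.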

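N.B.: you should not look at offset 0x5fc0 in the file right away. The addr field is a virtual address; you should find the containing segment command, see at which VA it starts and what its file offset is, then do the math (file offset = addr - segment vmaddr + segment fileoff). Usually the __TEXT segment starts at 0x1000. (The section's own offset field, 20416 = 0x4fc0 here, already gives the file offset relative to the start of the Mach-O.)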

Still, the actual symbol stubs do look like what you pasted. Probably you have a fat Mach-O whose slice starts at file offset 0x1000 (after the fat header and its padding), so the offsets happen to line up.
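
The address arithmetic can be written out as a one-line helper. A minimal sketch, assuming __TEXT has vmaddr 0x1000 and fileoff 0 in this binary and, for the fat case, that the slice starts at 0x1000 in the file; those three values are assumptions consistent with the otool output, not read from the actual file.

```python
def va_to_file_offset(addr, seg_vmaddr, seg_fileoff, slice_offset=0):
    """Translate a virtual address inside a segment into an offset in the file on disk.

    slice_offset is where the Mach-O slice begins inside a fat (universal) file;
    it is 0 for a thin binary.
    """
    return addr - seg_vmaddr + seg_fileoff + slice_offset


# Assumed: __TEXT at vmaddr 0x1000 with fileoff 0.
print(va_to_file_offset(0x5FC0, 0x1000, 0))               # 20416, i.e. 0x4fc0 (otool's 'offset')
print(hex(va_to_file_offset(0x5FC0, 0x1000, 0, 0x1000)))  # 0x5fc0 if the slice starts at 0x1000
```

A real tool should also check that addr lies within [vmaddr, vmaddr + vmsize) of the segment before trusting the result.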

Lithology answered 12/1, 2012 at 14:20 Comment(13)
+1. Slow clap, standing ovation. Fantastic answer. I learnt a lot from that!Kindless
Wow, amazing answer! I have to read it through a few times before I can fully understand it, but Igor, please tell me one thing: where did you get this knowledge? By reading all the relevant ASM, or is this documented fully anywhere? Thanks!Wop
Mostly from the binaries, yes. However, in theory you can figure it out from the source code too - see Apple's sources for dyld, ld64 and Csu.Lithology
Igor, I see that (as you mentioned) the lazy symbol section points to helper routines, but the documentation plus otool's output implies that __lazy_symbol is somehow pointing to the indirect symbol table, indexed by the reserved2 field (which is 16 in my case). I looked at the data in the indirect symbol table + 16 but I just see values like 0x0000001a, 0x0000001e, etc., and I'm not sure what these refer to. Could you please explain? Thanks!Wop
AFAIK the indirect symbol table is not used by the current dyld; it's just there for compatibility with old tools. You should look at the __lazy_symbol section itself, not the symbol table. And don't trust the documentation too much; unfortunately it has not been updated to reflect the latest changes in the format. Better to look at the sources I mentioned and the mach-o/loader.h header (but be careful, a lot of things in it are outdated too).Lithology
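
For tools that still read it (otool -I prints it, for example), the indirect symbol table is an array of 32-bit indexes into the LC_SYMTAB nlist array, so values like 0x1a and 0x1e are symbol-table indexes. Per loader.h, a stub or pointer section's starting index into that array is stored in its reserved1 field (for stub sections, reserved2 holds the stub size). A minimal sketch; the table contents and symbol names below are hypothetical.

```python
# Special indirect-table values from <mach-o/loader.h>.
INDIRECT_SYMBOL_LOCAL = 0x80000000
INDIRECT_SYMBOL_ABS = 0x40000000


def names_for_section(indirect_table, reserved1, count, symbol_names):
    """Name each entry of a stub or pointer section via the indirect symbol table.

    indirect_table: uint32 values read from LC_DYSYMTAB's indirectsymoff.
    reserved1: the section's starting index into that table.
    symbol_names: maps a symbol-table (nlist) index to its name.
    """
    names = []
    for i in range(count):
        entry = indirect_table[reserved1 + i]
        if entry & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS):
            names.append(None)  # local or absolute entry: no imported symbol name
        else:
            names.append(symbol_names[entry])
    return names


# Hypothetical data: section entries start at index 2 of the indirect table.
table = [0x00, 0x05, 0x1A, 0x1E]
symbols = {0x1A: "_hypothetical_a", 0x1E: "_hypothetical_b"}
print(names_for_section(table, 2, 2, symbols))
```
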
Igor, not sure if you are still reading this post, but I had another question. How can I figure out the method names for dynamically linked function calls in an iOS binary? They are displayed in IDA, but I don't see the strings themselves inside the main binary. I guess there is some lookup process, but I can't figure out how to get the string name from the __stub_helper code.Wop
The names are encoded in a trie (prefix tree) structure, see here for a good description.Lithology
Thanks again Igor, I meant import not export (sorry if I wasn't clear) but that page you provided was helpful. I was able to find the function names at LC_DYLD_INFO_ONLY->lazy_bind_off. I figured out that I can walk through all the strings to get the one that matches up with the __symbolstub1 entries (since they are in the same order), but is there a more direct way to get a pointer to the exact string name's location so I don't have to do the tedious walking process?Wop
The value loaded into R12 before jumping to the helper is the offset in the opcode stream for the corresponding binding sequence.Lithology
LC_DYSYMTAB is effectively unused by dyld, but it is still required to exist to pass the Mach-O loading checks. So it can contain arbitrary data and the executable would still work.Cupcake
Is there any more documentation on when the shift from dysymtab+symtab->dyld_info happened? The only other place I can see this mentioned is in pypi.org/project/machomachomangler where they say:Stereophonic
" new-style DYLD_INFO symbol table (introduced in 10.5), not the (almost?) totally redundant SYMTAB/DYSYMTAB symbol table. (Interesting fact: all Mach-O binaries include two completely different representations of their symbols tables. The new one is more compact, to save space, but then they keep the old one around for compatibility, so… anyway.) As far as I can tell, the only thing in in modern MacOS that still uses SYMTAB/DYSYMTAB is dladdr, and I don’t think anyone is relying on dladdr output for, well… anything?"Stereophonic
Maybe some more info in newosxbook.com/articles/DYLD.htmlStereophonic
