How does the Mach-O loader load different NSString objects?

I know that if you define a bunch of @"" NSString literals in source code on Mac OS X, these NSStrings are stored in a section of the Mach-O binary:

Section
sectname __ustring
 segname __TEXT
    addr 0x000b3b54
    size 0x000001b7
  offset 731988
   align 2^1 (2)
  reloff 0
  nreloc 0
   flags 0x00000000
reserved1 0
reserved2 0

If I hex dump the binary, the strings are packed one after another with 0x0000 as a separator. What I want to know is how the loader in Mac OS X loads these NSStrings when the program runs. Are they loaded simply by recognizing the 0x0000 separator, or is there a string offset table elsewhere in the binary pointing to the separate NSString objects? Thanks.

(What I really want to do is increase the length of one of the NSStrings, so I have to know how the loader recognizes these separate objects.)

Added: I know that if you define strings like @"abc" in the code, they go to the __cstring section. According to my digging, a string with non-ASCII characters, like @"“”", goes to the __ustring section.

Crystallo answered 22/5, 2010 at 15:58 Comment(0)

There is a cstring section with all the constant C strings. Each constant NSString just refers to one of those C strings. The C struct for a constant NSString looks like this:

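struct NSConstantString {
  Class isa;      // class pointer, identical for every constant string
  char *bytes;    // pointer to the character data in __cstring or __ustring
  int numBytes;   // length of the string
};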

Look in the __DATA __cfstring section.
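
To make the layout concrete, here is a minimal sketch of reading one entry out of a raw dump of the __cfstring section. It assumes a 32-bit little-endian binary, and it assumes the four-word entry that clang emits as far as I know (isa, flags, data pointer, length), which has one more field than the simplified struct above. If the entries in your dump turn out to be 12 bytes apart, drop the flags field. The names cfstring32, read_le32 and read_cfstring32 are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one 32-bit __cfstring entry: four 4-byte words. */
#define CFSTRING32_SIZE 16u

/* Assumed layout of one entry; verify it against your own dump. */
struct cfstring32 {
    uint32_t isa;    /* address of the constant-string class object */
    uint32_t flags;  /* encoding flags (assumed field, not in the struct above) */
    uint32_t data;   /* VM address of the characters in __cstring or __ustring */
    uint32_t length; /* length of the string */
};

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode entry number `index` from a raw dump of the section.
   Returns 0 on success, -1 if the entry lies outside the dump. */
int read_cfstring32(const uint8_t *section, size_t size, size_t index,
                    struct cfstring32 *out)
{
    if (index >= size / CFSTRING32_SIZE)
        return -1;
    const uint8_t *p = section + index * CFSTRING32_SIZE;
    out->isa    = read_le32(p);
    out->flags  = read_le32(p + 4);
    out->data   = read_le32(p + 8);
    out->length = read_le32(p + 12);
    return 0;
}
```

If the data address of an entry falls inside the addr/size range of __ustring, that entry is one of your UTF-16 strings; if it falls inside __cstring, it is a plain C string.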

Edit:

The __ustring section is the equivalent of the __cstring section, except that it holds UTF-16 strings. So a constant NSString may refer to either ustring or cstring data.
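
If you want to see where each UTF-16 string starts, you can walk a dump of the __ustring section yourself. This is only an inspection aid, assuming the data is little-endian UTF-16 with a 0x0000 terminator after each string; the loader itself does not scan for terminators. list_ustrings is a hypothetical helper, not part of any Apple tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Record the byte offset at which each non-empty, 0x0000-terminated
   UTF-16LE string begins inside a raw dump of the section.
   At most `max_starts` offsets are stored in `starts`; the return value
   is the total number of strings found, which may be larger.
   Trailing data without a terminator is ignored. */
size_t list_ustrings(const uint8_t *section, size_t size,
                     size_t *starts, size_t max_starts)
{
    size_t count = 0;
    size_t begin = 0;

    for (size_t i = 0; i + 1 < size; i += 2) {
        uint16_t unit = (uint16_t)(section[i] | (section[i + 1] << 8));
        if (unit != 0)
            continue;
        if (i > begin) {              /* a non-empty string just ended */
            if (count < max_starts)
                starts[count] = begin;
            count++;
        }
        begin = i + 2;                /* next string starts after the terminator */
    }
    return count;
}
```

Adding each offset to the section's addr gives the address that the matching cfstring entry should contain.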

The only reference to the ustring data is probably from the cfstring that uses it. If you lengthen one string, the cfstring referring to the next string will instead refer to the tail of the lengthened string unless you fix it. You may be able to find some free space elsewhere that you can point the cfstring at.
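
As a rough sketch of that last idea, the function below rewrites the data pointer and length of one cfstring entry in a raw dump of the __cfstring section, so that the entry points at a copy of the string you have placed in free space. It assumes a 32-bit little-endian binary with 16-byte entries (isa, flags, data pointer, length); that layout is an assumption to check against your dump, and patch_cfstring32 and write_le32 are names invented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout: isa (4), flags (4), data pointer (4), length (4). */
#define CFSTRING32_SIZE   16u
#define CFSTRING32_DATA    8u   /* byte offset of the data pointer */
#define CFSTRING32_LENGTH 12u   /* byte offset of the length */

/* Store a 32-bit value in little-endian byte order. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Point entry `index` at `new_data` (a VM address, not a file offset)
   and set its length. The length must use the same unit as the
   existing entries. Returns 0 on success, -1 if out of range. */
int patch_cfstring32(uint8_t *section, size_t size, size_t index,
                     uint32_t new_data, uint32_t new_length)
{
    if (index >= size / CFSTRING32_SIZE)
        return -1;
    uint8_t *p = section + index * CFSTRING32_SIZE;
    write_le32(p + CFSTRING32_DATA, new_data);
    write_le32(p + CFSTRING32_LENGTH, new_length);
    return 0;
}
```

The isa and flags words are left untouched, so the entry keeps its class and encoding; only where it points and how long it claims to be change.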

Mediacy answered 22/5, 2010 at 16:27 Comment(4)
Any way to interpret the __cfstring? - Crystallo
__cfstring is basically an array of NSConstantString. The isa value will be the same for every one. After that is the pointer to the character data, followed by the length. I am not sure if the length for a ustring is in bytes or characters. - Mediacy
In my example, I can see the pointer to the character data is off by 0x1000. Is that always the case? P.S. That was the last thing I didn't understand, big thumbs up for you! - Crystallo
In the otool dump, it is the difference between the addr (target memory address) and the offset (from the Mach-O header in the file), which will usually be 0x1000 for executables. - Mediacy
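
The arithmetic from that last comment can be sketched as follows, using the addr, size and offset fields from the section dump in the question. With addr 0x000b3b54 and offset 731988 (0xb2b54), the difference is exactly 0x1000. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields needed from one section header in the otool dump. */
struct section_info {
    uint32_t addr;    /* address of the section once mapped into memory */
    uint32_t size;    /* size of the section in bytes */
    uint32_t offset;  /* position of the section's data in the file */
};

/* Translate a VM address (such as a cfstring data pointer) into a file
   offset you can seek to in a hex editor.
   Returns 0 on success, -1 if the address is not inside this section. */
int vmaddr_to_file_offset(const struct section_info *s, uint32_t vmaddr,
                          uint32_t *file_offset)
{
    if (vmaddr < s->addr || vmaddr - s->addr >= s->size)
        return -1;
    *file_offset = s->offset + (vmaddr - s->addr);
    return 0;
}
```

Computing the offset per section, rather than hard-coding 0x1000, keeps the result correct for binaries where the difference is something else.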

No. Each string has an address in the binary. If you insert a character into one string, the addresses of all the strings after it increase, and you'll need to adjust those addresses wherever they are referred to in the binary. In addition, if you make the segment bigger, you may need to adjust the locations of any subsequent segments, depending on how much padding there was for alignment of the segment. It's far easier to just recompile the program and let the linker take care of it.

NB: NSStrings are not stored internally as sequences of C chars. It's an implementation detail, but I suspect that NSStrings use a 16-bit character width.

Perished answered 22/5, 2010 at 16:32 Comment(2)
Can you point me to where the strings' addresses are located in the binary? It's a little hard to explain, but currently I do not have access to the source code. I am thinking about shortening the previous string and re-pointing the address of my target string by +1, if I can find the reference. Yes, strings in the ustring section are 16 bits wide. - Crystallo
Sorry, I don't know the answer to your question. drawnonward might be able to help you more. - Perished
