Inspecting files of type "NeXT/Apple typedstream" version 4 (NSArchiver)

Asked 16/9, 2013 at 17:57 Answered 13/3, 2015 at 17:48

ios objective-c macos nscoding nsarchiving

For a data recovery program I need to be able to extract the values+types from files written by NSArchiver, without having access to Apple's CF / NS frameworks.

The OS X file command reports such files as:

NeXT/Apple typedstream data, little endian, version 4, system 1000

Is there any documentation on how these files are encoded, or has anyone come up with code that can parse them?

Here's an example of such data (also: downloadable):

04 0B 73 74 72 65 61 6D 74 79 70 65 64 81 E8 03  ..streamtyped...
84 01 40 84 84 84 12 4E 53 41 74 74 72 69 62 75  ..@....NSAttribu
74 65 64 53 74 72 69 6E 67 00 84 84 08 4E 53 4F  tedString....NSO
62 6A 65 63 74 00 85 92 84 84 84 08 4E 53 53 74  bject.......NSSt
72 69 6E 67 01 94 84 01 2B 06 46 65 73 6B 65 72  ring....+.Fesker
86 84 02 69 49 01 06 92 84 84 84 0C 4E 53 44 69  ...iI.......NSDi
63 74 69 6F 6E 61 72 79 00 94 84 01 69 01 92 84  ctionary....i...
96 96 1D 5F 5F 6B 49 4D 4D 65 73 73 61 67 65 50  ...__kIMMessageP
61 72 74 41 74 74 72 69 62 75 74 65 4E 61 6D 65  artAttributeName
86 92 84 84 84 08 4E 53 4E 75 6D 62 65 72 00 84  ......NSNumber..
84 07 4E 53 56 61 6C 75 65 00 94 84 01 2A 84 99  ..NSValue....*..
99 00 86 86 86                                   .....

This contains an NSAttributedString. I have similar examples that contain NSMutableAttributedStrings, etc., but all eventually resolve to NSAttributedStrings, from which I would like to get the text. I do not care about the rest, but I need to know whether the archive is valid.
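A minimal sketch of how the text could be located without any Apple frameworks, based only on the sample above: there, the string "Fesker" is preceded by the bytes 84 01 2B and a one-byte length (06). Treating that byte pattern as a marker for string contents is an assumption drawn from this single dump, as is decoding the payload as UTF-8. The name find_text_candidates is made up for illustration. Lengths of 0x80 and above are skipped because the sample does not show how they are encoded.

```python
# Byte pattern that precedes the text in the sample dump (84 01 2B).
# Assumption: this pattern marks string contents in general.
MARKER = b"\x84\x01+"


def find_text_candidates(data):
    """Return every string that follows MARKER with a plausible one-byte length."""
    results = []
    pos = data.find(MARKER)
    while pos != -1:
        len_at = pos + len(MARKER)
        if len_at < len(data):
            length = data[len_at]
            chunk = data[len_at + 1:len_at + 1 + length]
            # Reject lengths >= 0x80 (encoding unknown here) and truncated payloads.
            if length < 0x80 and len(chunk) == length:
                try:
                    results.append(chunk.decode("utf-8"))
                except UnicodeDecodeError:
                    pass  # not valid text, ignore this candidate
        pos = data.find(MARKER, pos + 1)
    return results


# Bytes copied from the fourth to sixth rows of the dump above.
SAMPLE_EXCERPT = bytes.fromhex(
    "62 6A 65 63 74 00 85 92 84 84 84 08 4E 53 53 74 "
    "72 69 6E 67 01 94 84 01 2B 06 46 65 73 6B 65 72 "
    "86 84 02 69 49 01 06"
)

print(find_text_candidates(SAMPLE_EXCERPT))  # ['Fesker']
```

This is a heuristic scan, not a parser: it says nothing about whether the surrounding archive is well formed, so it only covers the "get the text" half of the problem.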

My current solution is to use NSUnarchiver and, assuming I should always find an NSAttributedString in there, decode the first object and read its text, then recreate an archive from it and check whether it matches the original data. If I get an exception or a different archive back, I assume that the archive is damaged or invalid:

NSData *data = [[NSData alloc] initWithBytesNoCopy:dataPtr length:dataLen freeWhenDone:false];
NSUnarchiver *a = NULL;

// The algorithm simply assumes that the data contains a NSAttributedString, retrieves it,
// and then recreates the NSArchived version from it in order to tell its size.
@try {
    a = [[NSUnarchiver alloc] initForReadingWithData:data];
    NSAttributedString *s = [a decodeObject];

    // re-encode the string item so we can tell its length
    NSData *d = [NSArchiver archivedDataWithRootObject:s];
    if ([d isEqualToData:[data subdataWithRange:NSMakeRange(0, d.length)]]) {
        lenOut = (int) d.length;
        okay = true; // -> lenOut is valid, though textOut might still fail, see @catch below
        textOut = [s.string cStringUsingEncoding:NSUTF8StringEncoding];
    } else {
        // oops, we don't get back what we had as input, so let's better not consider this valid
    }
} @catch (NSException *e) {
    // data is invalid
}

However, there are several issues with the above code:

It's not cross-platform. I need this to work on Windows, too.
Some examples of damaged data cause an unwanted error message to be written to stderr or syslog (not sure which), such as: *** mmap(size=18446744071608111104) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug (I filed a bug report about this, which was sadly closed as "won't fix").
Nothing guarantees that the NSUnarchiver code is 100% crashproof. The malloc error is an example of this. I might as well get a bus error in some situations, and that would be fatal. With custom parsing code, I could take care of that myself (and fix any crashes I encounter). (Update: I just found some invalid data that does indeed crash NSUnarchiver with a SIGSEGV.)
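To illustrate what "taking care of that myself" could look like, here is a sketch of a bounds-checked read. The size in the mmap message equals 2^64 - 2101440512, which is what a negative 32-bit value looks like when reinterpreted as an unsigned 64-bit size. Whether that is what actually happens inside NSUnarchiver is a guess. The 4-byte length layout and the name read_length_prefixed are hypothetical and chosen only for the demonstration; they are not the typedstream layout.

```python
import struct


def read_length_prefixed(data, pos):
    """Read a signed 32-bit little-endian length at pos, then that many bytes.

    Hypothetical layout for illustration only. The point is that the length
    is validated against the remaining input before anything is allocated.
    """
    if pos < 0 or pos + 4 > len(data):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from("<i", data, pos)
    remaining = len(data) - (pos + 4)
    if length < 0 or length > remaining:
        raise ValueError("implausible length %d (only %d bytes left)" % (length, remaining))
    return data[pos + 4:pos + 4 + length]


# The size from the error message is this negative value seen as unsigned 64-bit:
print((-2101440512) % 2**64)  # 18446744071608111104

print(read_length_prefixed(struct.pack("<i", 3) + b"abc", 0))  # b'abc'
```

Damaged input then surfaces as an ordinary ValueError that the caller can handle, instead of a failed allocation or a crash.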

Therefore, I need custom code to decode these kinds of archives. I've looked at a few, but can't make sense of the codes they use. Apparently, there are length fields and type fields, with the type codes in the range of roughly 0x81 to 0x86. Also, the first 16 bytes are the header, including the system code (0x03E8 = 1000) at offsets 14-15.
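The header layout described above can be checked with a few lines of code. This sketch reads the first 16 bytes of the sample: a version byte (04), a length byte (0B), the 11-byte signature "streamtyped", the byte 0x81, and the system code as a 16-bit little-endian value. Reading 0x81 as "a 16-bit integer follows" is an assumption based on this one sample, and parse_header is a made-up name.

```python
import struct

SIGNATURE = b"streamtyped"  # as in the sample; other variants are not handled here


def parse_header(data):
    """Return (streamer_version, system_version) for a header shaped like the sample's."""
    if len(data) < 16:
        raise ValueError("too short to hold the 16-byte header")
    streamer_version = data[0]
    sig_len = data[1]
    if sig_len != len(SIGNATURE) or data[2:13] != SIGNATURE:
        raise ValueError("signature 'streamtyped' not found")
    if data[13] != 0x81:
        # Assumption: 0x81 announces a 16-bit integer; anything else is rejected.
        raise ValueError("unexpected byte 0x%02X before the system code" % data[13])
    (system_version,) = struct.unpack_from("<H", data, 14)  # offsets 14-15
    return streamer_version, system_version


# First row of the dump in the question.
SAMPLE_HEADER = bytes.fromhex("04 0B 73 74 72 65 61 6D 74 79 70 65 64 81 E8 03")

print(parse_header(SAMPLE_HEADER))  # (4, 1000)
```

The result matches what the file command reports for the sample: version 4, system 1000.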

I also wonder if the source code is available in some old NeXT sources or in the Windows version that once existed, but where would I find that? (Note: I was directed to the GNUstep source ("core.20131003.tar.bz2"), in which I found its NSUnarchiver source, but that code, apparently from 1998, uses its own encoding and does not understand this "streamtyped" encoding.)

Erkan answered 16/9, 2013 at 17:57 Comment(0)

While I don't know any documentation of the format, you may find the information you are looking for by checking the public source code from older Darwin (or maybe OpenStep) versions.

For example, have a look at the implementation of typedstream in the file typedstream.m in objc-1.tar.gz available at this mirror of an old darwin distribution.

This source code should be able to read/write typedstream. Just be sure to comply with Apple's license when using it.

Mounts answered 3/10, 2013 at 19:44 Comment(4)

Yes! That's the first source I've seen that actually knows of "typedstream"! That should do it. – Erkan 3/10, 2013 at 20:8

Turns out that this code is too old and cannot handle the modern archives like the one provided in the question. – Erkan 20/3, 2015 at 13:57

Link provided at the top is dead. Can find one version here: next.68k.org/nextstuff/otto/html/pub/apple.com/PublicSource/… – Wesla 6/11, 2015 at 13:45

You can find it here: sourceforge.net/projects/aapl-darwin/files/Darwin-0.1 – Blackstone 13/1, 2019 at 10:23

Part of the issue here is that each class in Cocoa/NeXTSTEP/OPENSTEP knows how to archive itself. Each class has initWithCoder:/encodeWithCoder: methods, and inside them there is a section for typedstream and another section for keyed archives. Keyed archives are more modern and are usually expressed as XML plists. These can be encoded in binary form, but, make no mistake, this binary form is NOT the same as a typedstream archive. Further, they are keyed, so it's easy to pull out individual pieces of data without having to read all of the data that came before. Typedstream archives don't work this way. They are order-based, which means that each element in each object is written one after the other: first the class name, then the version, then each of the pieces of data. The reason GNUstep never implemented this is that the order of encoding is nearly impossible to discover.

When you archive the root object of an object graph it calls the encodeWithCoder: method on that object which in turn calls the encodeWithCoder: methods on each of the objects it contains and so on recursively until the entire object graph is archived. When this is done using keyed archives (NSKeyedArchiver) the archive is built and keyed appropriately. When it is done with a typed stream archive (NSArchiver) the same recursion happens but each time an object is encoded it just dumps each element out into the archive in whatever order the developer deemed appropriate at the time.
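The difference can be shown with a toy model. The sketch below is not the typedstream format and not Apple's API; the classes Point and Line and the method names are invented for illustration. It only mimics the two strategies described above: an order-based encoder that appends typed values in whatever order each class chooses, and a keyed encoder that names every value.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def encode_ordered(self, stream):
        # Order-based: class name, version, then values in the author's chosen order.
        stream.extend([("class", "Point"), ("version", 0), ("i", self.x), ("i", self.y)])

    def encode_keyed(self):
        # Keyed: every value carries a name, so it can be looked up directly.
        return {"$class": "Point", "x": self.x, "y": self.y}


class Line:
    def __init__(self, start, end):
        self.start, self.end = start, end

    def encode_ordered(self, stream):
        stream.extend([("class", "Line"), ("version", 0)])
        # Recursion: contained objects dump themselves right after their owner.
        self.start.encode_ordered(stream)
        self.end.encode_ordered(stream)

    def encode_keyed(self):
        return {"$class": "Line",
                "start": self.start.encode_keyed(),
                "end": self.end.encode_keyed()}


line = Line(Point(1, 2), Point(3, 4))

ordered = []
line.encode_ordered(ordered)
keyed = line.encode_keyed()

print(ordered)            # a flat run of (type, value) pairs, no field names
print(keyed["end"]["x"])  # 3
```

In the ordered output, every value still carries its type, but nothing says that the third integer is the x coordinate of the end point. That knowledge lives only in each class's encoding code.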

I hope this explanation clears things up a little. You have a hard road ahead of you. There were reasons doing this was avoided in GNUstep. If we had, we would STILL be trying to figure it out.

Earlap answered 4/10, 2013 at 5:2 Comment(2)

Thanks Greg it's a good summary (better than Apple's own). But despite the fact that the order and meaning of a class' stored values is unknown, each item is still individually packaged inside, each with an explicit type. And that's all I wanted to get out of this: A tree of types + values. I understand that this is not enough for generic retrieval of complex NS types, but it's good enough for my needs where I only want to recover text from such data. – Erkan 5/10, 2013 at 7:40

NB - You could probably figure out some types by simply creating your own NSArchiver and NSKeyedArchiver subclasses and using them to archive some common objects (NSString is probably a good start), to see what data is written at all, with which names. I'd guess that all archiving in the end boils down to primitive calls like -encodeInt: and -encodeBytes:. – Euonymus 13/3, 2015 at 17:56

Take a look at Cocotron's open source implementation of NSArchiver and NSUnarchiver:

https://code.google.com/p/cocotron/source/browse/Foundation/NSArchiver.m https://code.google.com/p/cocotron/source/browse/Foundation/NSUnarchiver.m

Galactometer answered 3/10, 2013 at 17:16 Comment(3)

Unfortunately, that one uses a different format, just like the GNUstep does :( – Erkan 3/10, 2013 at 17:43

Since these two classes have been deprecated, are you perhaps looking at NSKeyed*rchiver? – Enjambment 3/10, 2013 at 18:8

Nope, it's the old non-keyed version. In fact, when I give the data to the NSKeyedUnarchiver, it aborts with an error. – Erkan 3/10, 2013 at 19:52

It seems to be part of the GNU Objective-C runtime, even though it's not exactly runtime stuff (see the discussion at: http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00495.html)

This file may implement that stuff: https://github.com/gnustep/libobjc/blob/master/archive.c

Santana answered 3/10, 2013 at 19:33 Comment(1)

Thanks, have to check it out tomorrow. I just wonder why someone downvoted your answer - if people do, they're supposed to leave a comment about it. – Erkan 3/10, 2013 at 19:54

First, please see "Is there a way to read in files in TypedStream format" for some interesting info.

Very probably, the format can be converted to something more readable using the plutil tool. This tool is also available for Windows (it comes with iTunes for Windows). Not sure about its license, though.

The problematic part is the fact that the files contain object instances converted to binary. It's not enough to understand the file format, it's necessary to understand how every type is stored.

Goldenrod answered 3/10, 2013 at 19:47 Comment(1)

I had already read that other SO question and followed its links. Nothing in there that would help me in particular, it seemed. Also, plutil is not able to read these typedstream files. I've made a downloadable version of the file available now. – Erkan 3/10, 2013 at 20:1

Frank Illenberger wrote an NSUnarchiver replacement called MEUnarchiver, based on the 1999 typedstream.m source code: https://github.com/depth42/MEUnarchiver

It has been extended to support newer types that are not known to the original source code. It still relies on the ObjC runtime to provide NSCoding decoder implementations for all the standard types such as NSString, but otherwise it is pretty self-contained and allows me to prevent crashes that occur with Apple's NSUnarchiver code when passing damaged data.

Erkan answered 13/3, 2015 at 17:48 Comment(1)

I think it's amusing that my initial sentence was not comprehended at all in this conversation. You can get the format correct, but you will need to figure out the order in which things are stored. NSArchiver is NOT keyed. It's simply... className data data data data in whatever order the author decided to store the "data" in. Sometimes there is a version in there, but it is a stream of bytes. So MEUnarchiver likely got the archive format correct, but since every class has its own way of storing itself it says nothing about the ORDER. – Earlap 3/3, 2020 at 17:50
