Decoding a file compressed with an obsolete language

I'm trying to decompress a data file that was originally compressed with an extension for AMOS Pro, the old Amiga BASIC language, that shipped with the AMOS Pro compiler. I've still got the programming language and have access to the compressor and decompressor, but I'm trying to decompress the files using C. I ultimately want to be able to view these files on modern hardware without having to resort to using an Amiga emulator first.

However, there's no documentation as to how the compressor worked, so I'm trying to reverse-engineer it solely from watching its behaviour. Here's what I've got so far.

This is a raw file (ASCII):

AABCDEFGHIJKLMNOPQRSTUVWXYZAABCDEFGHIJKLMNOPQRSTUVWXYZAABCDEFGHIJKLMNOPQRSTUVWXYZ

Here's the compressed version (hex):

D802C6B5
05048584
4544C5C4
2524A5A4
6564E5E4
15149594
5554D5D4
3534B591
00000007
AD763363
00000051

Testing with various files has given me a few insights:

  • The last 4 bytes are the size of the original file.
  • The file seems to function as a bit stream, so byte boundaries aren't important (I say this because I've seen ASCII codes appear in a few files and they aren't aligned to byte boundaries).
  • All of the bits in the file are stored in reverse.
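As a minimal sketch of the first observation: the sample above ends in 00 00 00 51, and 0x51 is 81, the length of the raw text, so the size appears to be stored most-significant byte first. The function names below are hypothetical, and the big-endian layout is an assumption based on that single sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value stored most-significant byte first (big-endian). */
uint32_t read_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: return the original (uncompressed) size taken from
 * the last 4 bytes of a squashed buffer, or 0 if the buffer is too short.
 * Assumes the trailer layout observed in the sample file. */
uint32_t squashed_original_size(const uint8_t *data, size_t len)
{
    if (data == NULL || len < 4) {
        return 0;
    }
    return read_be32(data + len - 4);
}
```

Calling `squashed_original_size` on the whole compressed file would give the number of bytes to allocate for the output before decoding anything else.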

The first 4 bytes seem to represent a sequence length. In the above example, the first byte, 0xD8, is 11011000 in binary; mirror it (bits are in reverse) and you'll get 00011011, which is 0x1B in hex or 27 in decimal. That matches the length of the repeated sequence (AABCDEFGHIJKLMNOPQRSTUVWXYZ is 27 characters long).
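The mirroring step can be sketched in C as below. This only illustrates reversing the bits of one byte; `reverse_bits8` is a hypothetical name, and nothing here is taken from the actual squash format.

```c
#include <stdint.h>

/* Mirror the 8 bits of a byte: bit 7 swaps with bit 0, bit 6 with bit 1,
 * and so on. Done in three steps: swap nibbles, then bit pairs, then
 * adjacent bits. */
uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}
```

With this, `reverse_bits8(0xD8)` yields 0x1B (27), the value worked out by hand above.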

However, I'm not making any more progress. Does this look like a standard compression algorithm? What do I try next?

Chyme answered 8/2, 2014 at 19:27 Comment(7)
It is impossible to reverse-engineer a compression algorithm from watching its behaviour. - Consuela
I think your best option is to contact François Lionet, the person listed as the programmer of AMOS Pro. He now works for Clickteam. He might be able to point you in the right direction. - Ce
Just to add a point: every bit will be reversed because of Motorola's big-endian system. It is too far back for me to remember well, but AMOS Pro has a default memory compressor called SQUASH, so possibly the file was compressed with that algorithm. You may want to research it. - Ulyssesumayyad
He already knows that it is the squash function. He has posted this question on a discussion forum today as well - eab.abime.net/showthread.php?p=937074 - and there he refers to it as the Squash function. - Ce
And Motorola big-endian does not reverse the bits, only the bytes of multi-byte constructs, such as 32-bit numbers. - Ce
The source code for the unsquash function is also available here - pianetaamiga.it/downloads/AMOSPro_Sources.zip - and you can find the function from line 1061 onwards in +header.s. Unfortunately it is in assembly language and undocumented (except for rudimentary calling convention information), so it's going to be hard to figure out. Still, not impossible. - Ce
The UnSquash function shouldn't be too hard to translate into some modern high-level language. Which language do you want it to be? Anyway, after unsquashing, you will probably be left with the tokenized version of the code, i.e. not readily readable text. You will still need to interpret the tokens. - Gear

As you've posted here, the compression function is called "squash", a function that is part of AMOS Pro.

As such, my advice would be to try one of the following lines of attack:

  • Reverse engineer the algorithm by analyzing its output: This is definitely not a viable option. You will only waste time.
  • Read, annotate, understand the source code of the unsquash function in AMOS Pro
  • Contact the author of AMOS Pro

Read the source code

The source code for AMOS Pro is apparently in the public domain now and can be found here:

http://www.pianetaamiga.it/downloads/AMOSPro_Sources.zip

It consists of 68000 assembly code and quite a few compiled object files.

The unsquash function can be found in the file +header.s on line 1061 and onwards. It is not documented, except for its entry register values, which is good at least. It doesn't appear to be a very large function so this might be worth a shot.

You will need to have, or acquire, a rudimentary knowledge of 68000 machine code. The function does not appear to call out to system libraries or anything, and only seems to operate directly on memory, which suggests that understanding the code is actually doable. Still, I've never written or read 68000 code in my life, so what do I know.

Contact the author of AMOS Pro

The author of AMOS Pro is François Lionet, as is evident from the User Guide. He founded Clickteam in the mid-90s to make game- and multimedia-making software. He still seems to be with that company and, according to forum posts from others looking into AMOS Pro, he seems to be willing to answer email. Sadly I don't know his email address, but the Clickteam website should give you a starting point.

Ce answered 8/2, 2014 at 20:7 Comment(3)
Thanks for finding the AMOS source! Digging through it now. It doesn't look too complex. - Chyme
I have written a set of macros that emulate the 68000 and was able to create a working C version of the ATN/Imploder compressor from 68k source (so portable on Windows). So that's doable. - Glandulous
The official URL for the original assembler source code of AMOS Basic can be found here: github.com/AOZ-Studio/AMOS-Professional-Official - Einsteinium
