6502 lightweight compression algorithms

I'm implementing virtual memory on dual cassette tape decks on a Commodore PET (for fun) for a Forth I'm writing. What I have so far is at http://github.com/chitselb/pettil if you're interested.

I'm planning to use the PET's native 192-byte cassette datafile format. Oh yeah, only 32K of RAM for everything. I have imbedded Woz's excellent and very memory-efficient Sweet-16 interpreter inside the language.

A Forth block is (typically) 1024 bytes. Adding two bytes for the block ID caps the available virtual address space at 64 meg, way more than what would fit on a tape. There would be a "play" deck (device 1) and a "record" deck (device 2) and the FLUSH would involve copying the entire virtual memory from one drive to the other. Why tilt at windmills? Because back in the day, cassette tape is what most PET owners had, self included.

Most of the data will be screens of Forth code, which in this implementation will be 1000 bytes of text and a 24-byte line-wrap table, since I'm leveraging the PET ROM screen editor too. What I'm looking for are suggestions for anything that will (probably) beat simple Run Length Encoding for this purpose, but without the CPU and memory overhead of something as complex as Lempel-Ziv. All suggestions other than "just forget it" are appreciated.

If it's the Forth source code you're most worried about, you could restrict the character set to the 48 Chuck Moore chose for colorForth and use his Shannon coding scheme which results in 5.2 bits per character on average. He also claims that colorForth source is only about twice the size of the object code. By the way, it seems that the character set is ever so slightly different in arrayForth (see pg. 47 of the User's Manual - different order for digits, apostrophe instead of colon, etc.).

Using the Shannon coding has nothing necessarily to do with colored words of course. If you wanted to go all the way and store pre-parsed words as in colorForth you could use his scheme here.

He doesn't give many details, but for etherForth he abandoned the Shannon coding and went with a simple 6-bit encoding for the same character set with 11xxxx additionally indicating a 16-bit tag which he uses for the colors and tokens including the F18 instructions and a few assembler primitives (begin, end, then, for). That is a very cool scheme indeed (and especially on the 18-bit F18 with room for 3 per word). Extremely simple and quite compact.

Anyway, there are some ideas. Not really a direct answer to your compression question, but some ways of storing Forth source in a well-compressed form. Have fun!

Recommended topics

Hot tags