6502 lightweight compression algorithms
Asked Answered
F

1

11

I'm implementing virtual memory on dual cassette tape decks on a Commodore PET (for fun) for a Forth I'm writing. What I have so far is at http://github.com/chitselb/pettil if you're interested.

I'm planning to use the PET's native 192-byte cassette datafile format. Oh yeah, only 32K of RAM for everything. I have imbedded Woz's excellent and very memory-efficient Sweet-16 interpreter inside the language.

A Forth block is (typically) 1024 bytes. Adding two bytes for the block ID caps the available virtual address space at 64 meg, way more than what would fit on a tape. There would be a "play" deck (device 1) and a "record" deck (device 2) and the FLUSH would involve copying the entire virtual memory from one drive to the other. Why tilt at windmills? Because back in the day, cassette tape is what most PET owners had, self included.

Most of the data will be screens of Forth code, which in this implementation will be 1000 bytes of text and a 24-byte line-wrap table, since I'm leveraging the PET ROM screen editor too. What I'm looking for are suggestions for anything that will (probably) beat simple Run Length Encoding for this purpose, but without the CPU and memory overhead of something as complex as Lempel-Ziv. All suggestions other than "just forget it" are appreciated.

Freewill answered 13/9, 2013 at 14:5 Comment(6)
You could take a look at Huffman coding.Hormuz
I have used code.google.com/p/lzfx, for decompress only, and I think you still will use up most/all of that 32K so it probably isnt lightweight enough. but if you look at the concept not implementation, it is run length with some optimization. If that string has been encountered recently rather than it being a bunch of one byte of a, one byte of b, one byte of c it is one instruction to copy N bytes from address current-X.Midst
There is somewhere here at stackoverflow a question and answers for lightweight embedded compression. which had other good looking suggestions.Midst
@dwelch: you meant this one?Okechuku
You could use an encoded word with dictionary lookup if you have many strings of repeated sequences. It's simple to implement, fast, and works nicely for various kinds of text if it does consist of various, common string sequences. The idea is to look at the data and understand where the redundancies are and take advantage of that. Lempel-Ziv is "heavy" because it's designed to work well in the general case across various kinds of data. In this case, you have a very specific kind of data you may be able to take advantage of.Swell
How is the Forth being treated? It would seem to me that you can store the Forth dictionary and define non-comment text simply by reference to that — just like how most micros stored their BASIC to cassette in tokenised form.Aspirant
B
2

If it's the Forth source code you're most worried about, you could restrict the character set to the 48 Chuck Moore chose for colorForth and use his Shannon coding scheme which results in 5.2 bits per character on average. He also claims that colorForth source is only about twice the size of the object code. By the way, it seems that the character set is ever so slightly different in arrayForth (see pg. 47 of the User's Manual - different order for digits, apostrophe instead of colon, etc.).

Using the Shannon coding has nothing necessarily to do with colored words of course. If you wanted to go all the way and store pre-parsed words as in colorForth you could use his scheme here.

He doesn't give many details, but for etherForth he abandoned the Shannon coding and went with a simple 6-bit encoding for the same character set with 11xxxx additionally indicating a 16-bit tag which he uses for the colors and tokens including the F18 instructions and a few assembler primitives (begin, end, then, for). That is a very cool scheme indeed (and especially on the 18-bit F18 with room for 3 per word). Extremely simple and quite compact.

Anyway, there are some ideas. Not really a direct answer to your compression question, but some ways of storing Forth source in a well-compressed form. Have fun!

Burushaski answered 13/9, 2013 at 21:8 Comment(1)
The objective is a standard-feeling Forth that was designed specifically for Commodore PET hardware, taking advantage of the PETSCII character set and any routines in the built-in ROM wherever it can. It will run on PETs with 1.0, upgrade, or 4.0 BASIC ROM and optionally be target compiled to the (unused in a factory PET) $9000-$AFFF address spaceFreewill

© 2022 - 2024 — McMap. All rights reserved.