Base91, how is it calculated?
Asked Answered
T

1

12

I've been looking online to find out how basE91 is calculated. I have found resources such as this one which specifies the characters used for a specific value but nowhere have I found how I get that value.

I have tried changing the input values into binary and taking chunks of both 6 and 7 bits but these do not work and I get the incorrect output. I do not want code that will do this for me as I which to write that myself, I only want to know the process needed to encode a string into basE91.

Timbuktu answered 27/10, 2017 at 15:5 Comment(3)
Did you download his source code to see how he does the calculation? A good understanding of base conversion encodings like base64 will help. iiis.org/CDs2010/CD2010SCI/CCCT_2010/PapersPdf/TB100QM.pdf has a pretty good description of the algorithm. An easily understood C# implementation is available at base91csharp.codeplex.com/SourceControl/latest#Base91.cs. There's this really cool invention called a "search engine" that lets you find all kinds of stuff really fast. I think it took me three minutes to find the above. Consider doing your own research next time.Verily
@JimMischel: and if you actually compare the algorithm in the pdf with the source code, you'll find that they don't match, the source code follows a little more sophisticated algorithm.Dramamine
Thanks for posting this question. The answer below gives a more readily helpful overview than the links in @JimMischel's comment.Skewback
D
14

First, you need to see the input as a bit stream.

Then, read 13 bits from the stream, and form an integer value from it. If the value of this integer is lower than or equal to 88, then read one additional bit, and put it into the 14th bit (lowest bit being 1st) of the integer. This integer's (let's call it v) maximum value is: 8192+88 = 8280.

Then split v into two indices: i0 = v%91, i1 = v/91. Then use a 91-element character table, and output two characters: table[i0], table[i1].

(now you can see the reason of 88: for the maximal value (8280), both i0 and i1 become 90)

So this process is more complicated than base64, but more space efficient. Furthermore, unlike base64, the size of the output is a little bit dependent of the input bytes. A N-length sequence of 0x00 will be shorter than a N-length sequence of 0xff (where N is a sufficiently large number).

Dramamine answered 28/10, 2017 at 15:19 Comment(3)
I assume you mean add in the extra bit as the 14th bit not into the 13th bit as otherwise you don't get 8280 you have the exact same number of represented values. Apart from that you completely cleared up my problem, this was the piece of the puzzle I could not find. Thanks a lot.Timbuktu
@milo.farrell: Yes. It depends, how do you call the lowest bit :) Usually, I use 0th for the lowest bit. I've clarified this in my answer.Dramamine
Isn't b->queue |= *ib++ << b->nbits reversing order of the bits in scope? or is this intentional for little endian encoding? The bitstream sequence 0xaa 0xbb will result in a b->queue of 0xbb 0xaa, isn't it?Gelinas

© 2022 - 2024 — McMap. All rights reserved.