nested arithmetic form (NAF) :
p = 8 * (8 * (2 * a + b) + c) + d # "a" at MSB
p = a + (b + (c + d * 8) * 2) * 2 # "a" at LSB
nested shifting form (NSF) :
p = (((((a << 1) + b) << 3) + c) << 3) + d # "a" at MSB
p = (((((d << 3) + c) << 1) + b) << 1) + a # "a" at LSB
Personally I think NAF is much cleaner (even if not the fastest), because it keeps all the digit groups on one side and all the scaling factors on the other regardless of endianness, while minimizing both the nesting depth and the total number of arithmetic ops.
"NAF" and "NSF" aren't formal terms - it's just a colloquial way of describing them
So packing it would be chr(p)
or just sprintf("%c", p)
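In Python terms that would look something like this (just a sketch; pack_byte is an illustrative name, not from the original):

def pack_byte(a, b, c, d):
    p = 8 * (8 * (2 * a + b) + c) + d   # NAF, "a" at MSB
    return chr(p)                       # 1-char string; use bytes([p]) for a raw byte

pack_byte(0, 0, 7, 6)                   # -> '>', i.e. byte 62, \076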
as for decoding, using the "a" at MSB approach, it's basically splitting a byte's octal digits back out into its components ("a" at LSB is the same decoding process in little-endian). The columns below are the byte value p, its 2a+b prefix, the c d octal digits, and the a b bits (see the decode sketch after the table):
  p    2a+b   c d   a b
======================
  0     \0    00    00
  1     \0    01    00
 62     \0    76    00
 63     \0    77    00
 64     \1    00    01
 65     \1    01    01
126     \1    76    01
127     \1    77    01
128     \2    00    10
129     \2    01    10
190     \2    76    10
191     \2    77    10
192     \3    00    11
193     \3    01    11
254     \3    76    11
255     \3    77    11
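Here's a minimal Python sketch of that decode (unpack_byte is just an illustrative name); it reproduces rows of the table above:

def unpack_byte(p):
    a, b = (p >> 7) & 1, (p >> 6) & 1   # the two high bits
    c, d = (p >> 3) & 7, p & 7          # the two low octal digits
    return a, b, c, d

for p in (0, 62, 64, 128, 192, 255):
    a, b, c, d = unpack_byte(p)
    print("%3d  \\%o  %o%o  %d%d" % (p, 2 * a + b, c, d, a, b))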
The 2a+b prefix is really useful when it comes to encoding UTF-8:
\0 ## - all the ASCII digits and arithmetic operators, most of the ASCII linguistic punctuation symbols, and nearly all the ASCII [[:cntrl:]] bytes (sans \177 DEL)
\1 ## - all the ASCII letters, plus misc [[:punct:]]
\2 ## - trailing/"continuation" bytes for Unicode code points U+0080 - U+10FFFF
\3 ## - all the leading bytes for Unicode code points U+0080 - U+10FFFF (sans \300/0xC0 - \301/0xC1 and \365/0xF5 - \377/0xFF)
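As a rough illustration, here's a Python sketch of bucketing a byte by that prefix, i.e. p >> 6 (the bucket labels are my own wording, and it doesn't filter out the handful of invalid \3 leaders):

def utf8_bucket(p):
    prefix = p >> 6                     # the "2a+b" value, 0..3
    return {
        0: "ASCII digits / operators / punctuation / cntrl",  # \0 ##
        1: "ASCII letters + misc punct",                      # \1 ##
        2: "UTF-8 continuation byte",                         # \2 ##
        3: "UTF-8 leading byte (a few values are invalid)",   # \3 ##
    }[prefix]

for p in (0x37, 0x41, 0xA3, 0xE2):
    print("\\%03o" % p, utf8_bucket(p))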