Byte Array to Uint64 as a String
Asked Answered
D

4

7

Let's think about the following situation.

The Go routine creates a byte array where packs a Uint64 number 5577006791947779410 in 8 bytes Big Endian [77, 101, 130, 33, 7, 252, 253, 82].

In JavaScript code I receive these bytes as Uint8Array. We know that JavaScript doesn't currently support Uint64 as safe numeric type and cannot perform bitwise operations on integers larger than 32 bits, so things like buf[0] << 56 will never work.

So what is the process of decoding these bytes directly to numeric string "5577006791947779410"?

P.S. I know there are plenty of libraries for working with big integers in JavaScript, but generally they are huge and provide lots of mathematical operations, which I don't need here. I am looking for a simple modern straightforward solution for just decoding BE-packed Uint64 and Int64 bytes to numeric string. Do you have anything in mind?

Disembarrass answered 2/8, 2017 at 8:51 Comment(0)
C
10

EDIT: For converting (U)int64 I would now definitely recommend @LS_DEV's solution. I would use my solution only when having an unknown or larger amount of bytes.

I started with https://mcmap.net/q/45796/-convert-a-large-integer-to-a-hex-string-in-javascript and modified it:

function Int64ToString(bytes, isSigned) {
  const isNegative = isSigned && bytes.length > 0 && bytes[0] >= 0x80;
  const digits = [];
  bytes.forEach((byte, j) => {
    if(isNegative)
      byte = 0x100 - (j == bytes.length - 1 ? 0 : 1) - byte;
    for(let i = 0; byte > 0 || i < digits.length; i++) {
      byte += (digits[i] || 0) * 0x100;
      digits[i] = byte % 10;
      byte = (byte - digits[i]) / 10;
    }
  });
  return (isNegative ? '-' : '') + digits.reverse().join('');
}

const tests = [
  {
    inp: [77, 101, 130, 33, 7, 252, 253, 82],
    signed: false,
    expectation: '5577006791947779410'
  },
  {
    inp: [255, 255, 255, 255, 255, 255, 255, 255],
    signed: true,
    expectation: '-1'
  },
];

tests.forEach(test => {
  const result = Int64ToString(test.inp, test.signed);
  console.log(`${result} ${result !== test.expectation ? '!' : ''}=== ${test.expectation}`);
});

At first the sign gets calculated by checking if the topmost bit is set (bytes[0] > 128). For negative numbers the bits have to be negated (255 - byte) and 1 has to be added to the number (therefore 256 instead of 255 for the last byte).

The basic idea of the forEach loop is to split each byte into its decimal digits (byte % 10 and calculating the overhead (byte - digits[i]) / 10 resp. Math.floor(byte / 10) for the next digit). For the next byte one has to add the shifted result of the last bytes' digits (byte += digits[i] * 256 resp. digits[i] << 8).

That code is optimized for shortness, simplicity and flexibility. If you are working with strings instead of bytes or numbers and don't want to use any libraries it appears that conversion performance doesn't really matter. Otherwise the function could be optimized for performance: Up to four bytes could be treated simultaneously, one only has to replace the 0x100 and 0x80, additionally (with only two byte groups remaining in the case of an (U)Int64) the forEach loop can be unrolled. Grouping the decimal digits probably won't increase performance since the resulting strings would have to be padded with zeros, introducing the need of removing leading zeros in the end result.

Cherisecherish answered 4/8, 2017 at 11:33 Comment(4)
Thank you for the reply! That seems to work well for uint64. What modifications should I need to do make in order this to work with int64 as well? I have created a playground here: jsfiddle.net/wcqLj1qg.Disembarrass
I have updated my answer and uploaded it to codepen.io/stephtr/pen/brBvxr The modifications necessary are (obviously) adding the minus sign, negating the bits and substracting 1.Cherisecherish
Could make it slightly more compact by changing byte > 0 to byte since byte is always positive. Also j == bytes.length - 1 ? 0 : 1 can be written simply j != bytes.length - 1 since the boolean will get coerced into a number :)Wavelength
I thought about that when writing the code, but in my opinion that decreases readability.Cherisecherish
M
5

Another approach: divide problem in two uint32 to keep calculations manageable.

Consider lower and higher uint32 (l and h). Full number could be written as h*0x100000000+l. Considering decimal, one could also consider lower 9 digits and remaining higher digits (ld and hd): ld=(h*0x100000000+l)%1000000000 and hd=(h*0x100000000+l)/1000000000. With some arithmetic and algebra operators properties, one can break those operation into safe "half" 64bit operations and compose string at ending.

function int64_to_str(a, signed) {
  const negative = signed && a[0] >= 128;
  const H = 0x100000000, D = 1000000000;
  let h = a[3] + a[2] * 0x100 + a[1] * 0x10000 + a[0]*0x1000000;
  let l = a[7] + a[6] * 0x100 + a[5] * 0x10000 + a[4]*0x1000000;
  if(negative) {
     h = H - 1 - h;
     l = H - l;
  }
  const hd = Math.floor(h * H / D + l / D);
  const ld = (((h % D) * (H % D)) % D + l) % D;
  const ldStr = ld + '';
  return (negative ? '-' : '') +
         (hd != 0 ? hd + '0'.repeat(9 - ldStr.length) : '') + ldStr;
}

let result = int64_to_str([77, 101, 130, 33, 7, 252, 253, 82], false);
let expectation = '5577006791947779410';
console.log(result + ' ' + (result === expectation ? '===' : '!==') + ' ' + expectation);

result = int64_to_str([255, 255, 255, 255, 255, 255, 255, 255], true);
expectation = '-1';
console.log(result + ' ' + (result === expectation ? '===' : '!==') + ' ' + expectation);

As detailed in the comments that algorithm works even though (h % D) * (H % D) can get larger than Number.MAX_SAFE_INTEGER, because the lost bits were nevertheless zero.

Medor answered 11/8, 2017 at 9:14 Comment(8)
Even though that would be the better approach for dealing with fixed size ints it won't work correctly in some cases. I think that the Math.trunc(ld/D) part should be dropped since the first summand mathematically already contains that part. With that corrected I am still not sure if it works correctly. h*H can result in numbers way above Number.MAX_SAFE_INTEGER. By dividing that with D and truncating, the loss of precision should vanish. However, I am more worried about (h%D)*(H%D), since D*(H%D) is also too large, this time with the full precision being needed.Cherisecherish
@stephan (h%D)*(H%D), that's a good point! I could use 3 uint22, but not today...Noella
I experimented a little bit and it seems like the error introduced in the multiplication gets luckily compensated by the error introduced when calculating the modulus (since D AND H are divisible by 0x800). There is no need to split it into three groups. If you don't want to rely on that behavior you could use ((((h%D)*((H%D)/0x800))%D)*0x800)%D instead of (h%D)*(H%D)%D (because D*(H%D)/0x800 is smaller than Number.MAX_SAFE_INTEGER).Cherisecherish
To be more specific the last 11 bits (thus 0x800) of D and therefore H%D are already zero, therefore there doesn't get any error introduced by implicitly setting them zero because of the "precision loss". Just another side note: When hd is zero leading zeros shouldn't be appended.Cherisecherish
I updated your code to include signed int support and a the fix for hd: codepen.io/stephtr/pen/EvXqop?editors=0010Cherisecherish
@Cherisecherish nice work. If you wish, you're welcome to post it as an answer. You spent much more effort than me :-)Noella
I updated your answer, but the edit got somehow declined. Feel free to copy it ;)Cherisecherish
Thank you for the reply! I wish I could give you all more upvotes ;)Disembarrass
W
1

Here is my solution. The general strategy is this:

  • If number is negative, negate it using 2's complement and add negative sign back in at the end
  • Represent arbitrary size numbers as LE arrays of digits from 0 to 9
  • For each byte in the Uint8Array (from most to least significant), multiply running total by 256 and add to it the value of the new byte
  • To multiply a number by 256, double it 8 times (since 2 ** 8 == 256)
  • To add two numbers, use the elementary school algorithm:
    • Start with least significant digit
    • Add corresponding digits of the two numbers
    • Resulting digit is the sum mod 10; carry is 1 if the sum is 10 or more, otherwise 0
    • Continue adding corresponding digits with the carry until we add the most significant digits and carry is 0

A few notes about shorthand:

  • n1[i] || 0 gets the ith digit of n1. If this is past the end of i, we treat it as a 0 (imagine numbers represented with infinite 0s in front of them). Same with n2.
  • added > 9 produces a boolean, which is automatically converted to a number (1 if added >= 10, 0 otherwise)
  • i < n1.length || i < n2.length || carry checks whether there are more digits in either of the addends or the carry is still nonzero
  • String(b).split('').map(Number).reverse() converts, e.g. 100 to '100', then ['1', '0', '0'], then [1, 0, 0], then [0, 0, 1] so it is represented in LE base-10
  • result.reverse().join('') converts, e.g. [0, 0, 1] to [1, 0, 0], then '100'

Code:

function add(n1, n2) {
    const sum = []
    let carry = 0
    for (let i = 0; i < n1.length || i < n2.length || carry; i++) {
        const added = (n1[i] || 0) + (n2[i] || 0) + carry
        sum[i] = added % 10
        carry = added > 9 //floor(added / 10)
    }
    return sum
}
function times256(n1) {
    for (let i = 8; i; i--) n1 = add(n1, n1)
    return n1
}
function toString(buffer) {
    const isNegative = buffer[0] & 128 //check if high bit is set
    if (isNegative) { //convert to positive, using 2's complement
        buffer = buffer.map(b => ~b) //invert all bits
        let i = buffer.length - 1
        while (buffer[i] === 255) { //add 1 to the number, carrying if necessary
            buffer[i] = 0
            i--
        }
        buffer[i]++
    }
    const result = buffer.reduce((sum, b) =>
        add(
            times256(sum), //multiply sum by 256
            String(b).split('').map(Number).reverse() //then add b
        ),
        []
    )
    const stringResult = result.reverse().join('')
    if (isNegative) return '-' + stringResult
    else return stringResult
}
Wavelength answered 4/8, 2017 at 20:3 Comment(2)
Thank you very much for the reply. Your code and explanation is simply great. Now I wonder which solution is be better: your or Stephan's. His solution is shorter and contains only 2 loops, your strategy is more detailed and clear. We probably need some perf check here.Disembarrass
Yeah, I'm sure his is probably faster, but I found this easier to conceptualize.Wavelength
C
1

This does the UInt64 version - I can't imagine that an interchange is that difficult:

<!DOCTYPE html>
<html>

<body>
<span id='out1'></span>
<br>
<span id='out2'></span>
<br>
<span id='out3'></span>
</body>

<script>
fnl='';
be=[77, 101, 130, 33, 7, 252, 253, 82];

function paddedBinary(n) {
pad='';
sv=128;
while (sv>n) {pad+='0';sv/=2;}
return pad+n.toString(2);
}

for (let i=0;i<8;i++)
fnl+=paddedBinary(be[i]);

out1.textContent=fnl;

dec=new Array(64);
for (let i=0;i<64;i++) dec[i]=new Array(21).fill(0);

function make2s() {
dec[0][0]=1;
for (let i=1;i<64;i++) {
for (let j=0;j<21;j++) 
dec[i][j]=2*dec[i-1][j];
for (let j=0;j<21;j++) 
if (dec[i][j]>9) {
dec[i][j]-=10;
dec[i][j+1]++;
}
}
}

function int64add(v1,v2) {
var res=new Array(21).fill(0);
for (let i=0;i<21;i++)
res[i]=v1[i]+v2[i];
for (let i=0;i<21;i++)
if (res[i]>9) {
res[i]-=10;
res[i+1]++;
}
return res;
}

make2s();
for (let i=0;i<64;i++)
out2.textContent+=dec[i]+' :: ';

cv=new Array(21).fill(0);
for (let i=0;i<fnl.length;i++)
if (fnl[i]=='1') cv=int64add(cv,dec[63-i]);

out3.textContent=cv;

</script>
</html>

The paddedBinary() function returns a 'full' 8-bit binary number, so we can create 'fnl' as a 64-bit string of the BigEndian.

As JavaScript doesn't do full 64-bit arithmetic, I create the dec[] array to store each power of 2 as individual digits, by doubling each previous digit and smoothing the tens out.

Then all is left is to add the bits that we want, which uses a similar method to smooth out the tens.

(and the answer is given in reverse!)

Calamity answered 8/8, 2017 at 13:13 Comment(1)
Thank you for the reply!Disembarrass

© 2022 - 2024 — McMap. All rights reserved.