Lua, dealing with non-ascii byte streams, byteorder change

I need to encode and decode a byte stream (possibly containing non-ASCII characters) from/into uint16, uint32 and uint64 (in their usual C/C++ meaning), taking care of endianness. What is an efficient and, ideally, cross-platform way to do this in Lua?

My target architecture is 64-bit x86_64, but I would like to keep the code portable (if that doesn't cost me on the performance front).

e.g. decode the bytes 0x00, 0x1d, 0xff, 0x23, 0x44, 0x32 (little endian, say currently held in a Lua string) as:

uint16: 0x1d00 = 7424
uint32: 0x324423ff = 843326463

It would be great if someone could explain with an example.

Wulfila answered 9/3, 2011 at 5:36 Comment(0)
S
6

Take a look at the struct and lpack libraries.

In this example, I use struct.unpack to decode a Lua string into two integers with forced little-endian encoding:

-- keep the module returned by require: some versions do not set a global 'struct'
local struct = require 'struct'
-- convert character codes to a Lua string - this may come from your source
local str = string.char(0x00, 0x1d, 0xff, 0x23, 0x44, 0x32)
-- format string: < = little endian, In = unsigned int (n bytes)
local u16, u32 = struct.unpack('<I2I4', str)
print(u16, u32) --> 7424    843326463
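
If installing an extension module is a problem, the same format strings also work with the built-in string.pack and string.unpack functions. The sketch below assumes Lua 5.3 or later (these functions do not exist in Lua 5.1 or 5.2); the variable names are only illustrative.

```lua
-- Assumes Lua 5.3 or later, where string.pack/string.unpack are built in.
local str = string.char(0x00, 0x1d, 0xff, 0x23, 0x44, 0x32)

-- '<' = little endian, I2 / I4 = unsigned integers of 2 / 4 bytes.
-- string.unpack also returns the position of the first unread byte.
local u16, u32, next_pos = string.unpack('<I2I4', str)
print(u16, u32, next_pos) --> 7424    843326463    7

-- The same bytes read as big endian ('>').
local b16, b32 = string.unpack('>I2I4', str)
print(b16, b32) --> 29    4280501298

-- Encoding is the reverse operation.
local packed = string.pack('<I2I4', u16, u32)
assert(packed == str)
```

The returned position makes it easy to keep decoding further fields from the same string by passing it as the third argument of the next string.unpack call.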
Sumerian answered 9/3, 2011 at 12:9 Comment(4)
Seems very simple and elegant; however, I don't seem to have the 'struct' extension module, and luarocks fails to build/install it with an error. I will try to solve that and then try this. Thanks! - Wulfila
@michal-kottman, I tried the code after fixing luarocks and installing 'struct', but Lua complains that the second parameter to unpack (i.e. str) is not a string. To debug, I tried this little snippet, which doesn't complain but doesn't seem to work as expected either:
> str = string.char(0x00, 0xff)
> local u16 = struct.unpack('<I2', str)
> print(u16)
nil
- Wulfila
@michal-kottman, sorry, fixed it! A slight change was needed in Lua 5.1 (at least on my system): I just had to do struct = require("struct"). - Wulfila
Strange, I tested the code as presented in Lua 5.1 and it worked correctly (maybe I had an older version of struct), but I'm glad it works for you now. It's a very handy tool for binary data... - Sumerian
A
7

For converting from bytes to int (taking care of endianness at the byte level, and of signedness):

function bytes_to_int(str,endian,signed) -- use length of string to determine 8,16,32,64 bits
    local t={str:byte(1,-1)}
    if endian=="big" then --reverse bytes
        local tt={}
        for k=1,#t do
            tt[#t-k+1]=t[k]
        end
        t=tt
    end
    local n=0
    for k=1,#t do
        n=n+t[k]*2^((k-1)*8)
    end
    if signed then
        n = (n > 2^(#t*8-1) -1) and (n - 2^(#t*8)) or n -- if last bit set, negative.
    end
    return n
end
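
When the integers sit at known offsets inside a longer string, as in the question's example, the same arithmetic can be applied without building intermediate tables. The following is only a sketch: read_uint and its parameters are hypothetical names, not part of the answer above, and it assumes the requested bytes actually exist in the string. Note that in Lua 5.1 and 5.2 numbers are doubles, so results are exact only up to 2^53; in Lua 5.3 and later integers are signed 64-bit, so a uint64 above 2^63 - 1 wraps around to a negative value.

```lua
-- Hypothetical helper (not from the answer above): read an unsigned integer
-- of `nbytes` bytes starting at 1-based position `pos` in `str`.
function read_uint(str, pos, nbytes, endian)
    local n = 0
    if endian == "big" then
        -- most significant byte comes first
        for i = pos, pos + nbytes - 1 do
            n = n * 256 + str:byte(i)
        end
    else
        -- little endian: least significant byte comes first, so walk backwards
        for i = pos + nbytes - 1, pos, -1 do
            n = n * 256 + str:byte(i)
        end
    end
    return n
end

local str = string.char(0x00, 0x1d, 0xff, 0x23, 0x44, 0x32)
print(read_uint(str, 1, 2, "little")) --> 7424
print(read_uint(str, 3, 4, "little")) --> 843326463
print(read_uint(str, 1, 2, "big"))    --> 29
```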

And while we're at it also the other direction:

function int_to_bytes(num,endian,signed)
    if num<0 and not signed then num=-num print"warning, dropping sign from number converting to unsigned" end
    local res={}
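    local bits = select(2,math.frexp(num)) -- bits needed for the magnitude
    if signed then bits = bits + 1 end -- reserve one extra bit for the sign
    local n = math.max(1, math.ceil(bits/8)) -- number of bytes to be used (at least one, so 0 encodes too).
    if signed and num < 0 then
        num = num + 2^(8*n) -- two's complement over n bytes (8*n bits)
    end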
    for k=n,1,-1 do -- 256 = 2^8 bits per char.
        local mul=2^(8*(k-1))
        res[k]=math.floor(num/mul)
        num=num-res[k]*mul
    end
    assert(num==0)
    if endian == "big" then
        local t={}
        for k=1,n do
            t[k]=res[n-k+1]
        end
        res=t
    end
    return string.char(unpack(res))
end

Any remarks are welcome, it's tested, but not too thoroughly...
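
For interchange with other programs, a fixed width is usually easier to reason about than a width derived from the value. The sketch below is a hypothetical variant (int_to_bytes_fixed and its parameters are assumed names, not the function above): the caller states the number of bytes, and negative values are stored in two's complement over exactly that width. As with the code above, values beyond 2^53 cannot be represented exactly where numbers are doubles.

```lua
-- Hypothetical fixed-width variant (not the answer's function): encode `num`
-- into exactly `nbytes` bytes, using two's complement for negative values.
function int_to_bytes_fixed(num, nbytes, endian)
    local limit = 2^(8 * nbytes)
    assert(num >= -limit / 2 and num < limit, "value does not fit in nbytes")
    if num < 0 then
        num = num + limit -- two's complement wrap
    end
    local res = {}
    for k = 1, nbytes do -- res[1] is the least significant byte
        res[k] = num % 256
        num = (num - res[k]) / 256
    end
    if endian == "big" then -- reverse into most-significant-first order
        for i = 1, math.floor(nbytes / 2) do
            res[i], res[nbytes - i + 1] = res[nbytes - i + 1], res[i]
        end
    end
    local unpack = table.unpack or unpack -- Lua 5.2+ / Lua 5.1
    return string.char(unpack(res))
end

-- -32769 in 4 bytes, big endian, is FF FF 7F FF
assert(int_to_bytes_fixed(-32769, 4, "big") == string.char(0xFF, 0xFF, 0x7F, 0xFF))
-- 7424 in 2 bytes, little endian, is 00 1D
assert(int_to_bytes_fixed(7424, 2, "little") == string.char(0x00, 0x1D))
```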

Adley answered 9/3, 2011 at 10:18 Comment(6)
Very instructive. I seem to have learnt a load of Lua through your illustrative example, seriously. - Wulfila
In function bytes_to_int, in the line n = (n > 2^(#t-1) -1) and (n - 2^#t) or n, I think it should be #t*8 instead of #t. It is the number of bits you want, not bytes. - Overdrive
It seems you are right! Thanks for the suggestion, I've corrected it in my answer. - Adley
@Adley - thanks a lot. Function int_to_bytes also needs a correction when dealing with signed numbers: after if signed and num < 0 then, the 2^n on the next line should be 2^(8 * n), i.e. num = num + 2^(8 * n). - Garnettgarnette
int_to_bytes doesn't always handle signed numbers correctly (when the exponent from frexp is a multiple of 8). For example, -32769 should be 0xFF 0xFF 0x7F 0xFF, but the function as it is returns 0x7F 0xFF. - Garnettgarnette
Also, for numbers exceeding 16 bits, int_to_bytes does not necessarily return an even number of bytes, so the result for signed numbers may not be portable to other applications (e.g. for a signed big-endian number returned in 3 bytes, the leading byte is assumed to be 0x00 instead of 0xFF). My solution for both problems lacks the concise elegance of the above code... maybe a real programmer can help out! - Garnettgarnette
C
1

My suggestion for an "Int16ToBytes" function, without parameter checking (it uses the bitwise operators available in Lua 5.3 and later):

function Int16ToBytes(num, endian)
  if num < 0 then 
      num = num & 0xFFFF
  end

  -- locals, so the function does not leak globals
  local highByte = (num & 0xFF00) >> 8
  local lowByte  = num & 0xFF

  if endian == "little" then
      lowByte, highByte = highByte, lowByte
  end

  return string.char(highByte,lowByte)
end
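
The opposite direction can be sketched in the same style. BytesToInt16 is a hypothetical counterpart, not part of the answer; it assumes Lua 5.3 or later for the bitwise operators and a string of at least two bytes, and like the function above it does no parameter checking.

```lua
-- Hypothetical counterpart to Int16ToBytes; assumes Lua 5.3 or later
-- (bitwise operators) and a string of at least two bytes.
function BytesToInt16(str, endian, signed)
    local highByte, lowByte = str:byte(1, 2)
    if endian == "little" then
        -- in little-endian data the first byte is the low one
        lowByte, highByte = highByte, lowByte
    end
    local num = (highByte << 8) | lowByte
    if signed and num >= 0x8000 then
        num = num - 0x10000 -- sign bit set: interpret as two's complement
    end
    return num
end

print(BytesToInt16(string.char(0x00, 0x1d), "little"))    --> 7424
print(BytesToInt16(string.char(0xff, 0xfe), "big", true)) --> -2
```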
Cavit answered 9/11, 2016 at 15:16 Comment(0)
