Convert python long/int to fixed size byte array
W

10

62

I'm trying to implement RC4 and DH key exchange in python. Problem is that I have no idea about how to convert the python long/int from the key exchange to the byte array I need for the RC4 implementation. Is there a simple way to convert a long to the required length byte array?

Update: forgot to mention that the numbers I'm dealing with are 768 bit unsigned integers.

Wrinkly answered 4/1, 2012 at 17:11 Comment(1)
Not sure if it'll help, but check the struct module: docs.python.org/library/struct.html Swatow
C
21

I haven't done any benchmarks, but this recipe "works for me".

The short version: use '%x' % val, then unhexlify the result. The devil is in the details, though, as unhexlify requires an even number of hex digits, which %x doesn't guarantee. See the docstring, and the liberal inline comments for details.

from binascii import unhexlify

def long_to_bytes (val, endianness='big'):
    """
    Use :ref:`string formatting` and :func:`~binascii.unhexlify` to
    convert ``val``, a :func:`long`, to a byte :func:`str`.

    :param long val: The value to pack

    :param str endianness: The endianness of the result. ``'big'`` for
      big-endian, ``'little'`` for little-endian.

    If you want byte- and word-ordering to differ, you're on your own.

    Using :ref:`string formatting` lets us use Python's C innards.
    """

    # number of bits needed to represent val
    width = val.bit_length()

    # unhexlify wants a whole number of bytes, i.e. a multiple of
    # eight (8) bits, but we don't want more digits than we need
    # (hence the ternary-ish 'or')
    width += 8 - ((width % 8) or 8)

    # format width specifier: four (4) bits per hex digit, with a
    # zero (0) prepended to the width to zero-pad the output
    fmt = '%%0%dx' % (width // 4)

    # note: val == 0 gives width 0 and the odd-length string '0',
    # which unhexlify rejects
    s = unhexlify(fmt % val)

    if endianness == 'little':
        # see https://mcmap.net/q/45142/-how-do-i-reverse-a-string-in-python
        s = s[::-1]

    return s

...and my nosetest unit tests ;-)

class TestHelpers (object):
    def test_long_to_bytes_big_endian_small_even (self):
        s = long_to_bytes(0x42)
        assert s == '\x42'

        s = long_to_bytes(0xFF)
        assert s == '\xff'

    def test_long_to_bytes_big_endian_small_odd (self):
        s = long_to_bytes(0x1FF)
        assert s == '\x01\xff'

        s = long_to_bytes(0x201FF)
        assert s == '\x02\x01\xff'

    def test_long_to_bytes_big_endian_large_even (self):
        s = long_to_bytes(0xab23456c8901234567)
        assert s == '\xab\x23\x45\x6c\x89\x01\x23\x45\x67'

    def test_long_to_bytes_big_endian_large_odd (self):
        s = long_to_bytes(0x12345678901234567)
        assert s == '\x01\x23\x45\x67\x89\x01\x23\x45\x67'

    def test_long_to_bytes_little_endian_small_even (self):
        s = long_to_bytes(0x42, 'little')
        assert s == '\x42'

        s = long_to_bytes(0xFF, 'little')
        assert s == '\xff'

    def test_long_to_bytes_little_endian_small_odd (self):
        s = long_to_bytes(0x1FF, 'little')
        assert s == '\xff\x01'

        s = long_to_bytes(0x201FF, 'little')
        assert s == '\xff\x01\x02'

    def test_long_to_bytes_little_endian_large_even (self):
        s = long_to_bytes(0xab23456c8901234567, 'little')
        assert s == '\x67\x45\x23\x01\x89\x6c\x45\x23\xab'

    def test_long_to_bytes_little_endian_large_odd (self):
        s = long_to_bytes(0x12345678901234567, 'little')
        assert s == '\x67\x45\x23\x01\x89\x67\x45\x23\x01'
Cascio answered 25/1, 2013 at 17:18 Comment(2)
I encountered problems when the value is 0 (Python 3.5) binascii.Error: Odd-length string, quick fix for this: replace s = unhexlify(fmt % val) with s = unhexlify('00') if fmt % val == '0' else unhexlify(fmt % val)Bismuthinite
This is more concise. pastebin.com/iQRXyxsM Knoxville
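The zero case raised in the comment above can be avoided by always asking for at least one byte. What follows is only a sketch, assuming Python 3 and non-negative input; the name long_to_bytes_py3 is made up for illustration and is not part of the answer.

```python
from binascii import unhexlify


def long_to_bytes_py3(val, endianness='big'):
    """Convert a non-negative int to the shortest bytes object that holds it."""
    if val < 0:
        raise ValueError('only non-negative integers are supported')

    # Whole bytes needed; at least one, so that 0 becomes a single zero byte.
    num_bytes = max((val.bit_length() + 7) // 8, 1)

    # Two hex digits per byte, zero-padded on the left ('*' reads the width
    # from the argument tuple).
    hex_digits = '%0*x' % (num_bytes * 2, val)
    s = unhexlify(hex_digits)

    # Reverse the byte order for little-endian output.
    return s[::-1] if endianness == 'little' else s
```

Unlike the Python 2 original, this returns bytes, so comparisons in tests need b'...' literals.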
N
65

With Python 3.2 and later, you can use int.to_bytes and int.from_bytes: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
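A minimal sketch of the round trip, assuming Python 3.2 or later; the value and the variable names are illustrative only. Note that a length too small for the value raises OverflowError, so the byte count has to be chosen to fit the largest number you expect.

```python
# Illustrative 768-bit value (top bit set, so all 96 bytes are needed).
n = (1 << 767) | 0x1234

# 768 bits / 8 = 96 bytes, most significant byte first.
key = n.to_bytes(96, 'big')

# int.from_bytes reverses the conversion.
restored = int.from_bytes(key, 'big')

# Asking for fewer bytes than the value needs raises OverflowError.
try:
    n.to_bytes(32, 'big')
    too_small = False
except OverflowError:
    too_small = True
```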

Nettie answered 20/1, 2015 at 23:39 Comment(1)
It OverflowError's out on big numbers.Knoxville
C
33

Everyone has overcomplicated this answer:

import sys

some_int = 2**255 + 1  # placeholder value; substitute your 256-bit integer
some_bytes = some_int.to_bytes(32, sys.byteorder)
my_bytearray = bytearray(some_bytes)

You just need to know the number of bytes you are converting to. In my use cases, I normally only use numbers this large for crypto, and at that point I have to worry about the modulus and whatnot anyway, so I don't think it is a big problem to be required to know the maximum number of bytes to return.

Since you are doing 768-bit math, the argument would be 96 (768 / 8) instead of 32.
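One way to avoid hard-coding the 96 is to derive the byte count from the modulus. This is a sketch under stated assumptions: Python 3, a hypothetical helper name int_to_fixed_bytes, and a stand-in modulus that is not a real Diffie-Hellman prime. It also uses an explicit 'big' byte order rather than sys.byteorder, since both ends of a key exchange need the same bytes regardless of platform.

```python
def int_to_fixed_bytes(value, modulus):
    """Encode value as big-endian bytes, zero-padded to the size of modulus."""
    size = (modulus.bit_length() + 7) // 8   # bytes needed for any value < modulus
    return value.to_bytes(size, 'big')


# Stand-in 768-bit modulus for illustration only (NOT a real DH prime).
p = (1 << 768) - 1

# Stand-in for the shared secret produced by the key exchange.
shared = pow(5, 123456789, p)

# Always 96 bytes, even when the secret happens to have leading zero bits.
key = bytearray(int_to_fixed_bytes(shared, p))
```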

Concentrate answered 14/8, 2015 at 15:22 Comment(3)
In Python 3 this solution worked really well for a 2048-bit integer. In Python 2.7 it works only for int (a 2048-bit integer is a long in Python 2.7).Conal
In Python 2.7 some_bytes = some_int.to_bytes(32, sys.byteorder) produces error AttributeError: 'int' object has no attribute 'to_bytes' 😞Indic
Not quite everyone... see the answer by @JackOConnorLinotype
B
14

One-liner:

bytearray.fromhex('{:0192x}'.format(big_int))

The 192 is 768 / 4, because OP wanted 768-bit numbers and there are 4 bits in a hex digit. If you need a bigger bytearray use a format string with a higher number. Example:

>>> big_int = 911085911092802609795174074963333909087482261102921406113936886764014693975052768158290106460018649707059449553895568111944093294751504971131180816868149233377773327312327573120920667381269572962606994373889233844814776702037586419
>>> bytearray.fromhex('{:0192x}'.format(big_int))
bytearray(b'\x96;h^\xdbJ\x8f3obL\x9c\xc2\xb0-\x9e\xa4Sj-\xf6i\xc1\x9e\x97\x94\x85M\x1d\x93\x10\\\x81\xc2\x89\xcd\xe0a\xc0D\x81v\xdf\xed\xa9\xc1\x83p\xdbU\xf1\xd0\xfeR)\xce\x07\xdepM\x88\xcc\x7fv\\\x1c\x8di\x87N\x00\x8d\xa8\xbd[<\xdf\xaf\x13z:H\xed\xc2)\xa4\x1e\x0f\xa7\x92\xa7\xc6\x16\x86\xf1\xf3')
>>> lepi_int = 0x963b685edb4a8f336f624c9cc2b02d9ea4536a2df669c19e9794854d1d93105c81c289cde061c0448176dfeda9c18370db55f1d0fe5229ce07de704d88cc7f765c1c8d69874e008da8bd5b3cdfaf137a3a48edc229a41e0fa792a7c61686f1f
>>> bytearray.fromhex('{:0192x}'.format(lepi_int))
bytearray(b'\tc\xb6\x85\xed\xb4\xa8\xf36\xf6$\xc9\xcc+\x02\xd9\xeaE6\xa2\xdff\x9c\x19\xe9yHT\xd1\xd91\x05\xc8\x1c(\x9c\xde\x06\x1c\x04H\x17m\xfe\xda\x9c\x187\r\xb5_\x1d\x0f\xe5"\x9c\xe0}\xe7\x04\xd8\x8c\xc7\xf7e\xc1\xc8\xd6\x98t\xe0\x08\xda\x8b\xd5\xb3\xcd\xfa\xf17\xa3\xa4\x8e\xdc"\x9aA\xe0\xfay*|aho\x1f')

[My answer had used hex() before. I corrected it with format() in order to handle ints with odd-sized byte expressions. This fixes previous complaints about ValueError.]
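If the bit size varies, the digit count can be computed rather than written into the format string. A minimal sketch, assuming the size is a whole number of bytes; int_to_bytearray is a hypothetical name. The range check matters because the format width is only a minimum: a value that is too large would otherwise produce a longer string, possibly with an odd number of digits.

```python
def int_to_bytearray(value, num_bits):
    """Zero-padded big-endian bytearray of exactly num_bits // 8 bytes."""
    if num_bits % 8:
        raise ValueError('num_bits must be a multiple of 8')
    if value < 0 or value.bit_length() > num_bits:
        raise ValueError('value does not fit in num_bits')

    # Four bits per hex digit; the nested {width} fills in the digit count.
    hex_digits = '{:0{width}x}'.format(value, width=num_bits // 4)
    return bytearray.fromhex(hex_digits)
```

For the 768-bit case in the question, int_to_bytearray(big_int, 768) uses the same 192-digit width as the one-liner.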

Bison answered 30/7, 2013 at 14:28 Comment(9)
it does not work if you don't produce a Long though. I think smt like bytearray.fromhex(hex(2**61-1).strip('0x').strip('L')) is saferSyndetic
@MarioAlemi the code in your comment is wrong. strip('0x') will also strip the trailing zeros, which will result bad result (and sometimes ValueError)!Jugal
@Jess Austin: Your solution is totally wrong, because it works only when x consists of even number of hex-digits. Example: x=0x963b685edb4a8f336f624c9cc2b02d9ea4536a2df669c19e9794854d1d93105c81c289cde061c0448176dfeda9c18370db55f1d0fe5229ce07de704d88cc7f765c1c8d69874e008da8bd5b3cdfaf137a3a48edc229a41e0fa792a7c61686f1fLJugal
@lepi can you make an example?Syndetic
@MarioAlemi bytearray.fromhex(hex(0x11000000).strip('0x').strip('L')) It won't just strip the '0x' character sequence from the beginning, it will remove all the '0' and all the 'x' characters from both side. When the number is not a Long and has tailing zeros, those will be also removed.Jugal
@lepi thanks, I learned smt about strip() I did not know! Smt like that should work for the example, not sure there are other cases... bytearray.fromhex(hex(0x11000000).lstrip('0x').strip('L'))Syndetic
This is a good solution for Python 2. It is important to have an even number of characters (like the 192 in the example), or a ValueError is raised. (It was mentioned before as a problem with the previous solution, but it is still something to watch out for).Sondrasone
@Sondrasone the current solution still works no matter what the size of the argument to format(), so long as it fits in the space specified by the format expression string. That wasn't the case with the previous solution. fromhex() can't handle weird expressions so the format expression string should not be weird. One wouldn't expect that string would ever be dynamic, so that should be fine.Bison
Right, I was just trying to point out that if you change the 192 in the format string to 191 (or to any odd number), you will get a ValueError. Just something that tripped me up.Sondrasone
S
8

Converting a long/int to a byte array looks like exactly the purpose of struct.pack. For integers that exceed 4 (or 8) bytes, you can come up with something like the following:

>>> import struct
>>> limit = 256*256*256*256 - 1
>>> i = 1234567890987654321
>>> parts = []
>>> while i:
...     parts.append(i & limit)
...     i >>= 32
...

>>> struct.pack('>' + 'L'*len(parts), *parts )
'\xb1l\x1c\xb1\x11"\x10\xf4'

>>> struct.unpack('>LL', '\xb1l\x1c\xb1\x11"\x10\xf4')
(2976652465L, 287445236)
>>> (287445236L << 32) + 2976652465L
1234567890987654321L
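Note that the loop above collects the least significant 32-bit word first, so the packed string has its words in reverse order compared with a plain big-endian encoding. The following sketch, assuming Python 3 and a fixed word count, reverses the words before packing; pack_words and unpack_words are hypothetical names.

```python
import struct


def pack_words(n, num_words):
    """Pack non-negative n into num_words 32-bit words, most significant first."""
    mask = 0xFFFFFFFF
    words = []
    for _ in range(num_words):
        words.append(n & mask)   # take the lowest 32 bits
        n >>= 32
    if n:
        raise ValueError('value does not fit in the requested number of words')
    # words holds the least significant word first; reverse for big-endian.
    return struct.pack('>%dL' % num_words, *reversed(words))


def unpack_words(data):
    """Inverse of pack_words: rebuild the integer from big-endian 32-bit words."""
    n = 0
    for word in struct.unpack('>%dL' % (len(data) // 4), data):
        n = (n << 32) | word
    return n
```

A 768-bit number needs 24 such words, so pack_words(value, 24) returns 96 bytes.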
Saunderson answered 4/1, 2012 at 17:33 Comment(2)
But it won't help with big numbers (> 8 bytes), which will usually be used for cryptographic applications.Hygrograph
it's written not to be generic but more like fixed size solution to common problem of representing all possible ip's or similar...Idolatrize
S
7

You can try using struct:

import struct
struct.pack('L',longvalue)
Smalley answered 4/1, 2012 at 17:31 Comment(2)
Sadly no, error: integer out of range for 'L' format code. It's a 768 bit long, which is quite a bit bigger than the 4 byte unsigned int.Wrinkly
Downvoted because Python long int are arbitrarily long integers. Think of it like an array of 32 (or whatever) bits integers. A C long is a size defined datatype. With this response, you are confusing both.Scheel
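The size limit described in these comments is easy to reproduce. A small sketch, assuming Python 3; with an explicit byte-order prefix, struct uses standard sizes, so 'L' is 4 bytes and 'Q' is 8 bytes on every platform.

```python
import struct

# The largest value that fits the 4-byte 'L' format.
small = struct.pack('>L', 0xFFFFFFFF)

# A 768-bit value is far too large even for the 8-byte 'Q' format.
try:
    struct.pack('>Q', 1 << 767)
    fits = True
except (struct.error, OverflowError):
    fits = False
```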
T
7

Little-endian (reverse the result or the range if you want big-endian):

def int_to_bytes(val, num_bytes):
    return [(val & (0xff << pos*8)) >> pos*8 for pos in range(num_bytes)]

Big-endian:

def int_to_bytes(val, num_bytes):
    return [(val & (0xff << pos*8)) >> pos*8 for pos in reversed(range(num_bytes))]
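A short usage sketch for the big-endian variant, repeated here so the example is self-contained. Both functions return a list of ints, which bytearray() accepts directly. One thing to watch for: bytes above num_bytes are dropped silently rather than raising an error.

```python
def int_to_bytes(val, num_bytes):
    # Big-endian variant from above: most significant byte first.
    return [(val & (0xff << pos*8)) >> pos*8 for pos in reversed(range(num_bytes))]


as_list = int_to_bytes(0x1FF, 4)      # list of four ints, zero-padded on the left
as_bytearray = bytearray(as_list)     # wrap the list to get an actual byte array

# Too few bytes: the high byte (0x01) is silently lost.
truncated = int_to_bytes(0x1FF, 1)
```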
Tedman answered 30/1, 2013 at 20:32 Comment(0)
L
3

Basically what you need to do is convert the int/long into its base 256 representation -- i.e. a number whose "digits" range from 0-255. Here's a fairly efficient way to do something like that:

def base256_encode(n, minwidth=0): # int/long to byte array
    if n > 0:
        arr = []
        while n:
            n, rem = divmod(n, 256)
            arr.append(rem)
        b = bytearray(reversed(arr))
    elif n == 0:
        b = bytearray(b'\x00')
    else:
        raise ValueError

    if minwidth > 0 and len(b) < minwidth: # zero padding needed?
        b = bytearray(minwidth - len(b)) + b
    return b

You may not need the reversed() call, depending on the endianness desired (omitting it would require the padding to be done differently as well). Also note that, as written, it doesn't handle negative numbers.

You might also want to take a look at the similar but highly optimized long_to_bytes() function in the number.py module which is part of the open source Python Cryptography Toolkit. It actually converts the number into a string, not a byte array, but that's a minor issue.
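For the little-endian case mentioned above, a possible variant drops the reversed() call and pads on the right instead of the left. This is only a sketch; base256_encode_le is a hypothetical name, not part of any library.

```python
def base256_encode_le(n, minwidth=0):
    """Little-endian variant: least significant byte first, zero-padded at the end."""
    if n < 0:
        raise ValueError('negative numbers are not supported')

    b = bytearray()
    while n:
        n, rem = divmod(n, 256)
        b.append(rem)            # remainders come out least significant first
    if not b:
        b.append(0)              # represent zero as a single zero byte

    if len(b) < minwidth:
        # bytearray(k) is k zero bytes; the padding goes after the data.
        b.extend(bytearray(minwidth - len(b)))
    return b
```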

Libel answered 4/1, 2012 at 21:32 Comment(0)
S
2

Python 2.7 does not implement the int.to_bytes() method.

I tried 3 methods:

  1. hex unpack/pack : very slow
  2. byte shifting 8 bits at a time: significantly faster.
  3. using a "C" module and packing into the lower (7 ia64 or 3 i32) bytes. This was about twice as fast as method 2. It is the fastest option, but still too slow.

All these methods are very inefficient for two reasons:

  • Python 2.7 does not support this useful operation.
  • C does not support extended precision arithmetic using the carry/borrow/overflow flags available on most platforms.
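The answer does not show its code, so the following is only a guess at what method 2 (shifting 8 bits at a time) might look like. The function name and the error handling are assumptions; the loop itself uses nothing specific to Python 3, so it should also run on Python 2.7.

```python
def int_to_bytearray_shift(n, length):
    """Big-endian bytearray of exactly `length` bytes, built 8 bits at a time."""
    if n < 0:
        raise ValueError('negative numbers are not supported')

    out = bytearray(length)              # starts as all zero bytes
    for i in range(length - 1, -1, -1):
        out[i] = n & 0xFF                # lowest 8 bits go to the rightmost free slot
        n >>= 8

    if n:
        # Bits are left over, so the value needed more than `length` bytes.
        raise OverflowError('value does not fit in %d bytes' % length)
    return out
```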
Socialminded answered 20/10, 2015 at 0:56 Comment(0)
R
0
import struct

i = 0x12345678
s = struct.pack('<I', i)        # 4 bytes, little-endian
b = struct.unpack('BBBB', s)    # (0x78, 0x56, 0x34, 0x12)
Rattletrap answered 14/11, 2017 at 12:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.