Convert python long/int to fixed size byte array

Asked 4/1, 2012 at 17:11 Answered 14/11, 2017 at 12:11

Solved python arrays long-integer diffie-hellman rc4-cipher

I'm trying to implement RC4 and DH key exchange in python. Problem is that I have no idea about how to convert the python long/int from the key exchange to the byte array I need for the RC4 implementation. Is there a simple way to convert a long to the required length byte array?

Update: forgot to mention that the numbers I'm dealing with are 768 bit unsigned integers.

Wrinkly answered 4/1, 2012 at 17:11 Comment(1)

not sure if it'll help but check the struct module: docs.python.org/library/struct.html – Swatow 4/1, 2012 at 17:34

I haven't done any benchmarks, but this recipe "works for me".

The short version: use '%x' % val, then unhexlify the result. The devil is in the details, though, as unhexlify requires an even number of hex digits, which %x doesn't guarantee. See the docstring, and the liberal inline comments for details.

from binascii import unhexlify

def long_to_bytes (val, endianness='big'):
    """
    Use :ref:`string formatting` and :func:`~binascii.unhexlify` to
    convert ``val``, a :func:`long`, to a byte :func:`str`.

    :param long val: The value to pack

    :param str endianness: The endianness of the result. ``'big'`` for
      big-endian, ``'little'`` for little-endian.

    If you want byte- and word-ordering to differ, you're on your own.

    Using :ref:`string formatting` lets us use Python's C innards.
    """

    # number of significant bits in val (at least 1, so that 0 still
    # produces one byte instead of an odd-length hex string)
    width = max(val.bit_length(), 1)

    # unhexlify wants a whole number of bytes (a multiple of eight (8)
    # bits), but we don't want more digits than we need (hence the
    # ternary-ish 'or')
    width += 8 - ((width % 8) or 8)

    # format width specifier: one (1) hex digit per four (4) bits, with
    # a leading zero (0) on the width to zero-pad the output
    fmt = '%%0%dx' % (width // 4)

    s = unhexlify(fmt % val)

    if endianness == 'little':
        # see https://mcmap.net/q/45142/-how-do-i-reverse-a-string-in-python
        s = s[::-1]

    return s

...and my nosetest unit tests ;-)

class TestHelpers (object):
    def test_long_to_bytes_big_endian_small_even (self):
        s = long_to_bytes(0x42)
        assert s == '\x42'

        s = long_to_bytes(0xFF)
        assert s == '\xff'

    def test_long_to_bytes_big_endian_small_odd (self):
        s = long_to_bytes(0x1FF)
        assert s == '\x01\xff'

        s = long_to_bytes(0x201FF)
        assert s == '\x02\x01\xff'

    def test_long_to_bytes_big_endian_large_even (self):
        s = long_to_bytes(0xab23456c8901234567)
        assert s == '\xab\x23\x45\x6c\x89\x01\x23\x45\x67'

    def test_long_to_bytes_big_endian_large_odd (self):
        s = long_to_bytes(0x12345678901234567)
        assert s == '\x01\x23\x45\x67\x89\x01\x23\x45\x67'

    def test_long_to_bytes_little_endian_small_even (self):
        s = long_to_bytes(0x42, 'little')
        assert s == '\x42'

        s = long_to_bytes(0xFF, 'little')
        assert s == '\xff'

    def test_long_to_bytes_little_endian_small_odd (self):
        s = long_to_bytes(0x1FF, 'little')
        assert s == '\xff\x01'

        s = long_to_bytes(0x201FF, 'little')
        assert s == '\xff\x01\x02'

    def test_long_to_bytes_little_endian_large_even (self):
        s = long_to_bytes(0xab23456c8901234567, 'little')
        assert s == '\x67\x45\x23\x01\x89\x6c\x45\x23\xab'

    def test_long_to_bytes_little_endian_large_odd (self):
        s = long_to_bytes(0x12345678901234567, 'little')
        assert s == '\x67\x45\x23\x01\x89\x67\x45\x23\x01'

Cascio answered 25/1, 2013 at 17:18 Comment(2)

I encountered problems when the value is 0 (Python 3.5) binascii.Error: Odd-length string, quick fix for this: replace s = unhexlify(fmt % val) with s = unhexlify('00') if fmt % val == '0' else unhexlify(fmt % val) – Bismuthinite 30/11, 2016 at 14:5

This is more concise. pastebin.com/iQRXyxsM – Knoxville 25/7, 2020 at 7:46

With Python 3.2 and later, you can use int.to_bytes and int.from_bytes: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
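As a minimal sketch of that approach for the 768-bit case from the question (the names `shared_secret` and `KEY_BYTES` are illustrative, not from the original post): `to_bytes` takes the target length in bytes and a byte order, and raises `OverflowError` if the value does not fit in that length.

```python
KEY_BYTES = 768 // 8  # 96 bytes for a 768-bit unsigned integer

# Hypothetical stand-in for the integer produced by the key exchange.
shared_secret = (1 << 767) + 0x1234

# Fixed-size, zero-padded, big-endian byte string.
key = shared_secret.to_bytes(KEY_BYTES, "big")
assert len(key) == KEY_BYTES

# int.from_bytes reverses the conversion.
assert int.from_bytes(key, "big") == shared_secret

# Small values are left-padded with zero bytes in big-endian order.
small = (0x1FF).to_bytes(KEY_BYTES, "big")
assert small[-2:] == b"\x01\xff" and small[:-2] == bytes(KEY_BYTES - 2)

# A value that needs more than KEY_BYTES bytes raises OverflowError.
try:
    (1 << 768).to_bytes(KEY_BYTES, "big")
except OverflowError as exc:
    overflow_message = str(exc)
```

So an `OverflowError` here usually means the length argument is too small for the value, not that the integer itself is too large for Python.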

Nettie answered 20/1, 2015 at 23:39 Comment(1)

It OverflowError's out on big numbers. – Knoxville 25/7, 2020 at 7:45

Everyone has overcomplicated this answer:

import sys

some_int = 2**256 - 1  # example value; any integer that fits in 256 bits
some_bytes = some_int.to_bytes(32, sys.byteorder)
my_bytearray = bytearray(some_bytes)

You just need to know the number of bytes you are converting to. In my use cases I normally only use numbers this large for crypto, where I already have to worry about the modulus and so on, so I don't think having to know the maximum number of bytes to return is a big problem.

Since you are doing it as 768-bit math, then instead of 32 as the argument it would be 96.
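If the byte count is not known up front, it can be derived from the value. The sketch below is one way to do that, with a hypothetical helper name `int_to_fixed_bytes`. It uses an explicit `'big'` byte order rather than `sys.byteorder`, on the assumption that both sides of a key exchange should get the same bytes regardless of the machine they run on.

```python
def int_to_fixed_bytes(value, length=None, byteorder="big"):
    """Convert a non-negative int to bytes (hypothetical helper).

    If length is None, use the minimum number of bytes that holds value
    (at least one, so that 0 becomes a single zero byte).
    """
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    if length is None:
        length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder)


print(int_to_fixed_bytes(0x1FF))            # b'\x01\xff'
print(len(int_to_fixed_bytes(1 << 767)))    # 96
print(int_to_fixed_bytes(0x42, length=4))   # b'\x00\x00\x00B'
```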

Concentrate answered 14/8, 2015 at 15:22 Comment(3)

In Python 3 this solution worked really well for 2048 bit integer. In Python 2.7 it works only for int (2048 bit integer is long in Python 2.7). – Conal 21/5, 2016 at 13:50

In Python 2.7 some_bytes = some_int.to_bytes(32, sys.byteorder) produces error AttributeError: 'int' object has no attribute 'to_bytes' 😞 – Indic 9/8, 2017 at 13:7

Not quite everyone... see the answer by @JackOConnor – Linotype 19/2, 2020 at 23:11

One-liner:

bytearray.fromhex('{:0192x}'.format(big_int))

The 192 is 768 / 4, because OP wanted 768-bit numbers and there are 4 bits in a hex digit. If you need a bigger bytearray use a format string with a higher number. Example:

>>> big_int = 911085911092802609795174074963333909087482261102921406113936886764014693975052768158290106460018649707059449553895568111944093294751504971131180816868149233377773327312327573120920667381269572962606994373889233844814776702037586419
>>> bytearray.fromhex('{:0192x}'.format(big_int))
bytearray(b'\x96;h^\xdbJ\x8f3obL\x9c\xc2\xb0-\x9e\xa4Sj-\xf6i\xc1\x9e\x97\x94\x85M\x1d\x93\x10\\\x81\xc2\x89\xcd\xe0a\xc0D\x81v\xdf\xed\xa9\xc1\x83p\xdbU\xf1\xd0\xfeR)\xce\x07\xdepM\x88\xcc\x7fv\\\x1c\x8di\x87N\x00\x8d\xa8\xbd[<\xdf\xaf\x13z:H\xed\xc2)\xa4\x1e\x0f\xa7\x92\xa7\xc6\x16\x86\xf1\xf3')
>>> lepi_int = 0x963b685edb4a8f336f624c9cc2b02d9ea4536a2df669c19e9794854d1d93105c81c289cde061c0448176dfeda9c18370db55f1d0fe5229ce07de704d88cc7f765c1c8d69874e008da8bd5b3cdfaf137a3a48edc229a41e0fa792a7c61686f1f
>>> bytearray.fromhex('{:0192x}'.format(lepi_int))
bytearray(b'\tc\xb6\x85\xed\xb4\xa8\xf36\xf6$\xc9\xcc+\x02\xd9\xeaE6\xa2\xdff\x9c\x19\xe9yHT\xd1\xd91\x05\xc8\x1c(\x9c\xde\x06\x1c\x04H\x17m\xfe\xda\x9c\x187\r\xb5_\x1d\x0f\xe5"\x9c\xe0}\xe7\x04\xd8\x8c\xc7\xf7e\xc1\xc8\xd6\x98t\xe0\x08\xda\x8b\xd5\xb3\xcd\xfa\xf17\xa3\xa4\x8e\xdc"\x9aA\xe0\xfay*|aho\x1f')

[My answer had used hex() before. I corrected it with format() in order to handle ints with odd-sized byte expressions. This fixes previous complaints about ValueError.]
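To avoid hard-coding the 192, the format width can be computed from the desired byte count. The following is a sketch assuming Python 3 and non-negative values; the helper name `int_to_bytearray_hex` is hypothetical. Using two hex digits per byte keeps the width even, which is what `fromhex()` needs.

```python
def int_to_bytearray_hex(value, num_bytes):
    """Zero-padded big-endian bytearray via hex formatting (hypothetical helper)."""
    # Two hex digits per byte, so the width passed to format() is always even.
    hex_digits = "{:0{width}x}".format(value, width=2 * num_bytes)
    if len(hex_digits) > 2 * num_bytes:
        # format() pads but never truncates, so an oversized value would
        # otherwise yield more bytes than requested (or an odd digit count).
        raise OverflowError("value does not fit in %d bytes" % num_bytes)
    return bytearray.fromhex(hex_digits)


result = int_to_bytearray_hex(0x1FF, 4)
print(result)                                   # bytearray(b'\x00\x00\x01\xff')
print(len(int_to_bytearray_hex(1 << 767, 96)))  # 96
```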

Bison answered 30/7, 2013 at 14:28 Comment(9)

it does not work if you don't produce a Long though. I think smt like bytearray.fromhex(hex(2**61-1).strip('0x').strip('L')) is safer – Syndetic 7/7, 2014 at 9:30

@MarioAlemi the code in your comment is wrong. strip('0x') will also strip the trailing zeros, which gives a bad result (and sometimes a ValueError)! – Jugal 5/12, 2014 at 18:7

@Jess Austin: Your solution is totally wrong, because it works only when x consists of even number of hex-digits. Example:

x=0x963b685edb4a8f336f624c9cc2b02d9ea4536a2df669c19e9794854d1d93105c81c289cde061c0448176dfeda9c18370db55f1d0fe5229ce07de704d88cc7f765c1c8d69874e008da8bd5b3cdfaf137a3a48edc229a41e0fa792a7c61686f1fL

– Jugal 5/12, 2014 at 18:26

@lepi can you make an example? – Syndetic 21/12, 2014 at 17:23

@MarioAlemi bytearray.fromhex(hex(0x11000000).strip('0x').strip('L')) It won't just strip the '0x' character sequence from the beginning; it will remove all the '0' and all the 'x' characters from both sides. When the number is not a Long and has trailing zeros, those will also be removed. – Jugal 22/12, 2014 at 21:46

@lepi thanks, I learned smt about strip() I did not know! Smt like that should work for the example, not sure there are other cases... bytearray.fromhex(hex(0x11000000).lstrip('0x').strip('L')) – Syndetic 23/12, 2014 at 14:9

This is a good solution for Python 2. It is important to have an even number of characters (like the 192 in the example), or a ValueError is raised. (It was mentioned before as a problem with the previous solution, but it is still something to watch out for). – Sondrasone 5/4, 2018 at 14:28

@Sondrasone the current solution still works no matter what the size of the argument to format(), so long as it fits in the space specified by the format expression string. That wasn't the case with the previous solution. fromhex() can't handle weird expressions so the format expression string should not be weird. One wouldn't expect that string would ever be dynamic, so that should be fine. – Bison 6/4, 2018 at 19:51

Right, I was just trying to point out that if you change the 192 in the format string to 191 (or to any odd number), you will get a ValueError. Just something that tripped me up. – Sondrasone 10/4, 2018 at 13:4

Converting a long/int to a byte array looks like exactly the purpose of struct.pack. For long integers that exceed 4 (or 8) bytes, you can come up with something like the following:

>>> import struct
>>> limit = 256*256*256*256 - 1
>>> i = 1234567890987654321
>>> parts = []
>>> while i:
...     parts.append(i & limit)
...     i >>= 32
...
>>> struct.pack('>' + 'L'*len(parts), *parts)
'\xb1l\x1c\xb1\x11"\x10\xf4'

>>> struct.unpack('>LL', '\xb1l\x1c\xb1\x11"\x10\xf4')
(2976652465L, 287445236)
>>> (287445236L << 32) + 2976652465L
1234567890987654321L
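Note that the loop above appends the least significant 32-bit word first while `'>'` makes each word big-endian, so the word order and the byte order differ. A sketch of a variant that produces a fully big-endian, fixed-size result (hypothetical helper name `pack_words_big_endian`, written for Python 3):

```python
import struct


def pack_words_big_endian(value, num_words):
    """Pack a non-negative int into num_words 32-bit words, big-endian throughout."""
    mask = 0xFFFFFFFF
    words = []
    for _ in range(num_words):
        words.append(value & mask)  # least significant word first
        value >>= 32
    if value:
        raise OverflowError("value does not fit in %d words" % num_words)
    # Reverse so the most significant word is packed first.
    return struct.pack(">%dI" % num_words, *reversed(words))


packed = pack_words_big_endian(1234567890987654321, 2)
print(packed)  # b'\x11"\x10\xf4\xb1l\x1c\xb1'
print(packed == (1234567890987654321).to_bytes(8, "big"))  # True
```

For the 768-bit numbers in the question, `num_words` would be 24.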

Saunderson answered 4/1, 2012 at 17:33 Comment(2)

But it won't help with big numbers (> 8 bytes), which will usually be used for cryptographic applications. – Hygrograph 4/1, 2012 at 17:48

it's written not to be generic but more like fixed size solution to common problem of representing all possible ip's or similar... – Idolatrize 8/9, 2016 at 21:38

You can try using struct:

import struct
struct.pack('L',longvalue)
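As the comments on this answer point out, a single `'L'` field only covers a C unsigned long, so this does not work for the 768-bit values in the question. A small sketch, assuming Python 3, that shows the failure and the standard field sizes:

```python
import struct

big_value = 1 << 767  # a 768-bit integer, as in the question

try:
    struct.pack("L", big_value)
    fits = True
except (struct.error, OverflowError) as exc:
    fits = False
    print("cannot pack:", exc)

# With an explicit byte order, struct uses standard sizes.
print(struct.calcsize("<L"))  # 4
print(struct.calcsize("<Q"))  # 8
```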

Smalley answered 4/1, 2012 at 17:31 Comment(2)

Sadly no, error: integer out of range for 'L' format code. It's a 768 bit long, which is quite a bit bigger than the 4 byte unsigned int. – Wrinkly 4/1, 2012 at 18:14

Downvoted because Python long int are arbitrarily long integers. Think of it like an array of 32 (or whatever) bits integers. A C long is a size defined datatype. With this response, you are confusing both. – Scheel 15/1, 2016 at 22:4

Little-endian (reverse the result or the range if you want big-endian):

def int_to_bytes(val, num_bytes):
    return [(val & (0xff << pos*8)) >> pos*8 for pos in range(num_bytes)]

Big-endian:

def int_to_bytes(val, num_bytes):
    return [(val & (0xff << pos*8)) >> pos*8 for pos in reversed(range(num_bytes))]
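Both versions return a list of integers in the range 0-255 rather than a byte string, and any bits above `num_bytes * 8` are silently dropped. A short usage sketch, assuming Python 3; the `_le`/`_be` suffixes are added here so the two functions can coexist, and the expression is written as shift-then-mask, which is equivalent to the mask-then-shift form above:

```python
def int_to_bytes_le(val, num_bytes):
    # Byte `pos` holds bits [8*pos, 8*pos + 8); least significant byte first.
    return [(val >> (pos * 8)) & 0xFF for pos in range(num_bytes)]


def int_to_bytes_be(val, num_bytes):
    # Same bytes, most significant byte first.
    return [(val >> (pos * 8)) & 0xFF for pos in reversed(range(num_bytes))]


print(int_to_bytes_le(0x0201FF, 4))         # [255, 1, 2, 0]
print(int_to_bytes_be(0x0201FF, 4))         # [0, 2, 1, 255]
print(bytes(int_to_bytes_be(0x0201FF, 4)))  # b'\x00\x02\x01\xff'
```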

Tedman answered 30/1, 2013 at 20:32 Comment(0)

Basically what you need to do is convert the int/long into its base 256 representation -- i.e. a number whose "digits" range from 0-255. Here's a fairly efficient way to do something like that:

def base256_encode(n, minwidth=0): # int/long to byte array
    if n > 0:
        arr = []
        while n:
            n, rem = divmod(n, 256)
            arr.append(rem)
        b = bytearray(reversed(arr))
    elif n == 0:
        b = bytearray(b'\x00')
    else:
        raise ValueError

    if minwidth > 0 and len(b) < minwidth: # zero padding needed?
        b = bytearray(minwidth - len(b)) + b  # bytearray(n) is n zero bytes
    return b

You may not need the reversed() call, depending on the endianness desired (omitting it would require the padding to be done differently as well). Also note that, as written, it doesn't handle negative numbers.

You might also want to take a look at the similar but highly optimized long_to_bytes() function in the number.py module, which is part of the open source Python Cryptography Toolkit. It actually converts the number into a string, not a byte array, but that's a minor issue.
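To check an encoder like this, it helps to have the inverse. A minimal sketch of a decoder (the name `base256_decode` is hypothetical), assuming big-endian input as produced when the `reversed()` call is kept:

```python
def base256_decode(data):
    """Interpret a big-endian byte sequence as a non-negative integer."""
    n = 0
    for byte in bytearray(data):  # bytearray yields ints on Python 2 and 3
        n = n * 256 + byte
    return n


print(base256_decode(b"\x01\xff"))          # 511
print(base256_decode(b"\x00\x00\x01\xff"))  # 511, leading zero padding is ignored
```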

Libel answered 4/1, 2012 at 21:32 Comment(0)

Python 2.7 does not implement the int.to_bytes() method.

I tried 3 methods:

1. hex unpack/pack: very slow.
2. byte shifting 8 bits at a time: significantly faster.
3. using a "C" module and packing into the lower (7 ia64 or 3 i32) bytes. This was about twice as fast as method 2. It is the fastest option, but still too slow.

All these methods are very inefficient for two reasons:

1. Python 2.7 does not support this useful operation.
2. C does not support extended precision arithmetic using the carry/borrow/overflow flags available on most platforms.
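For readers on Python 3, a rough way to compare such approaches is `timeit`. The sketch below is not the author's benchmark: the function names are hypothetical, and timings vary by machine and interpreter, so no results are shown.

```python
import timeit

N_BYTES = 96                      # 768-bit values, as in the question
VALUE = (1 << 767) | 0xDEADBEEF   # arbitrary test value


def via_hex(n):
    # Format as zero-padded hex, then parse the hex digits into bytes.
    return bytes.fromhex("%0*x" % (2 * N_BYTES, n))


def via_shifts(n):
    # Extract one byte at a time, most significant byte first.
    return bytes((n >> (8 * i)) & 0xFF for i in reversed(range(N_BYTES)))


def via_to_bytes(n):
    # Built-in conversion, available from Python 3.2.
    return n.to_bytes(N_BYTES, "big")


# All three must agree before timing them.
assert via_hex(VALUE) == via_shifts(VALUE) == via_to_bytes(VALUE)

for func in (via_hex, via_shifts, via_to_bytes):
    seconds = timeit.timeit(lambda: func(VALUE), number=10000)
    print("%-12s %.4f s" % (func.__name__, seconds))
```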

Socialminded answered 20/10, 2015 at 0:56 Comment(0)

import struct

i = 0x12345678
s = struct.pack('<I', i)       # 4 bytes, little-endian
b = struct.unpack('BBBB', s)   # tuple of byte values: (0x78, 0x56, 0x34, 0x12)

Rattletrap answered 14/11, 2017 at 12:11 Comment(0)
