Efficient way to swap bytes in Python
I have a bytearray of length 2*n:

a1 a2 b1 b2 c1 c2

I need to swap the byte order within each 2-byte word, to get:

a2 a1 b2 b1 c2 c1

Currently I use the following approach, but it is very slow for my task:

converted = bytearray([])
for i in range(int(len(chunk)/2)):
    converted += bytearray([chunk[i*2+1], chunk[i*2]])

Is it possible to switch endian of bytearray by calling some system/libc function?


OK, thanks to all. I timed some of the suggestions:

import timeit

test = [
"""
converted = bytearray([])
for i in range(int(len(chunk)/2)):
    converted += bytearray([chunk[i*2+1], chunk[i*2]])
""",
"""
for i in range(0, len(chunk), 2):
    chunk[i], chunk[i+1] = chunk[i+1], chunk[i]
""",
"""
byteswapped = bytearray([0]) * len(chunk)
byteswapped[0::2] = chunk[1::2]
byteswapped[1::2] = chunk[0::2]
""",
"""
chunk[0::2], chunk[1::2] = chunk[1::2], chunk[0::2]
"""
]

for t in test:
    print(timeit.timeit(t, setup='chunk = bytearray([1]*10)'))

and the results are:

$ python ti.py
11.6219761372
2.61883187294
3.47194099426
1.66421198845

So the in-place slice assignment with a step of 2 is now the fastest. Also thanks to Mr. F for the detailed explanation, but I haven't tried it yet because it requires numpy.

Occupant answered 19/3, 2016 at 0:12 Comment(0)

You could use slice assignment with a step of 2:

byteswapped = bytearray(len(original))
byteswapped[0::2] = original[1::2]
byteswapped[1::2] = original[0::2]

Or if you want to do it in-place:

original[0::2], original[1::2] = original[1::2], original[0::2]

Timing shows that slicing massively outperforms a Python-level loop for large arrays:

>>> timeit.timeit('''
... for i in range(0, len(chunk), 2):
...     chunk[i], chunk[i+1] = chunk[i+1], chunk[i]''',
... 'chunk=bytearray(1000)')
81.70195105159564
>>>
>>> timeit.timeit('''
... byteswapped = bytearray(len(original))
... byteswapped[0::2] = original[1::2]
... byteswapped[1::2] = original[0::2]''',
... 'original=bytearray(1000)')
2.1136113323948393
>>>
>>> timeit.timeit('chunk[0::2], chunk[1::2] = chunk[1::2], chunk[0::2]', 'chunk=bytearray(1000)')
1.79349659994989

For small arrays, slicing still beats the explicit loop, but the difference isn't as big:

>>> timeit.timeit('''
... for i in range(0, len(chunk), 2):
...     chunk[i], chunk[i+1] = chunk[i+1], chunk[i]''',
... 'chunk=bytearray(10)')
1.2503637694328518
>>>
>>> timeit.timeit('''
... byteswapped = bytearray(len(original))
... byteswapped[0::2] = original[1::2]
... byteswapped[1::2] = original[0::2]''',
... 'original=bytearray(10)')
0.8973060929306484
>>>
>>> timeit.timeit('chunk[0::2], chunk[1::2] = chunk[1::2], chunk[0::2]', 'chunk=bytearray(10)')
0.6282232971918802
Balcke answered 19/3, 2016 at 0:18 Comment(1)
@user3479125: I've corrected an inefficiency in how my code was creating the byteswapped array; it should now outperform the explicit loop if you redo your timings. I've also added an in-place slice-based solution that should be even faster. Also, if you try your timings with larger arrays, you'll find that an explicit loop scales much worse than slicing due to greater interpreter overhead.Balcke

I can reproduce the result of the in-place slice method from @user2357112 supports Monica. But if the data size is ten times as big, it is about twice as slow as array.byteswap():

>>> timeit.timeit('chunk[0::2], chunk[1::2] = chunk[1::2], chunk[0::2]', 'chunk=bytearray(10000)')
12.54664899999625

>>> timeit.timeit('a=array.array("H",chunk);a.byteswap();chunk=a.tobytes()', 'import array;chunk=bytearray(10000)')
6.398380300001008
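For clarity, here is the array.byteswap() route from the timing above expanded into a standalone sketch (the variable names are mine, not from the timed snippet):

```python
import array

chunk = bytearray([1, 2, 3, 4, 5, 6])
a = array.array("H", chunk)       # reinterpret the buffer as 16-bit words
a.byteswap()                      # C-level swap of the bytes within each word
swapped = bytearray(a.tobytes())  # copy back out to a bytearray
print(list(swapped))              # [2, 1, 4, 3, 6, 5]
```

Because byteswap() simply reverses the bytes inside each item, the result is the same regardless of the platform's native endianness.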
Obstetric answered 19/3, 2016 at 0:12 Comment(2)
Is the impact of import array relevant? Kerbing
@ack No, because when we bother to optimize we are probably amortizing the import over many iterations. When import array is moved into the timed statement, the impact in the 1000-element case is only a few percent. In fact, I think the chunk=a.tobytes() step doesn't need to be there at all (saving another 10%). Obstetric

If you use numpy's frombuffer function, you can construct a numpy ndarray that actually shares the physical memory of the bytearray, and then swapping actions could be done in-place rather than in a copy.

You can simply perform the same indexing directly on byt, such as

byt[i] = byt[i+1]

but getting a handle to the buffered array with an explicit type in numpy often lets you do much more, particularly with bytearray which is very limited on its own.

Be careful, though. Even though at the Python level bytearray represents unsigned 8-bit integer values (0-255), the actual underlying C implementation, in bytearrayobject.h, uses a plain char for the byte values. If using numpy for this, you probably want to specify the optional dtype argument to frombuffer, with dtype=numpy.ubyte.

import numpy as np
byt = bytearray(range(256))
npbyt = np.frombuffer(byt, dtype=np.ubyte)

for i in range(0, len(npbyt)-1, 2):
    temp = npbyt[i]
    npbyt[i] = npbyt[i+1]
    npbyt[i+1] = temp

print(list(byt))

It prints

[1, 0, 3, 2, 5, 4, ..., 255, 254]
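Note that once the buffer is visible to numpy, the Python-level loop above can also be replaced by a vectorized swap: view the same memory as 16-bit words and use ndarray.byteswap. A minimal sketch, assuming the buffer has even length:

```python
import numpy as np

byt = bytearray(b"\x01\x02\x03\x04")
words = np.frombuffer(byt, dtype=np.uint16)  # shares memory with byt
words.byteswap(inplace=True)                 # swap bytes within each word, in place
print(bytes(byt))                            # b'\x02\x01\x04\x03'
```

Since the ndarray shares the bytearray's memory, the in-place byteswap mutates byt directly, with no Python-level loop.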

I ran into some of these issues while working on a small side project called buffersort, which can perform in-place sorting on Python objects that implement the writable buffer protocol, as bytearray does.

If you're interested in grabbing the Cython source from there, there is a simple helper function _swap that would make it easy to do what you want. It's probably overkill for your use case, though.

Scutiform answered 19/3, 2016 at 0:31 Comment(0)

From some answers in How to byte-swap a 32-bit integer in python?

import struct
import time

def htonl_slice(data):
    byteswapped = bytearray(4)
    byteswapped[0::4] = data[3::4]
    byteswapped[1::4] = data[2::4]
    byteswapped[2::4] = data[1::4]
    byteswapped[3::4] = data[0::4]
    return byteswapped

def htonl_slice_2(data):
    byteswapped = bytearray(len(data))
    byteswapped[0::4] = data[3::4]
    byteswapped[1::4] = data[2::4]
    byteswapped[2::4] = data[1::4]
    byteswapped[3::4] = data[0::4]
    return byteswapped

def htonl_struct(data):
    return struct.pack(">L", struct.unpack("<L", data)[0])

def swap32(data):
    return [struct.unpack("<I", struct.pack(">I", i))[0] for i in data]

def htonl_shift(x):
    return (((x << 24) & 0xFF000000) |
            ((x <<  8) & 0x00FF0000) |
            ((x >>  8) & 0x0000FF00) |
            ((x >> 24) & 0x000000FF))    

def test():

    data = [struct.pack('<L', i) for i in range(1000000)]

    start = time.time()
    for d in data:
        x = htonl_slice(d)
    end = time.time()
    print("htonl_slice %f" % (end - start))
    
    start = time.time()
    for d in data:
        x = htonl_struct(d)
    end = time.time()
    print("htonl_struct %f" % (end - start))

    data = [i for i in range(1000000)]
    start = time.time()
    for d in data:
        x = htonl_shift(d)
    end = time.time()
    print("htonl_shift %f" % (end - start))

    start = time.time()
    x = swap32(data)
    end = time.time()
    print("swap32 %f" % (end - start))

    data = bytearray()
    for i in range(1000000):
        data += struct.pack('<L', i)

    start = time.time()
    x = htonl_slice_2(data)
    end = time.time()
    print("htonl_slice_2 %f" % (end - start))

And the results are:

htonl_slice   3.041000
htonl_struct  0.626000
htonl_shift   0.864000
swap32        0.533000
htonl_slice_2 0.025000

I understand that htonl_shift works with ints instead of bytearrays.

It is interesting to see that swap32 is pretty fast, but for a 4 MB array htonl_slice_2 is by far the fastest.
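Applying the same idea to the 2-byte words of the original question, the htonl_struct approach generalizes to a whole buffer in a single pack/unpack round-trip. A sketch (swap16_struct is my name, not part of the timed code above):

```python
import struct

def swap16_struct(data: bytes) -> bytes:
    # Unpack as big-endian 16-bit words, repack as little-endian:
    # the net effect is a byte swap within every word.
    n = len(data) // 2
    words = struct.unpack(f'>{n}H', data)
    return struct.pack(f'<{n}H', *words)

print(swap16_struct(b'\x01\x02\x03\x04'))  # b'\x02\x01\x04\x03'
```

Like htonl_slice_2, this processes the whole buffer per call rather than one word at a time, which is where the big speedups come from.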

Borzoi answered 5/10, 2020 at 14:15 Comment(0)

You could also swap them in place and use the original array.

chunk = bytearray([1,2,3,4])

for i in range(0, len(chunk), 2):
    chunk[i], chunk[i+1] = chunk[i+1], chunk[i]
Hussite answered 19/3, 2016 at 0:41 Comment(0)
