Speed up Python's struct.unpack

I am trying to speed up my script. It reads a pcap file containing Velodyne HDL-32 lidar data and lets me extract X, Y, Z, and intensity values. I profiled the script with python -m cProfile ./spTestPcapToLas.py, and most of the time is spent in calls to my readDataPacket() function. In a small test (an 80 MB file), the unpacking portion takes around 56% of the execution time.

I call the readDataPacket function like this (chunk refers to the pcap file):

from struct import unpack

packets = []
for packet in chunk:
    memoryView = memoryview(packet.raw())
    udpDestinationPort = unpack('!h', memoryView[36:38].tobytes())[0]

    if udpDestinationPort == 2368:
        packets += readDataPacket(memoryView)

The readDataPacket() function itself is defined like this:

def readDataPacket(memoryView):
    firingData = memoryView[42:]    
    firingDataStartingByte = 0    
    laserBlock = []

    for i in xrange(firingBlocks):
        rotational = unpack('<H', firingData[firingDataStartingByte+2:firingDataStartingByte+4])[0]        
        startingByte = firingDataStartingByte+4
        laser = []
        for j in xrange(lasers):   
            distanceInformation = unpack('<H', firingData[startingByte:(startingByte + 2)])[0] * 0.002
            intensity = unpack('<B', firingData[(startingByte + 2)])[0]   
            laser.append([distanceInformation, intensity])
            startingByte += 3
        firingDataStartingByte += 100
        laserBlock.append([rotational, laser])

    return laserBlock

Any ideas on how I can speed up the process? By the way, I am using numpy for the X, Y, Z, Intensity calculations.

Twerp answered 22/4, 2016 at 14:54 Comment(0)

Numpy lets you do this very quickly. In this case I think the easiest way is to use the ndarray constructor directly:

import numpy as np

def with_numpy(buffer):
    # Construct ndarray with: shape, dtype, buffer, offset, strides.
    rotational = np.ndarray((firingBlocks,), '<H', buffer, 42+2, (100,))
    distance = np.ndarray((firingBlocks,lasers), '<H', buffer, 42+4, (100,3))
    intensity = np.ndarray((firingBlocks,lasers), '<B', buffer, 42+6, (100,3))
    return rotational, distance*0.002, intensity

This returns separate arrays instead of the nested list, which should be much easier to process further. As input it takes anything that exposes the buffer interface (a buffer object in Python 2); unfortunately, exactly which objects you can use depends on your Python version (2 vs 3). But this method is very fast:

import numpy as np

firingBlocks = 10**4
lasers = 32
packet_raw = np.random.bytes(42 + firingBlocks*100)

%timeit readDataPacket(memoryview(packet_raw))
# 1 loop, best of 3: 807 ms per loop
%timeit with_numpy(packet_raw)
# 100 loops, best of 3: 10.8 ms per loop
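As a sanity check, the strided-view trick can be demonstrated on a tiny synthetic buffer. The layout below is made up for illustration (4-byte header, 10-byte firing blocks, 2 lasers); the real HDL-32 packet in the question uses a 42-byte header and 100-byte firing blocks:

```python
import struct
import numpy as np

# Hypothetical toy layout: 4-byte header, then 2 firing blocks of 10 bytes each:
#   2-byte flag, 2-byte rotational, then 2 lasers x (2-byte distance + 1-byte intensity)
header = b"\x00" * 4
block = (struct.pack("<HH", 0xEEFF, 1200)
         + struct.pack("<HB", 500, 10)
         + struct.pack("<HB", 1500, 20))
buf = header + block * 2  # two identical firing blocks

# ndarray arguments: shape, dtype, buffer, offset, strides (strides in bytes)
rotational = np.ndarray((2,), "<H", buf, 4 + 2, (10,))
distance = np.ndarray((2, 2), "<H", buf, 4 + 4, (10, 3))
intensity = np.ndarray((2, 2), "B", buf, 4 + 6, (10, 3))

print(rotational.tolist())   # [1200, 1200]
print(distance.tolist())     # [[500, 1500], [500, 1500]]
print(intensity.tolist())    # [[10, 20], [10, 20]]
```

Each view walks the same bytes with a 10-byte outer stride (one per firing block) and a 3-byte inner stride (one per laser), so no copying or per-value unpacking happens until you actually use the data.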
Coelenteron answered 23/4, 2016 at 10:15 Comment(2)
This resulted in a ~30x speed increase for that specific function. Thank you so much. :D – Twerp
Went from 200 seconds to 3 seconds! – Altruist

Compile a Struct ahead of time to avoid the Python-level wrapping code that the module-level functions go through, and do it outside the loops so the construction cost is not paid repeatedly.

unpack_ushort = struct.Struct('<H').unpack
unpack_ushort_byte = struct.Struct('<HB').unpack

The Struct methods themselves are implemented in C in CPython (the module-level functions eventually delegate to the same code after parsing the format string), so building the Struct once and storing its bound methods saves a non-trivial amount of work, particularly when unpacking a small number of values.

You can also save some work by unpacking multiple values together, rather than one at a time:

distanceInformation, intensity = unpack_ushort_byte(firingData[startingByte:startingByte + 3])
distanceInformation *= 0.002
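Putting both ideas together, here is a sketch of the whole function with precompiled Structs (Python 3; the firingBlocks=12 and lasers=32 defaults are assumptions based on the 100-byte firing blocks in the question's code, not values stated there):

```python
from struct import Struct

# Compiled once at module level, so the parsing cost is not paid per call.
unpack_rotational = Struct('<H').unpack
unpack_laser = Struct('<HB').unpack   # distance and intensity in one call

def read_data_packet(memoryView, firingBlocks=12, lasers=32):
    firingData = memoryView[42:]
    laserBlock = []
    for i in range(firingBlocks):
        base = i * 100  # each firing block is 100 bytes
        rotational = unpack_rotational(firingData[base + 2:base + 4])[0]
        laser = []
        startingByte = base + 4
        for j in range(lasers):
            # One unpack call per laser instead of two.
            distance, intensity = unpack_laser(firingData[startingByte:startingByte + 3])
            laser.append([distance * 0.002, intensity])
            startingByte += 3
        laserBlock.append([rotational, laser])
    return laserBlock
```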

As Dan notes, you could improve this further with iter_unpack, which reduces the amount of bytecode execution and the number of small slice operations even more.

Sulfite answered 22/4, 2016 at 15:20 Comment(1)
I'd suggest testing my iter_unpack method before being too certain it improves performance—it creates plenty of temporary objects, I think. Your method sounds more certain. – Graner

For your specific situation if you can fit your loop into a numpy call, that'd be fastest.

With that said, for just the struct.unpack part: if your data happens to be in native byte order, you can use memoryview.cast. In a short example, it is about 3x faster than naive struct.unpack, with no change in logic.

In [20]: st = struct.Struct("<H")

In [21]: %timeit struct.unpack("<H", buf[20:22])
1.45 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [22]: %timeit st.unpack(buf[20:22])
778 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [23]: %timeit buf.cast("H")[0]
447 ns ± 4.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
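For illustration, a minimal sketch of the cast approach. It assumes a little-endian machine (cast always uses native byte order, so '<H'-formatted data only lines up on little-endian hosts) and a buffer whose length is a multiple of the item size:

```python
import struct
import sys

# cast() reinterprets bytes in native order; this sketch assumes little-endian.
assert sys.byteorder == "little"

buf = memoryview(struct.pack("<4H", 10, 20, 30, 40))

# Reinterpret the same bytes as unsigned shorts: no per-element unpack calls,
# no temporary slices, just indexing into a typed view.
shorts = buf.cast("H")
print(shorts[2])        # 30
print(shorts.tolist())  # [10, 20, 30, 40]
```

The view shares memory with the original buffer, so building it once and indexing it in a loop avoids creating a slice and an unpack tuple per value.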
Agitprop answered 12/9, 2020 at 14:12 Comment(0)

You can unpack the raw distanceInformation and intensity values together in one call, especially since you're just putting them into a list together: that's exactly what unpack() does when it unpacks multiple values. In your case you then need to multiply distanceInformation by 0.002, but you might save time by leaving that until later, because iter_unpack() can parse the whole list of raw pairs in one call. That function returns an iterator, which can be sliced with itertools.islice() and then turned into a list. Something like this:

laser_iter = struct.iter_unpack('<HB', firingData[firingDataStartingByte + 4:firingDataStartingByte + 4 + lasers * 3])
laser = [[d * 0.002, i] for d, i in itertools.islice(laser_iter, lasers)]

Unfortunately this is a little harder to read, so you might want to find a way to spread this out into more lines of code, with more descriptive variable names, or add a comment for the future when you forget why you wrote this…
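For reference, a runnable Python 3 sketch of that idea (the lasers=32 default is an assumption). Slicing out exactly lasers * 3 bytes keeps the buffer an exact multiple of the struct size, which iter_unpack requires, and also makes itertools.islice unnecessary:

```python
import struct

def read_firing_block(firingData, firingDataStartingByte, lasers=32):
    # iter_unpack requires the buffer length to be an exact multiple of the
    # struct size (3 bytes for '<HB'), so slice out exactly lasers * 3 bytes.
    start = firingDataStartingByte + 4
    raw = firingData[start:start + lasers * 3]
    # Parse all (distance, intensity) pairs in one call, scaling distance last.
    return [[d * 0.002, i] for d, i in struct.iter_unpack('<HB', raw)]
```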

Graner answered 22/4, 2016 at 15:13 Comment(1)
Unfortunately I can't use Python 3. I am using Python 2.7.11. Do you know of another solution?Twerp
