I have a list of say 100k floats and I want to convert it into a bytes buffer.
buf = bytes()
for val in floatList:
    buf += struct.pack('f', val)
return buf
This is quite slow. How can I make it faster using only standard Python 3.x libraries?
Just tell struct how many floats you have. 100k floats takes about 1/100th of a second on my slow laptop.
import random
import struct
floatlist = [random.random() for _ in range(10**5)]
buf = struct.pack('%sf' % len(floatlist), *floatlist)
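For completeness, a minimal round-trip sketch (not part of the original answer) showing that the same count-prefixed format string unpacks the buffer back into floats:

import struct

floatlist = [0.5, 1.25, -3.0]
buf = struct.pack('%sf' % len(floatlist), *floatlist)

# The element count for unpacking is the buffer size divided by the 4-byte 'f' item size.
restored = struct.unpack('%sf' % (len(buf) // 4), buf)
assert list(restored) == floatlist  # exact here: these values are representable in 32 bits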
This doesn't mention array.array, but it is worth noting that array.tobytes() returns the same as struct.pack(...). So one may use an array('f', [...]) and have append, indexer accessibility, etc. array.array does not have all of the same methods as list, but might be easier to use in many cases. – Drone

The '*' operator seems inefficient; it is needlessly converting floatlist into a tuple before passing it to struct.pack. Is there a way to pack from an iterable, instead of passing all values as args? Perhaps create the buffer first and then assign to a slice of it? – Marginalia

array.array, unlike struct, seems to be in native endian order, so it's not a replacement for struct. – Charpentier

Update: "nowadays" one would use numpy for that - just create a numpy array using np.array(mylist), and you can consume its content as bytes using the built-in memoryview:
In [46]: import numpy as np
In [47]: a = [1.0, 2.0, 3.0]
In [48]: b = np.array(a, dtype="double")
In [49]: bytes(memoryview(b))
Out[49]: b'\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x08@'
If you want to just write the bytes to a file, the array b can be passed directly to a binary file open for writing, with no need to call memoryview (which is zero-copy) or bytes on top of that: open("myfile.bin", "wb").write(np.array(my_float_list, dtype="double"))

Numpy will allow you to output the data in several formats, change byte order, and such. With no numpy, the standard library array module will also work just fine, as described above:
In [51]: import array
In [52]: a = [1.0, 2.0, 3.0]
In [53]: bytes(memoryview(array.array("d", a)))
Out[53]: b'\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x08@'
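To illustrate the byte-order point, a small sketch using explicit-endianness dtype strings ('<' is little-endian, '>' is big-endian; f4/f8 are 32/64-bit floats):

import numpy as np

a = [1.0, 2.0, 3.0]
little = np.array(a, dtype='<f8').tobytes()  # little-endian doubles
big = np.array(a, dtype='>f8').tobytes()     # big-endian doubles
single = np.array(a, dtype='<f4').tobytes()  # little-endian 32-bit floats
assert len(single) == len(little) // 2       # half the width, half the bytes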
Original answer, using ctypes

You can use ctypes and have a double array (or float array) exactly as you would in C, instead of keeping your data in a list. This is fairly low level, but recommended if you need great performance and your list is of a fixed size.
You can create the equivalent of a C
double array[100];
in Python by doing:
array = (ctypes.c_double * 100)()
The ctypes.c_double * 100 expression yields a Python class for an array of doubles, 100 items long. To write it to a file, you can just use buffer to get its contents:
>>> f = open("bla.dat", "wb")
>>> f.write(buffer(array))
If your data is already in a Python list, packing it into a double array may or may not be faster than calling struct as in Agf's accepted answer - I will leave measuring which is faster as homework, but all the code you need is this:
>>> import ctypes
>>> array = (ctypes.c_double * len(floatlist))(*floatlist)
To see it as a string, just do str(buffer(array)). The one drawback here is that you have to take care of float size (float vs double) and CPU-dependent float type - the struct module can take care of this for you. The big win is that with a float array you can still use the elements as numbers, by accessing them just as if it were a plain Python list, while having them readily available as a planar memory region with buffer.
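Since that snippet predates Python 3 (where buffer is gone), here is a minimal Python 3 sketch of the same idea: ctypes arrays support the buffer protocol, so bytes and memoryview work on them directly.

import ctypes

arr = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
raw = bytes(arr)         # copies the array's memory into an immutable bytes object
view = memoryview(arr)   # zero-copy view of the same memory
with open("bla.dat", "wb") as f:
    f.write(arr)         # write() accepts any bytes-like object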
The '*floatlist' needlessly converts floatlist to a new tuple before passing it to struct.pack. See my answer. – Marginalia

Where does buffer() come from? – Asylum

buffer was a builtin function in Python 2 - some of its functionality, but not all, is replaced by memoryview (also a built-in) in Python 3: docs.python.org/3/library/stdtypes.html#typememoryview – Bolection

A couple of answers suggest:
import struct
buf = struct.pack(f'{len(floatlist)}f', *floatlist)
but the use of '*' needlessly converts floatlist to a tuple before passing it to struct.pack. It's faster to avoid that by first creating an empty buffer and then populating it using slice assignment:
import ctypes
buf = (ctypes.c_double * len(floatlist))()
buf[:] = floatlist
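A self-contained sketch of getting the raw bytes back out of that buffer. Note that the block above allocates 8-byte c_double items; ctypes.c_float would match the 4-byte 'f' format used in the struct version.

import ctypes

floatlist = [5.4, 3.5, 7.3]
buf = (ctypes.c_double * len(floatlist))()
buf[:] = floatlist       # slice assignment, no intermediate tuple
raw = bytes(buf)         # one copy out to an immutable bytes object
view = memoryview(buf)   # or a zero-copy view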
Other performance savings some people might be able to use:
As with strings, using .join() will be faster than continually concatenating. E.g.:
import struct
b = bytes()
floatList = [5.4, 3.5, 7.3, 6.8, 4.6]
b = b.join((struct.pack('f', val) for val in floatList))
Results in:
b'\xcd\xcc\xac@\x00\x00`@\x9a\x99\xe9@\x9a\x99\xd9@33\x93@'
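Another standard-library option in the same spirit (a sketch, not from the original answer): accumulate into a mutable bytearray, whose in-place += is an amortized O(1) append rather than a full copy of the buffer each time:

import struct

floatList = [5.4, 3.5, 7.3, 6.8, 4.6]
buf = bytearray()
for val in floatList:
    buf += struct.pack('f', val)  # extends in place, no full copy
b = bytes(buf)                    # freeze to immutable bytes at the end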
That should work:
return struct.pack('f' * len(floatList), *floatList)
The '*' operator needlessly converts floatlist to a new tuple before passing it to struct.pack. This will be slow for large arrays. See my answer below. – Marginalia

For an array of single-precision floats there are two options: use struct or array.
In[103]: import random
import struct
from array import array
floatlist = [random.random() for _ in range(10**5)]
In[104]: %timeit struct.pack('%sf' % len(floatlist), *floatlist)
100 loops, best of 3: 2.86 ms per loop
In[105]: %timeit array('f', floatlist).tostring()
100 loops, best of 3: 4.11 ms per loop
So struct is faster. (Note that array.tostring() was renamed to tobytes(), and the old name was removed in Python 3.9.)
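To rerun the comparison outside IPython, a sketch with timeit (absolute numbers will vary by machine and Python version):

import random
import struct
import timeit
from array import array

floatlist = [random.random() for _ in range(10**5)]
print(timeit.timeit(lambda: struct.pack('%sf' % len(floatlist), *floatlist), number=100))
print(timeit.timeit(lambda: array('f', floatlist).tobytes(), number=100))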
The '*floatlist' needlessly converts floatlist to a new tuple before passing it to struct.pack. See my answer. – Marginalia

You can use the pack_into, unpack_from, and iter_unpack functions to write and read back lists:
import struct

# your input
size = 5
values = [float(v) for v in range(size)]  # floats, so original/restored print identically
fmt = "f"

# serialize
itemsize = struct.calcsize(fmt)        # bytes per packed value (avoid shadowing builtin bytes)
buffer = bytearray(size * itemsize)    # preallocate the full buffer
for i, value in enumerate(values):
    struct.pack_into(fmt, buffer, i * itemsize, value)  # value is a scalar, not *value

# deserialize (iter_unpack yields 1-tuples, so flatten them)
restored_values = [v[0] for v in struct.iter_unpack(fmt, buffer)]
print("original:", values)
print("restored:", restored_values)
As you say that you really do want single-precision 'f' floats, you might like to try the array module (in the standard library since Python 1.x).
>>> mylist = []
>>> import array
>>> myarray = array.array('f')
>>> for guff in [123.45, -987.654, 1.23e-20]:
... mylist.append(guff)
... myarray.append(guff)
...
>>> mylist
[123.45, -987.654, 1.23e-20]
>>> myarray
array('f', [123.44999694824219, -987.6539916992188, 1.2299999609665927e-20])
>>> import struct
>>> mylistb = struct.pack(str(len(mylist)) + 'f', *mylist)
>>> myarrayb = myarray.tobytes()
>>> myarrayb == mylistb
True
>>> myarrayb
b'f\xe6\xf6B\xdb\xe9v\xc4&Wh\x1e'
This can save you a bag-load of memory, while still having a variable-length container with most of the list methods. The array.array approach takes 4 bytes per single-precision float. The list approach consumes a pointer to a Python float object (4 or 8 bytes) plus the size of that object; on a 32-bit CPython implementation, that is 16:
>>> import sys
>>> sys.getsizeof(123.456)
16
Total: 20 bytes per item best case for a list, 4 bytes per item always for an array.array('f').
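A quick way to check those numbers on your own build (a sketch; exact sizes differ between 32-bit and 64-bit CPython):

import array
import sys

floats = [float(i) for i in range(10**5)]
arr = array.array('f', floats)

# list object (with its pointer array) plus one boxed float object per item
print(sys.getsizeof(floats) + sum(sys.getsizeof(x) for x in floats))
# array object, including its packed 4-byte items
print(sys.getsizeof(arr))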
The '*mylist' needlessly converts mylist to a new tuple before passing it to struct.pack. See my answer. – Marginalia

Most of the slowness will be that you're repeatedly appending to a bytestring. That copies the bytestring each time. Instead, you should use b''.join():
import struct
packed = [struct.pack('f', val) for val in floatList]
return b''.join(packed)
Better to call struct once for 100k floats. – Christmastide

In my opinion the best way is to create a loop, e.g.:
import struct

file_i = "test.txt"                  # input: one float per line
fd_out = open("test_bin_file", 'wb')

f_i = open(file_i, 'r')
for i, line in enumerate(f_i):
    print(i, float(line))
    b = struct.pack('f', float(line))
    fd_out.write(b)

f_i.close()
fd_out.flush()
fd_out.close()
To append to an existing file use instead:
fd_out = open("test_bin_file", 'ab')
'f' gets you a C float (32 bits); you no doubt want a Python float aka C double (64 bits), so you and your followers should be using 'd'. – Heteronomy