@sarnold's answer is excellent, and here I want to share some code to illustrate it.
First here is a wrong way to do it: using a \n
delimiter. Don't do it! the binary data could contain \n
, and it would be mixed up with the delimiters:
import os, random
with open('test', 'wb') as f:
for i in range(100): # create 100 binary sequences of random
length = random.randint(2, 100) # length (between 2 and 100)
f.write(os.urandom(length) + b'\n') # separated with the character b"\n"
with open('test', 'rb') as f:
for i, l in enumerate(f):
print(i, l) # oops we get 123 sequences! wrong!
...
121 b"L\xb1\xa6\xf3\x05b\xc9\x1f\x17\x94'\n"
122 b'\xa4\xf6\x9f\xa5\xbc\x91\xbf\x15\xdc}\xca\x90\x8a\xb3\x8c\xe2\x07\x96<\xeft\n'
Now the right way to do it (option #4 in sarnold's answer):
import os, random
with open('test', 'wb') as f:
for i in range(100):
length = random.randint(2, 100)
f.write(length.to_bytes(2, byteorder='little')) # prepend the data with the length of the next data chunk, packed in 2 bytes
f.write(os.urandom(length))
with open('test', 'rb') as f:
i = 0
while True:
l = f.read(2) # read the length of the next chunk
if l == b'': # end of file
break
length = int.from_bytes(l, byteorder='little')
s = f.read(length)
print(i, s)
i += 1
...
98 b"\xfa6\x15CU\x99\xc4\x9f\xbe\x9b\xe6\x1e\x13\x88X\x9a\xb2\xe8\xb7(K'\xf9+X\xc4"
99 b'\xaf\xb4\x98\xe2*HInHp\xd3OxUv\xf7\xa7\x93Qf^\xe1C\x94J)'