How to define a binary string in Python in a way that works with both py2 and py3?
W

3

7

I am writing a module that is supposed to work in both Python 2 and 3 and I need to define a binary string.

Usually this would be something like data = b'abc', but this code fails on Python 2.5 with an invalid syntax error.

How can I write the above code in a way that will work in all versions of Python from 2.5 onward?

Note: this has to be binary (it can contain any byte value, including 0xFF); this is very important.

Wraf answered 13/10, 2011 at 13:45 Comment(4)
Binary string? Do you mean a bytes object?Vandiver
The b"abc" syntax and the bytes() constructor were added in Python 2.6.Vandiver
Yes, I was referring to bytes.Wraf
When you search for Python 2 and Python 3 compatibility in various ways, both the six library and my book, which offer essentially similar working solutions for this, appear on the first page of the results. Yet nobody seems to know that either of them exists. How can we fix that? Spread the word!Averill
A
6

I would recommend the following:

from six import b

That requires the six module, of course. If you don't want that, here's another version:

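import sys
if sys.version_info[0] < 3:
    # Python 2: a plain str literal is already a byte string.
    def b(x):
        return x
else:
    import codecs
    # Python 3: map code points 0-255 one-to-one onto byte values.
    def b(x):
        return codecs.latin_1_encode(x)[0]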

More info.

These solutions (essentially the same) work, are clean, as fast as you are going to get, and can support all 256 byte values (which none of the other solutions here can).
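A minimal sketch of how such a helper might be used, assuming the hand-written variant rather than six. It swaps in str.encode("latin-1") for the codecs call, which should behave the same way for code points 0-255; the names data and all_bytes are hypothetical and only serve the illustration.

```python
import sys

if sys.version_info[0] < 3:
    def b(x):
        # Python 2: str is already a byte string.
        return x
else:
    def b(x):
        # Python 3: Latin-1 maps code points 0-255 onto the same byte values.
        return x.encode("latin-1")

# A small binary block with non-ASCII values, written without the b prefix.
data = b("\x00\x80\xff")

# Every possible byte value survives the conversion.
all_bytes = b("".join(chr(i) for i in range(256)))
```

The literal is written as an ordinary string, so the source still parses on interpreters that do not know the b prefix.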

Averill answered 13/10, 2011 at 20:16 Comment(0)
Y
2

If the string only has ASCII characters, call encode. This will give you a str in Python 2 (just like b'abc'), and a bytes in Python 3:

'abc'.encode('ascii')

If not, rather than putting binary data in the source, create a data file, open it with 'rb' and read from it.
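The ASCII-only limitation can be sketched as follows. This is only an illustration; the names ascii_data and non_ascii_ok are hypothetical.

```python
# Works for pure ASCII text: str on Python 2, bytes on Python 3.
ascii_data = 'abc'.encode('ascii')

# Fails for anything outside ASCII. Python 3 raises UnicodeEncodeError and
# Python 2 raises UnicodeDecodeError; both are subclasses of UnicodeError.
try:
    '\xff'.encode('ascii')
    non_ascii_ok = True
except UnicodeError:
    non_ascii_ok = False
```

The except clause avoids the "as" form so that the snippet also parses on Python 2.5.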

Yeanling answered 13/10, 2011 at 13:57 Comment(7)
As you suspected I do have several very small binary blocks, so using files for storing them is not an option. And yes they have non-ascii values.Wraf
So, what do the strings actually look like? If they're human-readable strings, decode them with the proper encoding. If not, then use base64.Spencer
Create a file and read from it? Complicated solution for a simple problem. Sorry, -1.Averill
(And using ascii is limiting without reason, use latin1 instead).Averill
@LennartRegebro: That wouldn't work in Python 2; try '\xff'.encode('latin1').Spencer
You are trying to encode a string. That's backwards. It's u'\xff'.encode('latin1') in Python 2. Of course, that's pointless as you can just do '\xff', which is the whole point of the b() functions in my answer. (And it's not like '\xff'.encode('ascii') works...)Averill
Huh? My answer does say that it works for ASCII only. Yes, I know it's not the right answer now that the OP edited the question, but it might be useful for others. I was replying to your comment; using latin1 instead of ascii wouldn't accomplish anything.Spencer
D
-3

You could store the data base64-encoded.

First step would be to transform into base64:

>>> import base64
>>> base64.b64encode(b"\x80\xFF")
b'gP8='

This only needs to be done once; whether you need the b prefix depends on the Python version you use for this step.

In the second step, you put the resulting string into your program without the b prefix. That way it works in both py2 and py3.

import base64
x = 'gP8='
base64.b64decode(x.encode("latin1"))

gives you a str '\x80\xff' in 2.6 (should work in 2.5 as well) and a bytes b'\x80\xff' in 3.x.
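As a rough check of the two steps, the following sketch decodes the stored text and encodes it again to confirm the round trip. The names encoded, data and round_trip are hypothetical.

```python
import base64

# Base64 text stored in the source as an ordinary string literal.
encoded = 'gP8='

# Decode to raw bytes: str on Python 2, bytes on Python 3.
data = base64.b64decode(encoded.encode("latin1"))

# Encoding the result again should give back the stored text.
round_trip = base64.b64encode(data).decode("ascii")
```

Because the Base64 alphabet is pure ASCII, the stored literal needs no prefix in either Python version.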

As an alternative to the two steps above, you can do the same with hex data:

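import binascii
x = '80FF'
# Encode first: unhexlify() rejects str on Python 3.0-3.2 (str is accepted from 3.3).
binascii.unhexlify(x.encode("latin1")) # `bytes()` in 3.x, `str()` in 2.x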
Dextrorotation answered 13/10, 2011 at 14:11 Comment(8)
Oops, the code is going to be quite cryptic. Can't we find a solution that will work with hex?Wraf
Have you tried the code in Python 3? binascii.unhexlify(x) gives TypeError: 'str' does not support the buffer interfaceWraf
I don't understand what the base64 part is supposed to do. You can remove it and it will still work.Averill
@sorin: strange... here it works fine in Python 3.1 (r31:73572, Jul 5 2010, 13:15:03). Maybe x.encode("latin1") works better here as well...Dextrorotation
@Lennart Regebro It is supposed to be an alternative, as hex was preferred. b'\x80\xff' gets encoded to 'gP8=' in base64 and to '80FF' in hex.Dextrorotation
Yeah, and the base64 is still pointless, '\x80\xff' is hex... Your first example uses the code b"\x80\xFF". You can remove everything else, that's all you need, assuming that you are using Python 2.6 and later, which you must be for that to work.Averill
'\x80\xff' is a representation of 2 raw data bytes in hex, as opposed to '80FF', which is 4 bytes that represent the same data in hex. Of course you can put '\x80\xFF' directly in the code; it might be OK or not, depending on where the data comes from.Dextrorotation
And I cannot use '\x80\xff' or b'\x80\xff', respectively, if I want to stay portable; I would have to transform depending on the Python version. Here, this is not the case.Dextrorotation
